How 3D game rendering works: lighting and shadows

The vast majority of visual effects in modern games depend on the judicious use of lighting and shadows; without them, games would look flat and lifeless. In this fourth part of our analysis of 3D game rendering, we focus on what happens to the 3D world after vertex processing and texture mapping. Once again we will need plenty of mathematics, as well as a solid grasp of the basics of optics.

Part 1: vertex processing

Part 2: rasterization and ray tracing

Part 3: texturing and filtering textures

A quick recap


Earlier in the series, we covered the key aspects of how objects in a scene are moved and processed, how they are converted from three-dimensional space into a flat grid of pixels, and how textures are applied to them. For many years, these operations made up almost the entire rendering process, and we can see this by going back to 1993 and launching id Software's Doom.


By modern standards, the use of light and shadow in this game was very primitive: light sources were not accounted for individually; instead, each surface was given an overall color value, or an ambient light value, based on its vertices. Any hint of shadows came from the cunning use of textures and the choice of ambient color.

There were no real shadows because they were not something the programmers could afford: a typical PC of that time had a 66 MHz processor (that is, 0.066 GHz!), a 40 MB hard drive, and a 512-kilobyte graphics card with minimal 3D capabilities. Fast forward 23 years to the famous reboot of the series, and we see a completely different story.


A huge amount of technology goes into rendering this frame, with stages such as screen space ambient occlusion, depth pre-pass mapping, bokeh blur filters, tone mapping operators, and so on. The lighting and shading of every surface is calculated dynamically, constantly changing with the environmental conditions and the player's actions.

Since every 3D rendering operation involves mathematics (a whole bunch of calculations!), we had better start with what goes on behind the scenes of any modern game.

The mathematics of lighting


To do all this correctly, we need to accurately simulate how light behaves when it interacts with different surfaces. Curiously, this problem was first tackled in the 18th century by a man named Johann Heinrich Lambert.

In 1760, the Swiss scientist published a book called Photometria. In it, he set out fundamental rules about the behavior of light; the most notable one was this: a surface emits light (by reflection or as a light source itself) in such a way that the intensity of the emitted light varies with the cosine of the angle between the surface normal and the observer.


This simple rule laid the foundation of what is called diffuse lighting: a mathematical model used to calculate the color of a surface based on its physical properties (for example, its color and how well it reflects light) and the position of the light source.

In 3D rendering, this requires a lot of information, which is easiest to picture with a diagram like this:


The image is full of arrows - these are vectors, and the following ones are needed to calculate the color:

  • 3 position vectors: for the vertex, the light source, and the camera looking at the scene
  • 2 direction vectors: toward the light source and toward the camera, as seen from the vertex
  • 1 normal vector
  • 1 half-vector (it always sits midway between the light and camera direction vectors)

These are all calculated during the vertex processing stage of rendering, and the equation that ties them together (called the Lambert model) looks like this:


In other words, the diffusely lit color of the vertex is the product of the surface color, the light source's color, and the dot product of the vertex normal with the light direction, all scaled by the attenuation and spotlight coefficients. This is done for every light source in the scene, hence the summation symbol at the start of the equation.
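The exact notation varies between APIs, but a typical fixed-function form of this diffuse sum can be written as:

$$C_{\text{diff}} = \sum_{i} A_i \, S_i \,\big(C_{\text{surf}} \otimes C_{L,i}\big)\, \max\!\big(\hat{N} \cdot \hat{L}_i,\; 0\big)$$

where ⊗ is component-wise multiplication of the color values, A_i and S_i are the attenuation and spotlight factors for light i, and L_i is the direction from the vertex to that light.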

The vectors in this equation (and in everything we will see below) are normalized (as indicated by the hats above each vector). A normalized vector keeps its original direction, but its length is scaled to exactly 1 unit.
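For reference, normalizing a vector simply means dividing it by its own length:

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert}, \qquad \lVert \hat{\mathbf{v}} \rVert = 1$$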

The color values of the surface and the light source are standard RGBA values (red, green, blue, and alpha transparency). They can be integers (for example, INT8 per color channel), but they are almost always floating point numbers (for example, FP32). The attenuation coefficient determines how the illumination falls off with distance from the source, and it is calculated with another equation:


The terms AC, AL, and AQ are coefficients (constant, linear, and quadratic) that describe how distance affects the light level. They are all set by the programmers when building the rendering engine. Each graphics API handles this in its own way, but the coefficients are specified when the type of light source is coded.
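In the classic fixed-function form used by both Direct3D and OpenGL, the attenuation factor for a light at distance d from the vertex looks like this:

$$A = \frac{1}{A_C + A_L\,d + A_Q\,d^2}$$

Setting the linear and quadratic terms to zero gives a light that never fades, while a large quadratic term makes the light fall off sharply with distance.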

Before we look at the last coefficient (the spotlight factor), it is worth noting that in 3D rendering there are essentially three types of light source: point, directional, and spotlight.


Point sources emit light evenly in all directions, while directional sources emit light in just one direction (mathematically, a directional source is simply a point source placed at an infinite distance). Spotlights are more complex directional sources, because they emit light in the shape of a cone; the sizes of the inner and outer sections of the cone determine how the light varies across its body.

And yes, there is another equation for the spotlight factor:


The spotlight factor is either 1 (when the source isn't a spotlight at all), 0 (when the vertex lies outside the cone), or some calculated value in between. The angles φ (phi) and θ (theta) define the sizes of the inner and outer sections of the spotlight's cone.

Two vectors, Ldcs and Ldir (the reverse of the camera direction and the spotlight's direction), are used to determine whether the cone actually reaches the vertex.
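The Direct3D fixed-function pipeline defines the spotlight factor in roughly this form, with ρ the cosine of the angle between those two vectors and p a falloff exponent:

$$S = \operatorname{clamp}\!\left( \frac{\rho - \cos(\phi/2)}{\cos(\theta/2) - \cos(\phi/2)},\; 0,\; 1 \right)^{p}$$

Inside the inner cone (θ) the value clamps to 1, outside the outer cone (φ) it clamps to 0, and between the two it falls off smoothly.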

Remember that all of this is just to calculate the diffuse lighting value, and it has to be done for each light source in the scene, or at least for every source the programmer wants to account for. Many of these equations are handled by the graphics API, but they can also be done manually when coders want more control over the image.

In the real world, however, there is essentially an endless number of light sources: every surface reflects light, so they all contribute to the overall lighting of a scene. Even at night there is some background light, whether from stars and planets or from light scattered in the atmosphere.

To simulate this, another lighting value is calculated: ambient lighting.


This equation is simpler than the diffuse one because no directions are involved. Instead, it is a straightforward multiplication of various factors:

  • CSA - the ambient color of the surface
  • CGA - the global ambient color of the 3D scene
  • CLA - the ambient color of each light source in the scene

Note that the attenuation and spotlight factors are used again, along with a summation over all the light sources.
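Put together in the same spirit as the diffuse equation (and with the same caveat that the exact notation varies by API), the ambient term might be written as:

$$C_{\text{amb}} = C_{SA} \otimes \left( C_{GA} + \sum_{i} A_i\, S_i\, C_{LA,i} \right)$$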

So, we have ambient lighting covered, and we have accounted for the diffuse lighting of the 3D world's surfaces by the light sources. But the Lambert model only works for materials that scatter light off their surface in all directions; objects made of glass or metal produce another kind of reflection, called specular, and naturally there is an equation for that too!


The individual parts of this formula should look familiar by now: there are two specular color values (one for the surface, CS, and one for the light, CLS), along with the usual attenuation and spotlight factors.

Because specular reflection is highly focused and directional, two vectors are used to determine its brightness: the vertex normal and the half-vector. The exponent p is called the specular power; it is a number that controls the brightness of the reflection based on the surface material's properties. As p increases, the specular highlight becomes brighter but also smaller and more focused.
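Using the same conventions as before, a typical form of the specular term is:

$$C_{\text{spec}} = \sum_{i} A_i\, S_i\, \big(C_{S} \otimes C_{LS,i}\big)\, \max\!\big(\hat{N} \cdot \hat{H}_i,\; 0\big)^{p}$$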

The last element to consider is the simplest of all, because it is just a number. It is called emissive lighting and is applied to objects that are themselves direct sources of light, such as a flame, a flashlight, or the Sun.

So we now have one number and three sets of equations for calculating the color of a surface vertex, accounting for ambient lighting as well as the interplay between the light sources and the surface material's properties (diffuse and specular). Programmers can use just one of them or combine all four simply by adding them together.


Visually, the combination looks like this:


The equations we have looked at are applied by graphics APIs (such as Direct3D and OpenGL) through their standard functions, but for each type of lighting there are alternative algorithms. For example, diffuse lighting can be implemented with the Oren-Nayar model, which suits very rough surfaces better than the Lambert model.

The specular reflection equation can be replaced by models that account for the fact that even very smooth surfaces such as glass and metal are still rough at a microscopic level. These microfacet models deliver more realistic images at the cost of extra mathematical complexity.

Whichever model is used, all of them improve dramatically when they are applied to the 3D scene more frequently.

Per-vertex versus per-pixel calculations


When we looked at vertex processing and rasterization, we saw that the results of all those tricky lighting calculations, performed per vertex, have to be interpolated across the surface between the vertices. This is because the properties of the surface material are stored in the vertices; when the 3D world is squashed into a 2D grid of pixels, only one pixel remains where each vertex was.


The remaining pixels need to be given the vertices' color information in such a way that the colors blend properly across the surface. In 1971, Henri Gouraud, then a graduate student at the University of Utah, proposed a method that is now called Gouraud shading.

His method was computationally fast and became the de facto standard for many years, but it had its problems. It could not interpolate specular lighting correctly, and if an object was built from a small number of primitives, the blending across those primitives looked wrong.

A solution was proposed in 1973 by Bui Tuong Phong, who also worked at the University of Utah. In his research paper, Phong demonstrated a technique for interpolating vertex normals across rasterized surfaces. This meant the diffuse and specular reflection models could be evaluated correctly for every pixel, and we can see this clearly in David Eck's online computer graphics and WebGL tutorial.

The spheres shown below are colored with the same lighting model, but for the one on the left the calculations are done per vertex and then Gouraud shading interpolates them across the surface. For the sphere on the right, the calculations are done per pixel, and the difference is obvious.


A still image cannot convey all the improvement that Phong shading brings, but you can run Eck's online demo yourself and watch the animation.

Phong did not stop there, though: a couple of years later he published another research paper showing how the separate calculations for ambient, diffuse, and specular reflection could be combined into one equation:


There is a lot to unpack here! The values marked with the letter k are reflection constants for ambient, diffuse, and specular reflection. Each one is the fraction of the incident light that is reflected in that particular way; the C values are the ones we saw in the equations above (the color values of the surface material for each type of lighting).

The vector R is the "perfect reflection" vector: the direction the reflected light would travel in if the surface were perfectly smooth; it is calculated from the surface normal and the incident light vector. The vector C is the camera direction vector, and both R and C are normalized.

Finally, there is one last constant in the equation: α, which sets how glossy the surface is. The smoother the material (that is, the more it resembles glass or metal), the higher the number.
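A standard textbook rendering of the combined equation, using the k constants, C colors, R, C, and α just described, is:

$$I = k_a\,C_A + \sum_{i}\Big( k_d\,C_{D,i}\,\big(\hat{N}\cdot\hat{L}_i\big) + k_s\,C_{S,i}\,\big(\hat{R}_i\cdot\hat{C}\big)^{\alpha} \Big)$$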

This equation is commonly called the Phong reflection model. At the time of the research it was a radical proposal, because it demanded serious computing resources. A simplified version was created by Jim Blinn, who replaced the R and C part of the formula with H and N (the half-vector and the surface normal). R has to be calculated for every light source and every pixel in the frame, whereas H only needs to be calculated once per light source for the whole scene.

The Blinn-Phong reflection model is the standard lighting system today, and it is used by default in Direct3D, OpenGL, Vulkan, and so on.
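As a concrete illustration, here is a minimal Blinn-Phong sketch in Python for a single point light, with no attenuation or spotlight factor; the function and parameter names are our own, not from any particular API:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(surface_pos, normal, cam_pos, light_pos, light_color,
                k_ambient, k_diffuse, k_specular, shininess):
    """Ambient + diffuse (N.L) + specular (N.H)^shininess for one point light."""
    n = normalize(normal)
    l = normalize(light_pos - surface_pos)   # direction from the surface to the light
    v = normalize(cam_pos - surface_pos)     # direction from the surface to the camera
    h = normalize(l + v)                     # Blinn's half-vector
    diffuse = max(np.dot(n, l), 0.0)
    specular = max(np.dot(n, h), 0.0) ** shininess if diffuse > 0.0 else 0.0
    return (k_ambient + k_diffuse * diffuse + k_specular * specular) * light_color
```

Raising the shininess value makes the highlight smaller and tighter, exactly as described for the specular power p earlier.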

There are plenty of other mathematical models, especially now that GPUs can process pixels in long, complex shaders; collectively these formulas are called bidirectional reflectance/transmittance distribution functions (BRDF/BTDF), and they are the foundation of the color of every pixel on the monitor when we play modern 3D games.

So far, though, we have only looked at surfaces that reflect light: translucent materials let light pass through, refracting the rays as they go. And some surfaces, water for example, reflect and transmit light in varying proportions.

Taking lighting to the next level


Take Ubisoft's 2018 game Assassin's Creed: Odyssey, in which the player frequently sails across water, from shallow rivers to the deep sea.


Painted wood, metal, rope, fabric, and water: all of it reflects and refracts light, courtesy of a pile of calculations.

To render the water as realistically as possible while keeping the game running at a decent speed, Ubisoft's programmers used a whole bag of tricks. The surface of the water is lit by the familiar trio of ambient, diffuse, and specular lighting, but these are supplemented with some interesting additions.

The first, commonly used to create the reflective properties of water, is screen space reflections (SSR). With this technique, the scene is first rendered with each pixel's color representing its depth, that is, its distance from the camera; this depth is stored in a so-called depth buffer. The frame is then rendered again with all the usual lighting and texturing, but the scene is saved as a render texture rather than as a finished buffer sent to the monitor.

After that comes ray marching: rays are cast out from the camera and stepped forward at set distances along their path. At each step, the code compares the ray's depth with the pixels in the depth buffer. When the values match, the code checks the pixel's normal to see whether it points toward the camera, and if it does, the engine fetches the corresponding pixel from the render texture. A further set of instructions then mirrors the position of that pixel so that it appears correctly reflected in the scene.
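The following Python sketch shows the general idea of that marching loop, not Ubisoft's actual implementation; "project" is a hypothetical helper that maps a 3D point to integer screen coordinates plus a depth value, and the step count and thickness are made up for illustration:

```python
def ssr_march(origin, direction, depth_buffer, project,
              steps=64, step_size=0.1, thickness=0.05):
    """Step a reflected ray through the scene and stop where it dips just
    behind the depth stored in the depth buffer; that screen position is
    then used to fetch the reflected color from the render texture.
    'origin' and 'direction' are 3-component numpy arrays."""
    for i in range(1, steps + 1):
        p = origin + direction * (i * step_size)
        x, y, ray_depth = project(p)
        if not (0 <= x < depth_buffer.shape[1] and 0 <= y < depth_buffer.shape[0]):
            return None                      # the ray has left the screen: no reflection
        scene_depth = depth_buffer[y, x]
        if scene_depth <= ray_depth <= scene_depth + thickness:
            return (x, y)                    # hit: sample the render texture here
    return None                              # no intersection found along the ray
```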


The SSR sequence used in EA's Frostbite engine.

Light also scatters as it travels through the interior of materials, and for things like water and skin another trick is used: sub-surface scattering (SSS). We won't explain it in detail here, but you can read about how it is used to create such amazing results in this 2014 Nvidia presentation.


Nvidia's 2013 FaceWorks demo (link)

Back to the water in Assassin's Creed: the SSS implementation there is barely noticeable, since for performance reasons it isn't used as heavily. Earlier AC games used faked SSS, and in the latest entry its use is more sophisticated, but still nowhere near the scale we saw in the Nvidia demo.

Additional routines adjust the lighting values on the water's surface to correctly simulate the effects of depth, altering the transparency based on the distance to the shore. And when the camera looks at water close to the shore, yet more algorithms are applied to account for caustics and refraction.

The results are impressive:


Assassin's Creed: Odyssey - rendering water in all its glory.

That covers water, but what about light traveling through air? Dust particles, moisture, and other suspended matter also scatter light. As a result, rays of light gain volume instead of remaining just a set of straight lines.

The topic of volumetric lighting could fill a dozen articles on its own, so we will look at how Rise of the Tomb Raider handles it. In the video below there is just one main light source: the sun shining through an opening in the building.


To create the light volume, the game engine takes the camera's view frustum (see below) and exponentially slices it into 64 sections along its depth. Each slice is then rasterized into a grid of 160 x 94 elements, and all of this data is stored in a three-dimensional FP32 render texture. Since textures are normally two-dimensional, the "pixels" of the frustum volume are called voxels.


For each 4 x 4 x 4 block of voxels, compute shaders determine which active light sources affect that volume and write the information into another three-dimensional render texture. A formula known as the Henyey-Greenstein scattering function is then used to estimate the overall "density" of light within the voxel block.
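The commonly used form of the Henyey-Greenstein phase function gives the probability of light scattering through an angle θ, controlled by a single asymmetry parameter g between -1 and 1 (positive values favor forward scattering):

$$p(\theta) = \frac{1}{4\pi}\,\frac{1 - g^{2}}{\big(1 + g^{2} - 2g\cos\theta\big)^{3/2}}$$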

The engine then runs several more shaders to clean up the data, after which ray marching is performed through the frustum slices, accumulating the light density values. Eidos-Montréal claims that on Xbox One all of these operations take roughly 0.8 milliseconds!

While this particular technique isn't used by every game, players have come to expect volumetric lighting in almost every big 3D title released today, especially first-person shooters and action-adventure games.


Volumetric lighting in the 2018 sequel to Rise of the Tomb Raider.

Originally this lighting technique was known as "god rays", or, to use the scientific term, "crepuscular rays". One of the first games to use it was Crytek's original Crysis, released in 2007.

It wasn't true volumetric lighting, though: the process involved first rendering the scene as a depth buffer and using it as a mask, another buffer in which pixel colors became darker the closer they were to the camera.

This mask buffer was sampled several times, with a shader taking the samples and blurring them together. The result was then blended with the finished scene:


Graphics cards have come a tremendously long way in the past 12 years. The most powerful GPU at the time of Crysis's release was Nvidia's GeForce 8800 Ultra; today's fastest GPU, the GeForce RTX 2080 Ti, has more than 30 times the compute power, 14 times the memory, and 6 times the bandwidth.

Armed with all that computing power, modern games can deliver far greater graphical fidelity and overall speed, despite the increased complexity of rendering.


"God rays" in Ubisoft's The Division 2

In fact, this effect demonstrates something important: however crucial correct lighting is to visual fidelity, the absence of light matters even more.

The essence of shadow


Let's open this new section of the article with Shadow of the Tomb Raider. In the image below, all the shadow-related graphics options are disabled on the left and enabled on the right. Quite a difference, right?


Shadows form naturally in the real world, so games that implement them badly never look right. Our brains are accustomed to using shadows as visual cues for relative depth, position, and movement. Doing this in a 3D game, however, is surprisingly hard, or at least surprisingly hard to do well.

Let's start with a duck. Here it is, waddling about a field, with the sun's rays reaching it and being blocked correctly.


One of the earliest ways to add a shadow to a scene was to place a "blob" of shadow under the model. It is completely unrealistic, because the shape of the shadow bears no relation to the shape of the object casting it, but it is quick and cheap to do.

Early 3D games, such as the original Tomb Raider from 1996, used this method because the hardware of the time, the Sega Saturn and Sony PlayStation for instance, couldn't manage anything better. The technique drew a simple set of primitives just above the surface the model moved on and shaded them; alternatively, a simple texture was drawn underneath.


Another early method was shadow projection. Here, the primitive casting the shadow is projected onto the plane containing the floor. Some of the mathematics needed for this was developed by Jim Blinn in the late 80s. By modern standards it is a simple process, and it works best for simple, static objects.


With some optimization, though, shadow projection produced the first decent examples of dynamic shadows, as in Interplay's 1999 game Kingpin: Life of Crime. As the image below shows, only the animated characters (even the rats!) have shadows, but it is still better than simple blobs.


The most serious problems with this approach are that (a) the shadows are completely opaque, and (b) the projection method casts the shadow onto a single flat surface (the ground, for example).

These problems can be addressed by adding some transparency when coloring the projected primitive and by performing several projections per character, but PC hardware of the late 90s simply couldn't handle the extra rendering.

Modern shadow techniques


A more accurate way to create shadows had actually been proposed much earlier, back in 1977. While working at the University of Texas at Austin, Franklin Crow wrote a research paper proposing several techniques based on shadow volumes.

In broad terms it works like this: the process determines which primitives face the light source, and their edges are extruded onto a plane. So far this is much like shadow projection, but the key difference is that the shadow volume created is then used to test whether a pixel lies inside or outside the volume. With that information, shadows can be cast onto any surface, not just the ground.

The technique was improved in 1991 by Tim Heidmann of Silicon Graphics, developed further by Mark Kilgard in 1999, and the method we will look at was devised in 2000 by John Carmack of id Software (although Carmack's approach had been independently discovered two years earlier by Bilodeau and Songy of Creative Labs, which forced Carmack to alter his code to avoid legal trouble).

The approach requires the frame to be rendered several times (known as multipass rendering, very costly in the early 90s but ubiquitous today) and something called a stencil buffer.

Unlike the frame and depth buffers, it isn't generated from the 3D scene itself; instead it is an array of values with the same dimensions (the same x and y resolution) as the raster. The values stored in it tell the rendering engine what to do with each pixel in the frame buffer.

The simplest use of this buffer is as a mask:


The shadow volume method goes roughly like this (a simplified sketch follows the list):

  • Render the scene into the frame buffer using only ambient lighting (and include any emission values where a pixel contains a light source)
  • Render the scene again, but only the surfaces facing the camera (i.e. with back-face culling enabled). For each light source, build the shadow volumes (much as in the projection method) and check the depth of every frame pixel against the volume's faces. Pixels that lie inside the shadow volume (i.e. where the depth test "fails") have the corresponding value in the stencil buffer incremented.
  • Repeat the process, but with front-face culling enabled, decrementing the stencil buffer entries for pixels inside the volume.
  • Render the whole scene once more with full lighting, and blend the result with the ambient-only frame according to the stencil buffer values.
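As promised above, here is a simplified software sketch of the counting idea for a single pixel. It uses the plainer depth-pass variant (count the shadow-volume faces lying in front of the scene geometry) rather than the depth-fail ordering described in the list, and it assumes the camera itself isn't sitting inside a shadow volume:

```python
from dataclasses import dataclass

@dataclass
class VolumeFace:
    depth: float          # depth of the shadow-volume polygon at this pixel
    front_facing: bool    # True if the polygon faces the camera

def in_shadow(pixel_depth, faces):
    """Depth-pass stencil counting for one pixel: +1 for every front face of a
    shadow volume that lies in front of the scene geometry, -1 for every back
    face. A non-zero total means the pixel sits inside at least one volume."""
    stencil = 0
    for f in faces:
        if f.depth < pixel_depth:
            stencil += 1 if f.front_facing else -1
    return stencil != 0
```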

This combination of stencil buffers and shadow volumes (commonly called stencil shadows) was used in id Software's Doom 3 in 2004:


Notice how the surface the character walks on is still visible through the shadow? That is the first advantage over shadow projection. The approach also accounts for the distance to the light source (producing weaker shadows farther away) and can cast shadows onto any surface, including the character itself.

But the technique has serious drawbacks, the most noticeable being that the edges of a shadow depend entirely on the number of primitives making up the object that casts it. Multipass rendering also involves a lot of read/write traffic to local memory, making stencil shadows fairly expensive in performance terms.

There is also a limit on how many shadow volumes can be tracked in the stencil buffer, because every graphics API allocates only a small number of bits to it (usually just 8). Given the computational cost of stencil shadows, though, this limit rarely becomes an issue.

And there is another problem: the shadows themselves are far from realistic. Why? Because real light sources, lamps, open flames, lanterns, and the Sun, are not single points in space; they emit light over an area. Even in the simplest case, shown below, real shadows rarely have sharply defined edges.


The darkest region of a shadow is called the umbra; the penumbra is the lighter region around it, and the boundary between the two is usually blurred (since there is normally more than one light source). This is hard to model with stencil buffers and shadow volumes, because the shadows they produce aren't stored in a form that can be processed further. Shadow mapping to the rescue!

The basic procedure was developed in 1978 by Lance Williams. It is pretty simple:

  • For each light source, render the scene from that source's point of view, creating a special depth texture (so no color, lighting, texturing, and so on). The resolution of this buffer doesn't have to match the final frame, but higher is better.
  • Then render the scene from the camera's point of view, but once the frame has been rasterized, transform each pixel's position (in terms of x, y, and z) into the light source's coordinate space.
  • Compare the depth of the transformed pixel with the corresponding value stored in the depth texture: if it is greater, the pixel is in shadow and doesn't receive the full lighting treatment.

This is obviously another multi-pass procedure, but the last step can be done in pixel shaders, so the depth check and the subsequent lighting calculations are rolled into a single pass. And because the whole shadow process is independent of the number of primitives used, it is much faster than the stencil buffer and shadow volume approach.
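A stripped-down version of that per-pixel test might look like the sketch below; "light_view_proj" is assumed to be the light's combined view-projection matrix, the shadow map is a 2D array of depths in [0, 1], and the small bias (covered in the next paragraph) is there to hide self-shadowing noise:

```python
import numpy as np

def shadowed_by_map(world_pos, light_view_proj, shadow_map, bias=0.002):
    """Project a world-space point into the light's clip space, then compare
    its depth with the value stored in the shadow map at that location."""
    p = light_view_proj @ np.append(world_pos, 1.0)
    p = p[:3] / p[3]                         # perspective divide -> NDC in [-1, 1]
    u, v = p[0] * 0.5 + 0.5, p[1] * 0.5 + 0.5
    depth_from_light = p[2] * 0.5 + 0.5      # remap depth to [0, 1]
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return False                         # outside the map: treat as lit
    h, w = shadow_map.shape
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return depth_from_light - bias > shadow_map[y, x]
```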

Unfortunately, the basic technique described above produces all sorts of visual artifacts (perspective aliasing, "shadow acne", "peter panning", and so on), most of which come down to the resolution and bit depth of the depth texture. Every GPU and graphics API imposes limits on textures, so a whole set of additional techniques has been developed to deal with these problems.

One benefit of storing depth information in textures is that GPUs can sample and filter them very quickly and in many different ways. In 2005, Nvidia demonstrated a texture sampling method that solved some of the visual problems of standard shadow mapping and also gave shadow edges a degree of softness; the technique is called percentage closer filtering.
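Building on the previous sketch, a basic percentage closer filter just averages that binary test over a small neighbourhood of shadow-map texels instead of taking a single sample (again, the parameter names are ours):

```python
def pcf_shadow(u, v, depth_from_light, shadow_map, radius=1, bias=0.002):
    """Average the shadow test over a (2*radius+1)^2 block of texels.
    Returns 0.0 for fully lit, 1.0 for fully shadowed; values in between
    give the soft edge that a single sample can't provide."""
    h, w = shadow_map.shape
    x0, y0 = int(u * w), int(v * h)
    hits = taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = min(max(x0 + dx, 0), w - 1)
            y = min(max(y0 + dy, 0), h - 1)
            hits += depth_from_light - bias > shadow_map[y, x]
            taps += 1
    return hits / taps
```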


Around the same time, Futuremark demonstrated the use of cascaded shadow maps (CSM) in 3DMark06. In this technique, several depth textures of different resolutions are created for each light source: high-resolution maps are used close to the viewer, and lower-resolution ones farther away. The result is smoother shadow transitions across the scene, without distortion.

The technique was improved upon by Donnelly and Lauritzen in 2006 with their variance shadow mapping (VSM) procedure, and by Intel in 2010 with its sample distribution shadow maps (SDSM).


Using SDSM in Shadow of the Tomb Raider

To improve the image, game developers often deploy a whole arsenal of shadowing techniques, but shadow mapping remains the mainstay. It can only be applied to a small number of active light sources, though, because trying to model it for every surface that reflects or emits light would cause the frame rate to collapse.

Fortunately, there is a handy technique that works with any object. It gives the impression that less light reaches an object because the object itself, or others nearby, partly block it. The technique is called ambient occlusion, and it comes in many versions. Some were developed by hardware vendors: AMD created HDAO (high definition ambient occlusion), while Nvidia has HBAO+ (horizon based ambient occlusion).

Whichever version is used, it is applied after the scene has been fully rendered, which is why it is classed as a post-processing effect. Essentially, for each pixel the algorithm works out how exposed it is within the scene (more on this here and here) by comparing the pixel's depth value with the surrounding pixels at the corresponding point in the depth buffer (which, again, is stored as a texture).
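A deliberately crude version of that idea, working on nothing but a depth buffer, could look like this; real HDAO and HBAO+ implementations are far more sophisticated, sampling in a hemisphere and weighting by angle, so treat this purely as a toy illustration:

```python
import numpy as np

def toy_ssao(depth, radius=2, strength=1.0):
    """Darken pixels that are surrounded by closer (occluding) neighbours.
    'depth' is a 2D numpy array from the depth buffer; the result is an
    occlusion factor per pixel to multiply into the ambient lighting term."""
    h, w = depth.shape
    ao = np.ones((h, w))
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            block = depth[y - radius:y + radius + 1, x - radius:x + radius + 1]
            occluders = (block < depth[y, x]).mean()   # fraction of closer samples
            ao[y, x] = max(0.0, 1.0 - strength * occluders)
    return ao
```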

How the depth buffer is sampled and how the final pixel color is then calculated play a big part in the quality of the ambient occlusion; as with shadow mapping, every version of ambient occlusion needs the programmer to carefully tune and adjust the code for the situation at hand.


Shadow of the Tomb Raider without AO (left) and with HBAO + (right)

Done properly, though, this visual effect makes a deep impression. In the image above, look at the character's hands, the pineapples and bananas, and the surrounding grass and vegetation. The pixel color changes introduced by HBAO+ are fairly subtle, yet every object now sits more convincingly in the environment (on the left, the character appears to float above the ground).

Pick any of the recent games mentioned in this article and the list of rendering techniques it uses for lighting and shadows would be as long as this article itself. And while not every new 3D game boasts all of these technologies, general-purpose engines such as Unreal let you enable them as options, and toolkits (from Nvidia, for example) provide ready-made code to drop into a game. This shows that they are no longer highly specialized, cutting-edge methods reserved for the very best programmers; they are available to anyone.

We can't finish an article on lighting and shadows without mentioning ray tracing. We have already covered the process earlier in this series, but at the current level of the technology, using it means accepting a low frame rate and a serious cash outlay.

That said, the technology is supported by the next-generation consoles from Microsoft and Sony, which means that over the next few years it will become another standard tool for developers around the world looking to improve the visual quality of their games. Just look at what Remedy achieved in its latest game, Control:


We have come a long way from fake shadows in textures and simple ambient lighting!

That's not all


In this article we have tried to cover the fundamental mathematics and techniques used in 3D games to make them look as realistic as possible, along with the technologies behind modeling how light interacts with objects and materials. But all of this is just the tip of the iceberg.

For example, we skipped topics such as energy-conserving lighting, lens flare, bloom, high dynamic range rendering, radiance transfer, tone mapping, fog, chromatic aberration, photon mapping, caustics, and radiosity; the list goes on. Even a brief look at them would take another three or four articles.
