How 3D game rendering works: texturing and texture filtering


In this third article on rendering in 3D games, we will find out what happens to the 3D world after vertex processing and rasterization of the scene. Texturing is one of the most important stages of rendering, even though all it does is calculate and change the colors of a two-dimensional grid of colored blocks.

Most of the visual effects in modern games come down to the deliberate use of textures - without them, games would seem dull and lifeless. So let's see how it all works!

Part 1: vertex processing

Part 2: rasterization and ray tracing

Let's start with something simple


Take any best-selling 3D game released over the past year and you can say with confidence that they all have something in common: they use texture maps (or simply textures). The term is so common that when thinking about textures, most people picture the same thing: a simple flat square or rectangle containing an image of a surface (grass, stone, metal, fabric, a face, and so on).

But when used and combined with complex calculations, these simple images can produce astonishingly realistic 3D scenes. To understand how this is possible, let's switch them off completely and see what the objects of a 3D world look like without textures.

As we saw in the previous articles, the 3D world is built from vertices - simple points that are moved and then colored in. These are then used to form primitives, which in turn are flattened into a two-dimensional grid of pixels. Since we are not going to use textures, we need some other way to color those pixels.

One method that can be applied is called flat shading: the color of the primitive's first vertex is taken and applied to every pixel the shape covers in the raster. It looks something like this:


Obviously, the teapot looks unrealistic, not least because of its uneven surface colors. The colors jump from one level to the next with no smooth transitions. One solution to this problem is Gouraud shading.

In this process, the colors of the vertices are taken, and the change in color across the surface of the triangle is then calculated using linear interpolation. It sounds complicated, but in reality it means that if, for example, one side of the primitive has a red value of 0.2 and the other a red value of 0.8, then the middle of the shape gets a value halfway between 0.2 and 0.8 (i.e. 0.5).
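To make the interpolation concrete, here is a minimal sketch in Python (with made-up vertex colors, looking at only the red channel) contrasting flat shading with Gouraud-style interpolation along one edge of a primitive:

```python
# Minimal sketch: flat shading vs. Gouraud-style linear interpolation.
# Vertex colors are made-up values in the 0.0..1.0 range (red channel only).

def flat_shade(vertex_colors, t):
    # Flat shading: every point of the primitive gets the first vertex's color.
    return vertex_colors[0]

def gouraud_shade(c_start, c_end, t):
    # Linear interpolation: t = 0.0 at one vertex, 1.0 at the other.
    return c_start + (c_end - c_start) * t

if __name__ == "__main__":
    # One edge goes from 0.2 red to 0.8 red, as in the example above.
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  flat={flat_shade([0.2, 0.8, 0.5], t):.2f}  "
              f"gouraud={gouraud_shade(0.2, 0.8, t):.2f}")
    # At t = 0.5 the interpolated value is exactly 0.5, midway between 0.2 and 0.8.
```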

The process is simple enough, and that is its main advantage: simplicity means speed. Many older 3D games used this technique because the hardware running the calculations was limited in what it could do.


Barret and Cloud in all their Gouraud-shaded glory (Final Fantasy VII, 1997)

But even this solution has problems: if a light source falls right in the middle of a triangle, its corners (the vertices) may fail to capture it. That means a highlight created by the light can be lost entirely.

Although flat shading and Gouraud shading have earned their place in the rendering toolkit, the examples above are clear candidates for improvement with textures. And to understand exactly what happens when a texture is applied to a surface, we need to go back in time... all the way to 1996.

Game and GPU history in brief


About 23 years ago, id Software released Quake, and it became a major milestone. Although this was not the first game to use 3D polygons and textures to render environments, it was definitely one of the first to use them effectively.

But the game did something else too: it showed what could be done with OpenGL (the API was still in its first revision at the time), and it also did a great deal to boost the first generation of graphics cards, such as the Rendition Verite and the 3Dfx Voodoo.


Vertex lighting and simple textures. Pure 1996, pure Quake.

By modern standards, the Voodoo was extremely simple: no 2D graphics support, no vertex processing, just basic pixel processing. But it was a beauty:


Image: VGA Museum

It had an entire chip (the TMU) for fetching a texel from a texture, and another (the FBI) for blending it with the pixel in the raster. The card could perform a couple of additional processes, such as fog or transparency effects, but that was essentially the extent of its capabilities.

If we look at the architecture underlying the card's design and operation, we can see how these processes work.


3Dfx specification. Source: Falconfly Central

The FBI chip takes two color values and blends them; one of them can be a value from a texture. The blending process is mathematically quite simple, but varies slightly depending on what is being blended and which API is used to issue the instructions.

Looking at what Direct3D offers in terms of blend functions and blend operations, we can see that each pixel is first multiplied by a number between 0.0 and 1.0. This determines how much of the pixel's color will influence the final result. The two adjusted pixel colors are then added, subtracted, or multiplied; some functions perform a logical operation in which, for example, the brighter pixel is always selected.


Image: Taking Initiative tech blog

The image above shows how this works in practice; note that the pixel's alpha value is used as the coefficient for the left-hand pixel. This number indicates how transparent the pixel is.
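As a rough illustration of the arithmetic (not actual Direct3D API calls, just the math they describe), a classic alpha blend and a "take the brighter pixel" operation might look like this:

```python
# Sketch of the arithmetic behind typical blend operations
# (source color weighted by its alpha, destination by 1 - alpha).
# Colors are (r, g, b) tuples with components in 0.0..1.0.

def alpha_blend(src, src_alpha, dst):
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src, dst))

def blend_max(src, dst):
    # One of the logical-style operations: keep the brighter of the two pixels.
    return tuple(max(s, d) for s, d in zip(src, dst))

if __name__ == "__main__":
    texture_pixel = (0.9, 0.2, 0.1)   # value fetched from a texture
    frame_pixel   = (0.1, 0.1, 0.4)   # pixel already sitting in the raster
    print(alpha_blend(texture_pixel, 0.75, frame_pixel))
    print(blend_max(texture_pixel, frame_pixel))
```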

The remaining stages involve applying a fog value (taken from a table created by the programmer, then run through the same blending math), carrying out visibility and transparency checks and adjustments, and finally writing the pixel's color to the graphics card's memory.

So why this excursion into history? Well, despite the relative simplicity of the design (especially compared with modern behemoths), the process describes the fundamental principles of texturing: take some color values and blend them so that models and environments look the way they should in a given situation.

Modern games do exactly the same thing; the only difference is the number of textures used and the complexity of the blending calculations. Together they simulate the visual effects seen in films, or the interaction of light with different materials and surfaces.

Texturing Basics


To us, a texture is a flat 2D image applied to the polygons that make up the 3D structures in a frame. To a computer, though, it is just a small block of memory in the form of a 2D array. Each element of the array holds the color value of one pixel in the texture image (commonly called texels - texture pixels).

Each vertex of a polygon has a set of two coordinates (usually denoted u, v) telling the computer which pixel of the texture is associated with it. The vertex itself has a set of three coordinates (x, y, z), and the process of linking texels to vertices is called texture mapping.
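As a rough sketch of the idea (using a tiny, made-up 4 x 4 texture rather than anything from a real engine), mapping a u, v coordinate in the 0.0..1.0 range onto a texel might look like this:

```python
# Sketch: mapping (u, v) coordinates in the 0.0..1.0 range to a texel in a
# small, hypothetical 4 x 4 texture stored as a 2D array of color values.

texture = [[(x * 16, y * 16, 0) for x in range(4)] for y in range(4)]  # dummy colors

def sample_nearest(tex, u, v):
    height = len(tex)
    width = len(tex[0])
    # Scale the normalized coordinates to texel indices and clamp to the edges.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return tex[y][x]

print(sample_nearest(texture, 0.0, 0.0))   # top-left texel
print(sample_nearest(texture, 0.99, 0.5))  # right edge, halfway down
```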

To see this in action, let's turn to a tool we have already used several times in this series of articles - the Real-Time Rendering WebGL demo. For now we will also discard the vertices' z coordinate and work on a flat plane.


From left to right: the u, v coordinates of the texture are tied directly to the x, y coordinates of the corner vertices. In the middle image, the y coordinates of the top vertices have been increased, but since the texture is still attached to them, it is stretched vertically. In the right image, the texture itself has been altered: the u values have been increased, with the result that the texture is squashed and then repeated.

This happened because although the texture has effectively become taller thanks to the increased u values, it still has to fit into the primitive - so the texture is partially repeated. This is one way to achieve an effect seen in many 3D games: texture repetition. Examples of it can be found in scenes with rocky or grassy landscapes, or with brick walls.

Now let's change the scene so that there are more primitives, and bring back the scene's depth. The classic landscape view is shown below, with the box texture copied and repeated across all of the primitives.


The box texture in its original .gif format is 66 KB in size and has a resolution of 256 x 256 pixels. The portion of the frame covered by the box textures is 1900 x 680 pixels, so in terms of pixel "area" that region should only be able to display about 20 box textures.
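A quick back-of-the-envelope check of that "about 20 textures" figure:

```python
# Back-of-the-envelope check of the "about 20 textures" claim above.
frame_area   = 1900 * 680       # pixels covered by the box textures
texture_area = 256 * 256        # texels in one box texture
print(frame_area / texture_area)  # ~19.7, i.e. roughly 20 full-size textures
```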

But we can clearly see far more than twenty boxes, which means that the box textures in the distance must be much smaller than 256 x 256 pixels. And indeed they are - they have undergone a process called texture minification (yes, that is a real word!). Now let's repeat the exercise, but this time move the camera right up to one of the boxes.


Don't forget that the texture is only 256 x 256 pixels in size, yet here we see it covering more than half of an image 1900 pixels wide. This texture has been through an operation called texture magnification.

These two texture processes happen constantly in 3D games, because as the camera moves around the scene, models come closer or move further away, and all the textures applied to the primitives have to scale along with the polygons. Mathematically this is no big deal; even the simplest integrated graphics chips handle such work with ease. However, minification and magnification present fresh problems that have to be solved somehow.

Mini-copies of textures appear on the scene


The first problem to solve concerns textures in the distance. Going back to the first boxy landscape image, the boxes near the horizon are effectively only a few pixels in size. So trying to squeeze a 256 x 256 pixel image into such a tiny space is pointless, for two reasons.

First, a smaller texture takes up less of the graphics card's memory, which is handy because it can then fit into a smaller amount of cache. That means it is less likely to be evicted from the cache, so repeated use of the texture gains performance from the data sitting in nearby memory. We will come back to the second reason shortly, because it is tied to the same problem that crops up with textures close to the camera.

The standard solution to the problem of squeezing large textures into small primitives is to use mipmaps (mip textures). These are scaled-down versions of the original texture; they can be generated by the game engine itself (using the appropriate API commands) or created in advance by the game's designers. Each mip level is half the size of the previous one.

That is, for the box texture, the dimensions will be: 256 x 256 → 128 x 128 → 64 x 64 → 32 x 32 → 16 x 16 → 8 x 8 → 4 x 4 → 2 x 2 → 1 x 1.


All of the mip levels are packaged together, so the texture keeps the same file name but becomes larger. They are packed in such a way that the u, v coordinates determine not only which texel gets applied to a pixel in the frame, but also which mip level to take it from. The programmers then code the renderer to choose the mip level based on the pixel's depth value in the frame: if the depth value is very high, the pixel is far away, so a small mip level can be used.
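A minimal sketch of both ideas - building the chain of mip sizes and picking a level - might look like this. Note that real GPUs choose the level from how quickly the u, v coordinates change between neighbouring pixels, not from a single ratio as assumed here:

```python
import math

def mip_chain(size):
    # Each mip level is half the size of the previous one, down to 1 x 1.
    sizes = [size]
    while sizes[-1] > 1:
        sizes.append(max(sizes[-1] // 2, 1))
    return sizes

def pick_mip_level(texels_per_pixel):
    # Very rough stand-in for what the renderer does: the more texels that
    # would land on a single frame pixel, the smaller the mip level we want.
    return max(0, int(math.log2(max(texels_per_pixel, 1.0))))

print(mip_chain(256))        # [256, 128, 64, 32, 16, 8, 4, 2, 1]
print(pick_mip_level(1.0))   # 0 -> use the full 256 x 256 texture
print(pick_mip_level(16.0))  # 4 -> use the 16 x 16 mip level
```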

Attentive readers may have spotted the catch with mipmaps: you pay for them with an increase in texture size. The original box texture was 256 x 256 pixels, but as you can see in the image above, the texture complete with its mip levels is now 384 x 256. Yes, there is a lot of empty space, but no matter how the smaller textures are packed, the overall size of the texture increases by at least 50% on one side.

This is only true for pre-made mipmaps, however; if the game engine is programmed to generate them properly, the increase is never more than 33% of the original texture size. So for a modest increase in memory, mipmaps deliver gains in both performance and visual quality.

Below is a comparison of the same image with mipmaps disabled and enabled:


On the left-hand side of the image, the box textures were used "as is", producing graininess and the tell-tale moiré patterns in the distance. On the right, the use of mipmaps gives much smoother transitions, with the box texture blurring into a uniform color towards the horizon.

However, who wants blurry textures to spoil the backgrounds of their favorite game?

Bilinear, trilinear, anisotropic - it's all Greek to me


The process of picking a pixel from a texture to apply to a pixel in the frame is called texture sampling, and in an ideal world there would be a texture that exactly matched the primitive it was made for, regardless of size, position, orientation, and so on. In other words, texture sampling would be nothing more than a simple one-to-one texel-to-pixel mapping.

But since this is not so, there are several factors to consider when sampling textures:

  • Has the texture been reduced or enlarged?
  • Is the texture the original or one of its mip levels?
  • At what angle is the texture displayed?

Let's take these in order. The first factor is straightforward enough: if the texture has been minified, there will be more texels covering each pixel of the primitive than are needed; with magnification the opposite is true - each texel now has to cover several pixels. And that is a problem.

The second factor isn't really a problem, because mipmaps exist precisely to handle the sampling of textures on distant primitives. That leaves textures viewed at an angle. And yes, that is a problem too. Why? Because all textures are images generated to be viewed "head on". In mathematical terms, the texture's surface normal is the same as the normal of the surface it is currently being displayed on.

So if there are too few or too many texels, or they are being viewed at an angle, an additional process called texture filtering is required. Without it, we get this:


Here we have replaced the box texture with one bearing the letter R, to show more clearly what a mess the image becomes without texture filtering!

Graphics APIs such as Direct3D, OpenGL, and Vulkan all offer the same set of filtering types, but give them different names. Essentially, they all come down to the following:

  • Nearest point sampling
  • Linear texture filtering
  • Anisotropic texture filtering

Nearest point sampling isn't really filtering at all, because all it does is take the single texel nearest to the required texture coordinate (i.e. copy it from memory) and blend it with the pixel's original color.

This is where linear filtering comes in. The required u, v texel coordinates are sent to the sampling hardware, but instead of taking the single texel nearest to those coordinates, the sampler fetches four texels. These are the texels above, below, to the left of, and to the right of the one that nearest point sampling would have chosen.

These four texels are then blended together using a weighted formula. In Vulkan, for example, the formula looks like this:


Here T denotes a texel color, f marks the filtered result, and the subscripts 1 to 4 are the four sampled texels. The alpha and beta values depend on how far the point defined by the u, v coordinates is from the centers of the texels.
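Written out as a sketch (for a single-channel texture, ignoring the wrap and clamp modes real samplers support), the weighted blend T_f = (1-a)(1-b)T1 + a(1-b)T2 + (1-a)bT3 + abT4 looks roughly like this:

```python
import math

def sample_bilinear(tex, u, v):
    # Sketch of bilinear filtering on a texture stored as tex[y][x] with
    # single-channel (greyscale) texels. This only shows the weighted blend
    # of four neighbouring texels.
    height, width = len(tex), len(tex[0])
    x = u * width - 0.5           # position in texel space, relative to texel centers
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    alpha, beta = x - x0, y - y0  # fractional distances used as blend weights

    def texel(ix, iy):
        ix = min(max(ix, 0), width - 1)   # clamp to the texture edges
        iy = min(max(iy, 0), height - 1)
        return tex[iy][ix]

    t1, t2 = texel(x0, y0),     texel(x0 + 1, y0)
    t3, t4 = texel(x0, y0 + 1), texel(x0 + 1, y0 + 1)
    return (t1 * (1 - alpha) * (1 - beta) + t2 * alpha * (1 - beta) +
            t3 * (1 - alpha) * beta       + t4 * alpha * beta)

checker = [[0.0, 1.0], [1.0, 0.0]]
print(sample_bilinear(checker, 0.5, 0.5))  # 0.5 -- an even mix of all four texels
```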

Fortunately for everyone working with 3D graphics, this happens automatically in the graphics chip. In fact, it's exactly what the TMU chip on the 3dfx Voodoo did: it sampled four texels and blended them together. In Direct3D this process goes by the rather grand name of bilinear filtering, but ever since the days of Quake and the TMU chip, graphics cards have been able to perform bilinear filtering in a single clock cycle (provided, of course, that the texture is already sitting in nearby memory).

Linear filtering can be used alongside mipmaps, and if you want fancier filtering you can take four texels from one texture, then four more from the next mip level down, and blend them all together. And what does Direct3D call this? Trilinear filtering. Where the "tri" in that comes from, we honestly couldn't tell you...
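The extra step over bilinear filtering is tiny: one more linear blend, this time between the results taken from two neighbouring mip levels. A sketch, assuming the two bilinear samples have already been computed:

```python
def lerp(a, b, t):
    return a + (b - a) * t

def sample_trilinear(bilinear_near, bilinear_far, level_fraction):
    # Sketch of the idea behind trilinear filtering: take one bilinear sample
    # from each of the two nearest mip levels, then blend between them based
    # on how far the pixel sits between those levels (0.0 .. 1.0).
    return lerp(bilinear_near, bilinear_far, level_fraction)

# e.g. a pixel 30% of the way from one mip level towards the next smaller one:
print(sample_trilinear(0.8, 0.4, 0.3))  # 0.68
```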

The last filtering method worth mentioning is anisotropic filtering. It is really a refinement of the process performed by bilinear or trilinear filtering. It begins by calculating the degree of anisotropy of the primitive's surface (which is a surprisingly complex process) - a value that grows as the primitive's aspect ratio changes because of its orientation:


The figure above shows the same square primitive with sides of equal length; as it gradually turns away from us, it becomes a rectangle whose width changes more than its height. So the primitive on the right has a greater degree of anisotropy than the one on the left (and for the square, the degree is zero).

Many modern 3D games let you switch on anisotropic filtering and then adjust its level (from 1x to 16x), but what does that actually change? The setting controls the maximum number of additional texel samples taken for each original linear sample. Suppose a game has 8x anisotropic bilinear filtering enabled. Instead of four texel values per pixel, it will fetch 32.

The difference when using anisotropic filtering is clearly noticeable:


Just scroll back up to the image above and compare nearest point sampling with 16x anisotropic trilinear filtering at maximum. So wonderfully smooth!

But all this smooth texture goodness comes at a performance cost: at maximum settings, anisotropic trilinear filtering fetches 128 samples from a texture for every rendered pixel. Not even the very best modern GPUs can do that in a single clock cycle.

Take the AMD Radeon RX 5700 XT, for example: each of the texturing units inside the processor can issue 32 texel addresses in one clock cycle, load 32 texel values from memory (each 32 bits in size) in the next, and then blend four of them together in one more. So blending 128 texel samples into a single result takes at least 16 clock cycles.


The AMD RDNA-based Radeon RX 5700 GPU, built on a 7-nanometer process

With the 5700 XT running at a clock speed of 1605 MHz, those sixteen cycles take just 10 nanoseconds. Doing this for every pixel in a 4K frame using only one texturing unit would take a mere 70 milliseconds. Clearly, performance isn't a huge concern here!

Even back in 1996, cards like the 3Dfx Voodoo handled textures at a decent clip. They could output at most one bilinearly filtered texel per clock cycle, and with the TMU chip running at 50 MHz that meant 50 million texels could be processed every second. A game running at 800 x 600 and 30 fps only needs around 14 million bilinearly filtered texels per second.
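The arithmetic is easy to verify:

```python
# Quick sanity check of the Voodoo-era numbers above.
texel_rate = 50_000_000     # 50 MHz TMU, one bilinear texel per clock
demand = 800 * 600 * 30     # one texel per pixel, 30 frames per second
print(demand)               # 14,400,000 texels/s -- well within 50 million
```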

However, that is only true if all the textures are sitting in nearby memory and only one texel is mapped to each pixel. Twenty years ago, the idea of needing to apply several textures to a single primitive was almost unheard of, but today it is standard. Let's look at why that changed.

Add lighting


To understand why texturing has become so important, take a look at this scene from Quake:


It is a dark image - darkness was the atmosphere of the game - but we can see that the darkness isn't uniform: some patches of the walls and floor are brighter than others, giving a sense of light falling on those areas.

The primitives making up the walls and floor all use the same texture, but there is a second texture, called a light map, blended with the texel values before they are applied to the frame's pixels. In Quake's day, light maps were pre-calculated and created by the game engine, and used to produce both static and dynamic lighting levels.

Their advantage was that complex lighting calculations were done against textures rather than vertices, which noticeably improved the look of a scene at a small performance cost. The result is obviously not perfect: on the floor you can see that the boundary between lit areas and shadows is very abrupt.

In many respects a light map is just another texture (remember, they are all just regular 2D data arrays), so this scene is one of the early examples of multitexturing. As the name suggests, it is a process in which two or more textures are applied to a primitive. The use of light maps in Quake was a way of working around the limitations of Gouraud shading, but as graphics cards grew in capability, so did the uses of multitexturing.
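In its simplest form, the blend is just a multiplication of the two texel values. A sketch with made-up colors:

```python
# Sketch of light-map style multitexturing: the color fetched from the surface
# texture is modulated (multiplied) by the value fetched from the light map.
# Values are illustrative, in the 0.0..1.0 range.

def apply_lightmap(diffuse_texel, lightmap_texel):
    return tuple(d * l for d, l in zip(diffuse_texel, lightmap_texel))

wall_texel   = (0.55, 0.40, 0.30)   # brownish brick color from the wall texture
bright_light = (1.0, 0.95, 0.85)    # warm, fully lit area of the light map
deep_shadow  = (0.15, 0.15, 0.20)   # dark area of the light map

print(apply_lightmap(wall_texel, bright_light))  # almost unchanged
print(apply_lightmap(wall_texel, deep_shadow))   # the same brick, deep in shadow
```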

The 3Dfx Voodoo, like other cards of its era, was limited in the number of operations it could perform in a single rendering pass. A pass is essentially one complete rendering cycle: from processing the vertices, to rasterizing the frame, to modifying the pixels and writing them into the finished frame buffer. Twenty years ago, games almost always used single-pass rendering.


Nvidia GeForce 2 Ultra, around the end of 2000. Image: Wikimedia

That was because processing the vertices a second time just to apply more textures was too costly in performance terms. After the Voodoo, we had to wait a couple of years for the ATI Radeon and Nvidia GeForce 2 graphics cards, which could do multitexturing in a single pass.

These GPUs had more than one texture unit in the pixel processing stage (i.e. in the pipeline), so fetching a bilinearly filtered texel from two separate textures became a trivial task. That made light maps even more popular and allowed games to make them fully dynamic, changing the lighting values according to conditions in the game world.

But there is much more that can be done with multiple textures, so let's explore their capabilities.

Changing the height is perfectly normal


In this series of articles on 3D rendering, we haven't really looked at how the GPU's role shapes the whole process (we will, just not yet!). But if you go back to Part 1 and read about all the complex work involved in vertex processing, you might think that it is the hardest part of the whole sequence for the GPU to handle.

For a long time it was, and game programmers did everything they could to reduce that load. They resorted to all sorts of tricks to achieve the same image quality as if far more vertices had been used, without actually having to process them.

Most of these tricks rely on textures called height maps and normal maps. The two are related in that the latter can be generated from the former, but for now let's look at just one technique, called bump mapping.


Images created with a rendering demo by Emil Persson. Bump mapping disabled (left) / enabled (right).

Bump mapping uses a 2D array called a height map, which looks like an odd version of the original texture. For example, the image above shows a realistic brick texture applied to two flat surfaces. The texture and its height map look like this:


The colors of the height map represent the normals of the brick surface (we covered normals in Part 1 of this series). When the rendering process reaches the point of applying the brick texture to the surface, a series of calculations is performed to adjust the color of the brick texture based on those normals.

As a result, the bricks themselves look more three-dimensional even though they are still completely flat. If you look carefully, especially at the edges of the bricks, you can see the limitations of the technique: the texture looks slightly warped. But it is a quick trick for adding extra surface detail, which is why bump mapping is so popular.
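A minimal sketch of the idea, assuming a made-up 4 x 4 greyscale height map and using simple finite differences to estimate how the surface tilts at each texel:

```python
# Sketch of the idea behind bump mapping: take a greyscale height map and,
# for each texel, estimate how steeply the height changes in the x and y
# directions. Those slopes are then used to tilt the surface normal before
# the lighting is calculated. The 4 x 4 height values below are made up.

height_map = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.8, 0.8, 0.1],
    [0.1, 0.8, 0.8, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]

def bumped_normal(hmap, x, y, strength=1.0):
    h = len(hmap)
    w = len(hmap[0])
    # Finite differences: compare the neighbouring heights on each side.
    dx = hmap[y][min(x + 1, w - 1)] - hmap[y][max(x - 1, 0)]
    dy = hmap[min(y + 1, h - 1)][x] - hmap[max(y - 1, 0)][x]
    nx, ny, nz = -dx * strength, -dy * strength, 1.0
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / length, ny / length, nz / length)

print(bumped_normal(height_map, 1, 1))  # tilted normal at the edge of the "bump"
print(bumped_normal(height_map, 2, 1))  # the horizontal tilt flips on the far side
```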

A normal map is like a height map, except that the texture's colors are the normals themselves. In other words, no calculation is needed to convert heights into normals. You might wonder: how can colors describe a vector in space? The answer is simple: every texel has a set of r, g, b values (red, green, blue), and those values map directly to the x, y, z components of the normal vector.


The diagram on the left shows how the direction of the normals changes across a bumpy surface. To describe those same normals with a flat texture (middle diagram), we assign colors to them. Here we used the r, g, b values (0, 255, 0) for a vector pointing straight up, then increased the amount of red for a tilt to the left and the amount of blue for a tilt to the right.

Bear in mind that this color isn't blended with the original pixel; it simply tells the processor which way the normal is pointing so that it can correctly calculate the angles between the camera, the light sources, and the textured surface.
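As a sketch of how a renderer might use such a texel - using the common convention where each channel's 0..255 range maps to -1..1, so "straight out of the surface" is stored as (128, 128, 255); note this differs from the simplified color scheme in the diagram above:

```python
# Sketch of how a normal-map texel is typically used: the r, g, b values
# (0..255) are remapped to a vector with components in -1..1, and that vector
# is compared against the light direction to get a brightness value.

def decode_normal(r, g, b):
    return tuple(c / 255.0 * 2.0 - 1.0 for c in (r, g, b))

def lambert(normal, light_dir):
    # Brightness is the dot product of the normal and the direction to the
    # light, clamped so surfaces facing away from the light go to zero.
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

flat_texel   = (128, 128, 255)   # normal pointing straight out of the surface
tilted_texel = (200, 128, 230)   # normal leaning to one side
light        = (0.0, 0.0, 1.0)   # light shining straight at the surface

print(lambert(decode_normal(*flat_texel), light))    # ~1.0, fully lit
print(lambert(decode_normal(*tilted_texel), light))  # less than 1.0, dimmer
```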

The benefits of bump mapping and normal maps really show when a scene uses dynamic lighting and the rendering process calculates the effect of lighting changes per pixel rather than per vertex. Modern games use a whole set of textures to improve the quality of this trick.


Image: Ryan Benno from Twitter

Surprisingly, this realistic-looking wall is just a flat surface - the detail of the bricks and mortar is not built from millions of polygons. Instead, five textures and some thoughtful math are all it takes.

A height map is used to make the bricks appear to cast shadows on each other, and a normal map simulates all the small-scale variations in the surface. The roughness texture changes how light reflects off the different elements of the wall (smooth brick, for instance, reflects light more evenly than rough mortar).

The final map, labeled AO in the image, forms part of a process called ambient occlusion: we will look at that technique in more detail in a later article, but for now it is enough to say that it helps make shadows look more realistic.
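To tie the maps together, here is a deliberately crude sketch of how several of them could feed into a single final pixel color; the blend is illustrative only and nothing like the shading math a real engine uses:

```python
# Very rough sketch of how several texture maps can contribute to one final
# pixel color. The weighting below is made up purely for illustration.

def shade_pixel(albedo, normal_light, roughness, ambient_occlusion):
    # normal_light: diffuse term already computed from the normal map (0..1)
    # roughness:    0 = mirror-smooth, 1 = fully rough (dampens the highlight)
    # ambient_occlusion: 0 = fully occluded, 1 = fully open to ambient light
    highlight = (1.0 - roughness) * 0.3          # crude specular contribution
    lighting = normal_light + highlight
    return tuple(min(1.0, a * lighting * ambient_occlusion) for a in albedo)

brick  = (0.60, 0.35, 0.25)
mortar = (0.70, 0.68, 0.65)
print(shade_pixel(brick,  normal_light=0.9, roughness=0.4, ambient_occlusion=1.0))
print(shade_pixel(mortar, normal_light=0.6, roughness=0.9, ambient_occlusion=0.5))
```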

Texture mapping is a critical process


Texturing is absolutely essential to game development. Take the 2018 release Kingdom Come: Deliverance, a first-person RPG set in 15th-century Bohemia. The designers set out to create as realistic a world of that period as possible, and the best way to immerse the player in a life lived hundreds of years ago is to get the landscape, buildings, clothing, hairstyles, everyday objects, and much more historically right.

Every texture in this image from the game was created by hand by artists, and its use is governed by a rendering engine controlled by programmers. Some textures are small and simple in detail, and so receive little filtering or blending with other textures (the chicken wings, for example).


Others are high-resolution and packed with fine detail; these undergo anisotropic filtering and blending with normal maps and other textures - just look at the face of the man in the foreground. The programmers account for the different texturing requirements of every object in the scene.

All of this happens in countless games today, because players expect ever greater levels of detail and realism. Textures keep getting bigger, and more and more of them are layered onto surfaces, but the process of sampling texels and applying them to pixels is essentially the same as it was in the days of Quake. The best technology never dies, no matter how old it gets!
