3D game rendering: introduction


You're playing the brand-new Call of Mario: Deathduty Battleyard on your perfect gaming PC. You look at your beautiful, ultra-wide 4K monitor, admiring the magnificent scenery and intricate details. Ever wondered how those graphics get to the screen? How does the game make the computer show you all of this?

Welcome to our tour of 3D game rendering: a beginner's journey into how one basic frame of a game gets created and displayed on your screen.



Every year, hundreds of new games are released for smartphones, consoles and PCs. The variety of formats and genres is enormous, but one type is arguably mastered better than any other: 3D games. Which game came first is a debatable question, and a quick look through the Guinness World Records database turns up several answers. The first could be considered Knight Lore by Ultimate, released in 1984, but strictly speaking the images in that game were two-dimensional: no information was ever used in a full three dimensions.

So if we really want to understand how modern 3D games form an image, we have to start with a different example: Winning Run by Namco, released in 1988. It was perhaps the first fully three-dimensional game to use techniques not too far removed from modern ones. Of course, any game over 30 years old is nothing like, say, Codemasters' F1 2018, but the overall scheme is similar.


In this article, we will look at how a 3D game produces a basic image for a monitor or TV. We'll start from the final result and ask ourselves: “What am I actually looking at?”

Then we'll analyze each stage of forming the picture we see. Along the way, we'll look at concepts such as vertices and pixels, textures and passes, buffers and shading, software and instructions. We'll find out how the graphics card is involved in the process and why it's needed at all. After that, you'll be able to look at your games and your PC in a new light and appreciate video graphics a little more.

What makes up a frame: pixels and colors


Let's launch a 3D game; as our sample, we'll take Crysis, released by Crytek in 2007. Below is a photograph of the display showing the game:


This picture is usually called a frame. But what exactly are we looking at? Let's use a macro lens:


Unfortunately, glare and the monitor's backlight spoil the photo, but if we enhance it a bit, we get this:


We can see that the frame on the monitor consists of a grid of separate colored elements, and if we zoom in even further, we notice that each element is a block of three parts. Such a block is called a pixel (short for picture element). In most monitors, pixels are lit using three colors: red, green and blue (RGB). To display a new frame, a list of thousands, if not millions, of RGB values has to be processed and stored in a piece of memory that the monitor can access. Such pieces of memory are called buffers, so the monitor receives the contents of the frame buffer.
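
To make the idea of a frame buffer more tangible, here is a minimal Python sketch; the resolution, the color we set, and the imaginary send_to_display step are all illustrative assumptions, not how a real driver talks to a monitor.

```python
# A frame buffer is just a block of memory holding one RGB value per pixel.
WIDTH, HEIGHT = 1920, 1080

# Start with every pixel black: three values (red, green, blue) per pixel.
frame_buffer = [[(0, 0, 0) for _ in range(WIDTH)] for _ in range(HEIGHT)]

# "Rendering" ultimately means filling this grid with the right numbers.
frame_buffer[540][960] = (255, 128, 0)   # paint the centre pixel orange

def send_to_display(buffer):
    # Placeholder: a real monitor receives these values as electrical signals.
    total_pixels = sum(len(row) for row in buffer)
    print(f"Sending {total_pixels:,} pixels ({total_pixels * 3:,} color values) to the display")

send_to_display(frame_buffer)
```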

This is the end point of the whole process, so now we'll move in the opposite direction, back to its beginning. The process is often described with the single word rendering, but in reality it is a sequence of separate, linked steps that are quite different in nature. Imagine being a chef in a Michelin-starred restaurant: the end result is a plate of delicious food, but a great deal has to happen to get there. And as with cooking, rendering needs some basic ingredients.

The necessary building blocks: models and textures


The main building blocks of any 3D game are the visual assets that populate the world to be drawn. Films, TV shows and theater productions need actors, costumes, props, sets, lighting: the list is quite long. It's the same with 3D games. Everything you see in a generated frame was created by artists and modeling specialists. To make this clearer, let's go old school and look at a model from id Software's Quake II:


The game came out more than 20 years ago. At the time, Quake II was a technological masterpiece, although, like in any game of that era, its models look very primitive today. But that makes it easy to show what they consist of.


In the previous picture, we see an angular fellow made up of triangles connected to one another. Each corner point is called a vertex. A vertex is a point in space described by at least three numbers: the coordinates x, y, z. However, this isn't enough for a 3D game, so each vertex carries additional values: color, the direction of its front side (yes, a point can't really have a front side... just read on!), brightness, degree of transparency, and so on.


Vertices also carry a set of values tied to texture maps. These are the pictures of the “clothes” the model wears. But since a picture is flat, the map has to contain a view from every direction we might look at the model from. The Quake II example shows a simple approach: images of the front, back and sides (the arms). Modern 3D games use numerous texture maps per model, each packed with detail, with hardly any empty space between them. Some maps don't look like materials or surfaces at all; instead they carry information about how light reflects off the surface. Each vertex has a set of coordinates in the texture map associated with the model, so the texture can be “stretched” over the vertices. This means that when a vertex moves, the texture moves with it.
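
As a rough idea of what one vertex might carry, here is an illustrative Python structure. The exact set of attributes varies from game to game and engine to engine, so treat the field names below as assumptions:

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    # Position in 3D space
    x: float
    y: float
    z: float
    # Direction of the "front side" (the normal), used for lighting
    nx: float
    ny: float
    nz: float
    # Vertex color and transparency (red, green, blue, alpha)
    r: float = 1.0
    g: float = 1.0
    b: float = 1.0
    a: float = 1.0
    # Coordinates into the texture map, so the texture follows the vertex
    u: float = 0.0
    v: float = 0.0

# One corner of a model: positioned in space, facing up, mapped to the
# top-left corner of its texture.
corner = Vertex(x=1.0, y=2.0, z=0.5, nx=0.0, ny=1.0, nz=0.0, u=0.0, v=0.0)
print(corner)
```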

In the rendered three-dimensional world, everything you see starts as a set of vertices and texture maps. They are loaded into memory buffers that are linked to one another: the vertex buffer holds the vertex data, textures get their own blocks of memory set aside for the upcoming rendering, and the command buffer contains the list of instructions on what to do with these resources.

All of this forms the framework that will be used to create the final grid of colored pixels. For some games this is a huge amount of data, because recreating the buffers for every new frame would take too long. Games therefore store in buffers the information needed to form the whole world the player might see, or a large enough part of it, updating it as necessary. In a racing game like F1 2018, for example, everything can be kept in one large collection of buffers, while in an open-world game like Skyrim, data is loaded into the buffers and removed from them as the camera moves around the world.

Setting up the scene: vertices


With all the visual information at hand, the game then begins the process of displaying it. The scene starts in some default position, with the base arrangement of models, light sources and so on. This is the “zero” frame, the starting point for the graphics; it's often not displayed, just processed by the system. To illustrate what happens in the first stage of rendering, we'll use the online tool Real-Time Rendering. Let's create the simplest possible “game”: a box standing on the ground.


This object contains 8 vertices, each described by a list of numbers, and together they form a model consisting of 12 triangles. Each triangle, and even the object as a whole, is called a primitive. As primitives are moved, rotated and scaled, these numbers are run through chains of mathematical operations and updated.


Note that the model's own point numbers do not change; these numbers show exactly where the model is located in the virtual world. The math involved is beyond the scope of this article; we'll just say that first all the objects are placed where they should be, and then coloring begins.
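
As a hedged illustration of those chains of mathematical operations, here is a tiny Python sketch that scales, rotates and moves the vertices of our box. Real engines do the same work with 4x4 matrices on the GPU; this only conveys the idea.

```python
import math

# Eight vertices of a unit box, defined relative to the model itself.
box = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]

def transform(vertex, scale=1.0, angle_deg=0.0, move=(0.0, 0.0, 0.0)):
    """Scale, rotate around the Y axis, then translate one vertex."""
    x, y, z = (c * scale for c in vertex)
    a = math.radians(angle_deg)
    x, z = x * math.cos(a) + z * math.sin(a), -x * math.sin(a) + z * math.cos(a)
    return (x + move[0], y + move[1], z + move[2])

# Place the box in the world: twice as big, turned 45 degrees, sitting above the ground.
world_vertices = [transform(v, scale=2.0, angle_deg=45.0, move=(0.0, 1.0, 5.0)) for v in box]
print(world_vertices[0])
```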


Let's take another model, one with 10 times more vertices than our box. In the simplest coloring process, the color of each vertex is taken and then the color changes across the surfaces between them are calculated. This is called interpolation.
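
A minimal sketch of what interpolation means: blending the colors of two vertices across the space between them. Real hardware interpolates across whole triangles using barycentric weights, but the principle is the same.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors; t runs from 0.0 to 1.0."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

red, blue = (255, 0, 0), (0, 0, 255)

# Colors of the points along the edge between a red vertex and a blue vertex.
for step in range(5):
    t = step / 4
    print(t, lerp_color(red, blue, t))
```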


Increasing the number of vertices in the model not only allows you to create more realistic objects, but also improves the result of color interpolation.


At this stage of rendering, the effect of the light sources in the scene can be calculated in detail: for example, how the model's materials reflect light. Such calculations must take into account the position and direction of the camera, as well as the position and direction of the light sources.


There are a number of different mathematical techniques for this, some simple, others very complex. In the illustration above, we can see that the object on the right looks much nicer and more realistic, but drawing it takes more work.
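
One of the simplest of those techniques is Lambertian (diffuse) shading: the brightness of a surface depends on the angle between its normal and the direction to the light. A rough Python sketch with made-up vectors:

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def diffuse_brightness(normal, to_light):
    """Lambert's law: brightness is the cosine of the angle between the
    surface normal and the direction to the light (never below zero)."""
    n, l = normalize(normal), normalize(to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

surface_normal = (0.0, 1.0, 0.0)      # a surface facing straight up
light_direction = (0.5, 1.0, 0.2)     # a light up and off to the side

print(f"Brightness: {diffuse_brightness(surface_normal, light_direction):.2f}")
```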

It's important to note that so far we've been comparing objects with a small number of vertices against the most modern games. Scroll up and look closely at the image from Crysis: that one scene displays more than a million triangles. The Unigine Valley benchmark gives a good sense of how many triangles are used in modern games.


Every object in this image is made of vertices connected to one another to form primitives consisting of triangles. The benchmark can be run in wireframe mode, in which the edges of each triangle are drawn as white lines.


As you can see, any object consists of triangles, and for each triangle the position, direction and color have to be calculated, taking into account the location of the light sources as well as the position and direction of the camera. All the changes applied to the vertices have to be passed back to the game so that it has everything it needs to draw the next frame; this is done by updating the vertex buffer.

Surprisingly, this is not the hardest part of rendering: with the right hardware, all of these calculations are done in a few thousandths of a second! Let's move on.

Losing a dimension: rasterization


After all the vertices have been processed and all the objects in our three-dimensional scene have been placed, the rendering process moves on to a very important stage. Up to this point, the game has been truly three-dimensional, but the final frame is not: through a series of transformations, the viewed world goes from a 3D space made up of thousands of connected points to a two-dimensional image made up of colored pixels. In most games, this procedure consists of at least two phases: screen space projection and rasterization.


Back in our web rendering tool, we can see how the volume of the virtual world is turned into a flat image. The camera is shown on the left; the lines emanating from it form a truncated pyramid of visibility (the frustum), and anything that falls inside it can appear in the final frame. The slice across the pyramid is called the viewport: this is what will be shown on the monitor. A good deal of math goes into projecting the entire contents of the pyramid into the viewport, taking the camera's perspective into account.
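
The core of that projection can be sketched very simply: a point's x and y coordinates are divided by its distance from the camera, which is why distant objects look smaller. A hedged Python illustration (the field of view and screen size are arbitrary assumptions):

```python
import math

def project(point, fov_deg=90.0, width=1920, height=1080):
    """Project a 3D point (in camera space, z pointing away from the camera)
    onto a 2D screen using a simple perspective divide."""
    x, y, z = point
    f = 1.0 / math.tan(math.radians(fov_deg) / 2)    # scale from the field of view
    ndc_x, ndc_y = (x * f) / z, (y * f) / z          # the perspective divide
    # Map from the -1..1 range to actual pixel coordinates in the viewport.
    return (int((ndc_x + 1) * 0.5 * width), int((1 - ndc_y) * 0.5 * height))

print(project((1.0, 1.0, 5.0)))    # a nearby point
print(project((1.0, 1.0, 50.0)))   # the same offset, ten times farther away
```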

Although the graphics in the viewport are two-dimensional, the data behind them is still truly three-dimensional, and this information is later used to work out which primitives are visible to us and which are hidden. That can be surprisingly hard to do, because a primitive may cast a shadow we can see even when the primitive itself is hidden from view. Removing the primitives that are hidden from us is called culling, and it can have a significant effect on how quickly the whole frame is rendered. Once the sorting into visible and hidden primitives is done and the triangles outside the visibility pyramid have been discarded, the last stage of three-dimensionality is complete, and the frame becomes fully two-dimensional through rasterization.
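
One simple form of culling, though by no means the only one games use, is back-face culling: a triangle whose front side points away from the camera can never be seen directly, so it is thrown away before rasterization. A rough sketch (which side counts as the front depends on the winding convention an engine picks):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def is_back_facing(v0, v1, v2, camera_pos):
    """A triangle is back-facing if its normal points away from the camera."""
    normal = cross(sub(v1, v0), sub(v2, v0))
    to_camera = sub(camera_pos, v0)
    return sum(n * c for n, c in zip(normal, to_camera)) <= 0

# This triangle's front side faces the camera at the origin, so it is kept.
print(is_back_facing((0, 0, 5), (0, 1, 5), (1, 0, 5), camera_pos=(0, 0, 0)))  # False
```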


The illustration above shows a very simple example of a frame containing a single primitive. A pixel grid is laid over the geometric shape, and the corresponding pixels are marked for further processing. The end result doesn't look much like the original triangle, because we aren't using enough pixels. This gives rise to the problem of aliasing (jagged, stair-stepped edges), which is tackled in various ways. It's also why changing a game's resolution (the total number of pixels in the frame) has such an impact on the final result: more pixels not only depicts shapes better, it also reduces the effect of unwanted aliasing.
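
A toy rasterizer shows the idea: lay a coarse grid of pixels over a triangle and mark every pixel whose center falls inside it. With so few pixels, the stair-stepping is obvious. The triangle and grid size below are arbitrary:

```python
def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    """A point is inside the triangle if it lies on the same side of all three edges."""
    a, b, c = tri
    d1, d2, d3 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

triangle = [(1.0, 1.0), (18.0, 3.0), (7.0, 11.0)]   # already projected to 2D

WIDTH, HEIGHT = 20, 12
for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        # Test the centre of each pixel against the triangle.
        line += "#" if inside(triangle, (col + 0.5, row + 0.5)) else "."
    print(line)
```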

After completing this part of the rendering, we move on to the next big step: the final coloring of all the pixels in the frame.

Bringing in the light: the pixel stage


We've reached the most difficult stage of rendering. Once upon a time, this amounted to dressing the models in their clothes (the textures) using the pixel information originally derived from the vertices. The trouble is that although the textures and the frame itself are two-dimensional, the virtual world was distorted, shifted and transformed during the vertex stage. Extra math is used to account for all of this, but the results can bring new problems of their own.


In this illustration, a checkerboard texture has been applied to the plane. An unpleasant visual shimmer appears, and aliasing makes it worse. To solve this problem, smaller versions of the texture maps are used (mipmaps), along with reuse of the information in these textures (filtering) and additional math, and the improvement is clearly visible in the corrected image below.
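
The idea behind mipmaps can be sketched in a few lines: pre-compute progressively smaller versions of a texture and pick the level whose texels roughly match the size of a pixel on screen. The averaging and the log2 level-selection rule below are the usual textbook versions; real hardware is more sophisticated:

```python
import math

def build_mipmaps(texture):
    """Make progressively half-sized copies of a square, power-of-two texture
    by averaging 2x2 blocks of texels."""
    levels = [texture]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        size = len(prev) // 2
        levels.append([[sum(prev[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)) / 4
                        for x in range(size)] for y in range(size)])
    return levels

def pick_level(texels_per_pixel):
    """The farther away the surface (more texels squeezed into one pixel),
    the smaller the mip level that gets sampled."""
    return max(0, int(math.log2(max(1.0, texels_per_pixel))))

checkerboard = [[(x + y) % 2 * 255 for x in range(8)] for y in range(8)]
mips = build_mipmaps(checkerboard)
print("mip sizes:", [len(m) for m in mips])        # 8, 4, 2, 1
print("level used far away:", pick_level(4.0))     # level 2
```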


For early games this really was a difficult stage, but today it's less so: with the widespread use of other visual effects such as reflections and shadows, texture processing has become a relatively small part of the rendering process. Playing at higher resolutions increases the load in the rasterization and pixel stages but has comparatively little effect on vertex processing. And although the initial coloring from the light sources is done at the vertex stage, more sophisticated lighting effects can be applied here.


In the previous illustration we no longer see color changes between the different triangles, which gives the feeling of a smooth, seamless object. Although this sphere consists of the same number of triangles as the green sphere in the illustration above, the per-pixel coloring procedure makes it look as though far more triangles were used.


In many games, the pixel stage has to be run several times. For example, for a mirror or a water surface to reflect the surrounding world, that world has to be drawn first. Each run is called a pass, and four or more passes can easily go into producing the final image for each frame.

Sometimes the vertex stage also has to be run again, to redraw the world from a different point of view and use that image in the scene shown to the player. For this, render targets are used: buffers that act as the final storage for a frame but can also serve as textures in another pass.
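
Schematically, a multi-pass frame might look like the sketch below: an extra pass draws the world from a mirrored camera into an off-screen buffer (the render target), and the main pass then uses that buffer as just another texture. Every function body here is a stand-in; a real renderer would do the work described in the earlier sections.

```python
def render_pass(scene, camera, extra_textures=None):
    """Stand-in for a full vertex + rasterization + pixel pipeline run.
    It just records what it was asked to draw."""
    return {"drawn": scene, "from": camera, "textures": extra_textures or []}

scene = ["terrain", "water plane", "boat"]

# Pass 1: draw the world from a mirrored camera into an off-screen buffer.
reflection_target = render_pass(scene, camera="mirrored under the water surface")

# Pass 2: draw the visible frame, feeding the first pass's result in as a texture.
final_frame = render_pass(scene, camera="player", extra_textures=[reflection_target])

print("Passes used:", 2)
print("Reflection buffer reused as a texture, drawn from:", final_frame["textures"][0]["from"])
```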

To get a feel for the complexity of the pixel stage, have a look at the frame analysis of Doom (2016). You will be amazed by the number of operations needed to create a single frame.


All the work that goes into creating a frame has to be saved to a buffer, whether it is an intermediate or the final result. In general, a game keeps at least two buffers on the go for the final display: one for the "current work", and another that is either waiting for the monitor to access it or is in the process of being displayed. There always has to be a frame buffer available to receive the rendering result, and when all the buffers are full, work moves on to a new one. When a frame is finished, a simple command is issued, the final frame buffers are swapped, the monitor receives the last rendered frame, and rendering of the next frame begins.
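
A sketch of that double-buffering arrangement: one buffer is being displayed while the next frame is drawn into the other, and a single swap exchanges them. The drawing and display steps are, of course, placeholders.

```python
class DoubleBuffer:
    def __init__(self):
        self.front = "frame 0"   # what the monitor is currently showing
        self.back = None         # where the next frame is being rendered

    def render_next(self, frame_number):
        # All the vertex, rasterization and pixel work ends up here.
        self.back = f"frame {frame_number}"

    def swap(self):
        # The "simple command": the finished frame becomes visible,
        # and the old front buffer is reused for the next frame.
        self.front, self.back = self.back, self.front

buffers = DoubleBuffer()
for frame in range(1, 4):
    buffers.render_next(frame)
    buffers.swap()
    print("Monitor shows:", buffers.front)
```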


In this frame from Assassin's Creed Odyssey, we're looking at the contents of a completed frame buffer. That content can be represented as a table containing nothing but numbers. They are sent as electrical signals to a monitor or TV, and the pixels of the screen change their values. Our eyes see a flat, solid image, but our brain interprets it as three-dimensional. So much work is hidden behind the scenes of a single frame of a game that it's worth looking at how programmers manage it all.

Managing the process: APIs and instructions


Figuring out how to make a game perform and manage all of these calculations, vertices, textures, lights, buffers and the rest is an enormous task. Fortunately, application programming interfaces (APIs) come to the rescue.

Rendering APIs reduce the overall complexity by offering structures, rules and software libraries that allow simplified, hardware-independent instructions. Take any 3D game released for PC over the past three years: it was built with one of three popular APIs: Direct3D, OpenGL or Vulkan. There are other similar technologies, especially in the mobile segment, but in this article we'll stick to these three.


Despite differences in the names of instructions and operations (for example, the block of code that processes pixels is called a pixel shader in DirectX and a fragment shader in Vulkan), the end result does not differ, or, more precisely, should not differ.

Where the difference does show up is in what hardware is used for rendering. The instructions generated through the API have to be converted into commands the hardware understands, and this is handled by the device drivers. Hardware manufacturers spend a great deal of time and resources making their drivers perform this conversion as quickly and correctly as possible.


For example, an early beta version of The Talos Principle (2014) supported all three of these APIs. To show how much the results can differ between combinations of drivers and interfaces, we ran the standard built-in benchmark at 1080p with maximum quality settings, using an Intel Core i7-9700K at stock clocks, an Nvidia Titan X (Pascal) graphics card and 32 GB of DDR4 RAM.

  • DirectX 9 = 188.4 fps on average
  • DirectX 11 = 202.3 fps on average
  • OpenGL = 87.9 fps on average
  • Vulkan = 189.4 fps on average

We won't analyze these results, and they certainly don't mean that one API is “better” than another (remember, this was a beta version of the game). We'll only say that programming for different APIs brings different challenges, and at any given moment performance will also differ. Broadly speaking, game developers choose the API they have the most experience with and optimize their code for it. Sometimes the word engine is used to describe the rendering code, but strictly speaking an engine is a complete set of tools that handles every aspect of a game, not just its graphics.

Writing a program from scratch that renders a 3D game is no easy feat, which is why so many games today license third-party systems (Unreal Engine, for example). To appreciate their complexity, open the open-source Quake engine and look at the gl_draw.c file: it contains instructions for many different rendering operations yet represents only a tiny part of the whole engine. And Quake came out more than 20 years ago; the entire game, including all the visual assets, sounds, music and so on, takes up 55 MB. For comparison, the shaders alone in Far Cry 5 occupy 62 MB.

Time is of the essence: using the right hardware


Everything described above can be calculated and processed by the CPU of any computer system; modern x86-64 processors support all the necessary mathematical operations and even contain dedicated subsystems for them. However, rendering a single frame requires a huge number of repeated calculations and a great deal of parallel work, and CPUs are not well suited to this because they are built to handle the widest possible range of tasks. Specialized processors for graphics computation are called GPUs (graphics processing units), and they are built to do exactly the kind of work that DirectX, OpenGL and Vulkan demand.

We'll use a benchmark that can render a frame using either the CPU or specialized hardware: V-Ray NEXT by Chaos Group. Strictly speaking, it performs ray tracing rather than the kind of rendering described above, but most of the numerical work here is just as hardware-dependent.


Let's run the benchmark in three modes: CPU only, GPU only, and both combined:

  • CPU only = 53 million rays
  • GPU only = 251 million rays
  • Both combined = 299 million rays

The unit of measurement doesn't matter here; the point is the roughly five-fold difference. But this test isn't very game-like, so let's turn to an old-school benchmark, Futuremark's 3DMark03, and run the simple Wings of Fury test with all the vertex shaders forced onto the CPU (that is, the full set of operations for moving and coloring triangles).


The result should not surprise you:

  • CPU = 77 fps on average
  • GPU = 1,580 fps on average

When all the vertex calculations are handled by the CPU, rendering and displaying each frame takes an average of 13 ms. With the GPU, that figure drops to 0.6 ms, more than 20 times faster.

The difference grows even larger in the benchmark's most demanding test, Mother Nature. The CPU managed a measly 3.1 fps, while the GPU shot up to 1,388 fps: almost 450 times faster. Bear in mind that 3DMark03 came out 16 years ago, and that in the CPU test only the vertices are processed; the GPU still handles the rasterization and pixel stages. Imagine if the benchmark were modern and most of the operations were done in software?


Now let's try the Unigine Valley benchmark again; the graphics it processes are very similar to those in games like Far Cry 5, and it offers a fully software rendering engine in addition to the standard DirectX 11 one. Running on the GPU, we got an average of 196 fps. And the software version? After a couple of crashes, our powerful test PC managed an average of 0.1 fps: almost two thousand times slower.

The reason for such a huge difference lies in the math and the data formats used in 3D rendering. Each CPU core has its own floating-point units; the i7-9700K has 8 cores, each with two such units. The units in the Titan X have a different architecture, but both kinds can perform the same calculations on data of the same format. This graphics card has over 3,500 units capable of comparable calculations, and although they are clocked much lower than the CPU (1.5 GHz versus 4.7 GHz), the GPU wins on sheer number of units.

The Titan X is not a mainstream graphics card, but even a budget model will outpace any CPU, which is why all 3D games and APIs are designed for specialized hardware. Download V-Ray, 3DMark or any Unigine benchmark and test your own system: see for yourself how well GPUs are suited to rendering game graphics.

Final words


That was a short excursion into how a single frame of a 3D game is created, from a point in space to a colored image on the monitor.

At its core, the whole process is just working with numbers. However, a lot was left outside the scope of this article. We did not cover the specific math involved: the Euclidean linear algebra, trigonometry and calculus performed by vertex and pixel shaders. We also didn't discuss how textures are processed using statistical sampling. And we left out impressive visual effects such as screen-space ambient occlusion, ray tracing denoising, high dynamic range rendering and temporal anti-aliasing.

And the next time you launch a modern 3D game, we hope you'll not only look at its graphics with different eyes, but also want to learn more about how they came to be.
