Million sprites at 120+ fps

If you wander around the DOTS forum, you can find posts where an author wrote a library capable of rendering a million animated sprites and still gets 60fps. I created my own DOTS sprite renderer, which is good enough for our game, but it cannot cope with a million. I was curious.

So I forked the repository and decided to check whether it could be used in Academia. I experimented with it a little, watching how it renders one sprite, then a hundred, then thousands. It turned out it was not quite ready for use in our game. It lacks some features, for example sorting sprites from back to front, which I tried to hack in. While reading the code, I realized it might be worth writing a completely new library that we could actually use. All I needed was to figure out how it renders sprites, and by then I already understood the principle.

The basics


If I want to recreate this rendering technique, I need to start with the simplest thing: rendering a single sprite. The library uses ComputeBuffers. I used to think they were only for offloading computation to the GPU via compute shaders; I didn’t know they could be used in a regular shader that renders something on screen. You can think of them as arrays of values that can be assigned to a material, which the shader then reads from. That means you can pass data such as position, rotation, scale, UV coordinates, colors, whatever you want. Below is a shader modified from this awesome library:

  Shader "Instanced/ComputeBufferSprite" {
    Properties {
        _MainTex ("Albedo (RGB)", 2D) = "white" {}
    }
    
    SubShader {
        Tags{
            "Queue"="Transparent"
            "IgnoreProjector"="True"
            "RenderType"="Transparent"
        }
        Cull Back
        Lighting Off
        ZWrite On
        Blend One OneMinusSrcAlpha
        Pass {
            CGPROGRAM
            // Upgrade NOTE: excluded shader from OpenGL ES 2.0 because it uses non-square matrices
            #pragma exclude_renderers gles

            #pragma vertex vert
            #pragma fragment frag
            #pragma target 4.5

            #include "UnityCG.cginc"

            sampler2D _MainTex;

            // xy for position, z for rotation, and w for scale
            StructuredBuffer<float4> transformBuffer;

            // xy is the uv size, zw is the uv offset/coordinate
            StructuredBuffer<float4> uvBuffer; 

            StructuredBuffer<float4> colorsBuffer;

            struct v2f{
                float4 pos : SV_POSITION;
                float2 uv: TEXCOORD0;
                fixed4 color : COLOR0;
            };

            float4x4 rotationZMatrix(float zRotRadians) {
                float c = cos(zRotRadians);
                float s = sin(zRotRadians);
                float4x4 ZMatrix  = 
                    float4x4( 
                       c,  -s, 0,  0,
                       s,  c,  0,  0,
                       0,  0,  1,  0,
                       0,  0,  0,  1);
                return ZMatrix;
            }

            v2f vert (appdata_full v, uint instanceID : SV_InstanceID) {
                float4 transform = transformBuffer[instanceID];
                float4 uv = uvBuffer[instanceID];
                
                //rotate the vertex
                v.vertex = mul(v.vertex - float4(0.5, 0.5, 0,0), rotationZMatrix(transform.z));
                
                //scale it
                float3 worldPosition = float3(transform.x, transform.y, -transform.y/10) + (v.vertex.xyz * transform.w);
                
                v2f o;
                o.pos = UnityObjectToClipPos(float4(worldPosition, 1.0f));
                
                // XY here is the dimension (width, height). 
                // ZW is the offset in the texture (the actual UV coordinates)
                o.uv =  v.texcoord * uv.xy + uv.zw;
                
                o.color = colorsBuffer[instanceID];
                return o;
            }

            fixed4 frag (v2f i) : SV_Target{
                fixed4 col = tex2D(_MainTex, i.uv) * i.color;
                clip(col.a - 1.0 / 255.0);
                col.rgb *= col.a;

                return col;
            }

            ENDCG
        }
    }
}

The transformBuffer, uvBuffer, and colorsBuffer variables are the “arrays” that we define in code using ComputeBuffers. This is all we need (for now) to render a sprite. Here is the MonoBehaviour script for rendering a single sprite:

using Unity.Mathematics;
using UnityEngine;

public class ComputeBufferBasic : MonoBehaviour {
    [SerializeField]
    private Material material;

    private Mesh mesh;
    
    // Transform here is a compressed transform information
    // xy is the position, z is rotation, w is the scale
    private ComputeBuffer transformBuffer;
    
    // uvBuffer contains float4 values in which xy is the uv dimension and zw is the texture offset
    private ComputeBuffer uvBuffer;
    private ComputeBuffer colorBuffer;

    private readonly uint[] args = {
        6, 1, 0, 0, 0
    };
    
    private ComputeBuffer argsBuffer;

    private void Awake() {
        this.mesh = CreateQuad();
        
        this.transformBuffer = new ComputeBuffer(1, 16);
        float scale = 0.2f;
        this.transformBuffer.SetData(new float4[]{ new float4(0, 0, 0, scale) });
        int matrixBufferId = Shader.PropertyToID("transformBuffer");
        this.material.SetBuffer(matrixBufferId, this.transformBuffer);
        
        this.uvBuffer = new ComputeBuffer(1, 16);
        this.uvBuffer.SetData(new float4[]{ new float4(0.25f, 0.25f, 0, 0) });
        int uvBufferId = Shader.PropertyToID("uvBuffer");
        this.material.SetBuffer(uvBufferId, this.uvBuffer);
        
        this.colorBuffer = new ComputeBuffer(1, 16);
        this.colorBuffer.SetData(new float4[]{ new float4(1, 1, 1, 1) });
        int colorsBufferId = Shader.PropertyToID("colorsBuffer");
        this.material.SetBuffer(colorsBufferId, this.colorBuffer);

        this.argsBuffer = new ComputeBuffer(1, this.args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        this.argsBuffer.SetData(this.args);
    }

    private static readonly Bounds BOUNDS = new Bounds(Vector2.zero, Vector3.one);

    private void Update() {   
        // Draw
        Graphics.DrawMeshInstancedIndirect(this.mesh, 0, this.material, BOUNDS, this.argsBuffer);
    }
    
    // This can be refactored to a utility class
    // Just added it here for the article
    private static Mesh CreateQuad() {
        Mesh mesh = new Mesh();
        Vector3[] vertices = new Vector3[4];
        vertices[0] = new Vector3(0, 0, 0);
        vertices[1] = new Vector3(1, 0, 0);
        vertices[2] = new Vector3(0, 1, 0);
        vertices[3] = new Vector3(1, 1, 0);
        mesh.vertices = vertices;

        int[] tri = new int[6];
        tri[0] = 0;
        tri[1] = 2;
        tri[2] = 1;
        tri[3] = 2;
        tri[4] = 3;
        tri[5] = 1;
        mesh.triangles = tri;

        Vector3[] normals = new Vector3[4];
        normals[0] = -Vector3.forward;
        normals[1] = -Vector3.forward;
        normals[2] = -Vector3.forward;
        normals[3] = -Vector3.forward;
        mesh.normals = normals;

        Vector2[] uv = new Vector2[4];
        uv[0] = new Vector2(0, 0);
        uv[1] = new Vector2(1, 0);
        uv[2] = new Vector2(0, 1);
        uv[3] = new Vector2(1, 1);
        mesh.uv = uv;

        return mesh;
    }
}

Let's go through this code in order. For the material, we need to create a new material, set the shader shown above on it, and assign it a texture / sprite sheet. I use a sprite sheet from the library, which is a 4x4 sheet of emoji icons.
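
If you prefer to set the material up from code instead of the editor, something like the following would work. This is a minimal sketch of my own; CreateSpriteMaterial is a hypothetical helper, and the spriteSheet texture is assumed to come from somewhere like a [SerializeField] field:

// Hypothetical helper: builds the material in code instead of the editor.
// "Instanced/ComputeBufferSprite" is the shader shown above.
private static Material CreateSpriteMaterial(Texture2D spriteSheet) {
    Material material = new Material(Shader.Find("Instanced/ComputeBufferSprite"));
    material.mainTexture = spriteSheet; // maps to the shader's _MainTex
    return material;
}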


The mesh here is the one created by CreateQuad(). It is just a quad made up of two triangles. Next are the three ComputeBuffer variables that we will later assign to the material. I named them the same way as the StructuredBuffer variables in the shader. This is not necessary, but it’s more convenient.

The args and argsBuffer variables are used to call Graphics.DrawMeshInstancedIndirect() (see the Unity documentation). The function requires a buffer with five uint values. In our case, only the first two matter. The first is the number of indices, which is 6 for our quad. The second is the number of times the quad will be rendered, which here is just 1. I also think of it as the maximum value the shader uses to index the StructuredBuffers, like this:

for(int i = 0; i < count; ++i) {
    CallShaderUsingThisIndexForBuffers(i);
}
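
For reference, here is how I read the five values (only the first two are non-zero anywhere in this article; the breakdown follows Unity's documentation for the indirect args buffer):

// The five uints of the indirect args buffer, annotated.
uint[] args = {
    6,  // index count per instance: 6 indices for our two-triangle quad
    1,  // instance count: how many quads to draw (set to count in the later script)
    0,  // start index location
    0,  // base vertex location
    0   // start instance location
};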

The Awake() method just prepares the ComputeBuffers and assigns them to the material. We render the sprite at (0, 0) with a scale of 0.2f and no rotation. For the UV, we use the sprite in the lower left corner of the sheet (the kissing emoji). Then we assign a white color. Finally, the args array is uploaded to argsBuffer.

In Update(), we simply call Graphics.DrawMeshInstancedIndirect(). (I don’t quite understand yet how BOUNDS should be used here and just copied it from the library.)
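
As far as I understand it, Unity uses these bounds to decide whether the whole indirect draw is visible, so they should enclose all the instances. For a single sprite at the origin the library's value is fine; for the spread-out sprites later in the article, something bigger would be safer. This is my own guess, not something from the library:

// Assumption: sprites are spawned roughly within a 16 x 8 world-unit area
// (the random ranges used in the later script), so give the indirect draw
// bounds that enclose all of them to avoid the whole batch being culled.
private static readonly Bounds BOUNDS = new Bounds(Vector3.zero, new Vector3(20f, 10f, 5f));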

The final steps are to prepare a scene with an orthographic camera, create another GameObject, and add the ComputeBufferBasic component to it. Then assign it the material that uses the shader shown above. On startup, we get the following:


Oh yeah! A sprite rendered using a ComputeBuffer.

If you can do one, you can do a lot


Now that we know how to render one sprite using ComputeBuffers, we can draw many of them. Here is another script with a count parameter that renders the specified number of sprites with random positions, scales, rotations, and colors:

using Unity.Mathematics;
using UnityEngine;

public class ComputeBufferMultipleSprites : MonoBehaviour {
    [SerializeField]
    private Material material;
    
    [SerializeField]
    private float minScale = 0.15f;
    
    [SerializeField]
    private float maxScale = 0.2f;  

    [SerializeField]
    private int count;

    private Mesh mesh;
    
    // Matrix here is a compressed transform information
    // xy is the position, z is rotation, w is the scale
    private ComputeBuffer transformBuffer;
    
    // uvBuffer contains float4 values in which xy is the uv dimension and zw is the texture offset
    private ComputeBuffer uvBuffer;
    private ComputeBuffer colorBuffer;

    private uint[] args;
    
    private ComputeBuffer argsBuffer;

    private void Awake() {
        QualitySettings.vSyncCount = 0;
        Application.targetFrameRate = -1;
        
        this.mesh = CreateQuad();
        
        // Prepare values
        float4[] transforms = new float4[this.count];
        float4[] uvs = new float4[this.count];
        float4[] colors = new float4[this.count];

        const float maxRotation = Mathf.PI * 2;
        for (int i = 0; i < this.count; ++i) {
            // transform
            float x = UnityEngine.Random.Range(-8f, 8f);
            float y = UnityEngine.Random.Range(-4.0f, 4.0f);
            float rotation = UnityEngine.Random.Range(0, maxRotation);
            float scale = UnityEngine.Random.Range(this.minScale, this.maxScale);
            transforms[i] = new float4(x, y, rotation, scale);
            
            // UV
            float u = UnityEngine.Random.Range(0, 4) * 0.25f;
            float v = UnityEngine.Random.Range(0, 4) * 0.25f;
            uvs[i] = new float4(0.25f, 0.25f, u, v);
            
            // color
            float r = UnityEngine.Random.Range(0f, 1.0f);
            float g = UnityEngine.Random.Range(0f, 1.0f);
            float b = UnityEngine.Random.Range(0f, 1.0f);
            colors[i] = new float4(r, g, b, 1.0f);
        }
        
        this.transformBuffer = new ComputeBuffer(this.count, 16);
        this.transformBuffer.SetData(transforms);
        int matrixBufferId = Shader.PropertyToID("transformBuffer");
        this.material.SetBuffer(matrixBufferId, this.transformBuffer);
        
        this.uvBuffer = new ComputeBuffer(this.count, 16);
        this.uvBuffer.SetData(uvs);
        int uvBufferId = Shader.PropertyToID("uvBuffer");
        this.material.SetBuffer(uvBufferId, this.uvBuffer);
        
        this.colorBuffer = new ComputeBuffer(this.count, 16);
        this.colorBuffer.SetData(colors);
        int colorsBufferId = Shader.PropertyToID("colorsBuffer");
        this.material.SetBuffer(colorsBufferId, this.colorBuffer);

        this.args = new uint[] {
            6, (uint)this.count, 0, 0, 0
        };
        this.argsBuffer = new ComputeBuffer(1, this.args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        this.argsBuffer.SetData(this.args);
    }

    private static readonly Bounds BOUNDS = new Bounds(Vector2.zero, Vector3.one);

    private void Update() {   
        // Draw
        Graphics.DrawMeshInstancedIndirect(this.mesh, 0, this.material, BOUNDS, this.argsBuffer);
    }

    private static Mesh CreateQuad() {
        // Just the same as previous code. I told you this can be refactored.
    }
}

There are practically no changes compared to rendering a single sprite. The difference is that we now prepare arrays whose length is set by the serialized count variable. We also set the second number in the args array to count.

Using this script, we can set count to any value, and it will generate the specified number of sprites, but it will render them in just one draw call.


Here are 10,000 random sprites.

Why are minScale and maxScale serialized variables? When I tested the code with 600,000 sprites, I noticed that the frame rate dropped below 60fps. If the source library can handle a million, why does this code fall short?


This is 600,000 sprites. It runs slowly.

I suspected that this might be caused by overdraw, so I made minScale and maxScale serialized parameters and set them to small values like 0.01 and 0.02. Only then was I able to render a million sprites at more than 60fps (judging by the editor profiler). The code is probably capable of more, but who needs a million sprites? Our game doesn't need even a quarter of that number.


A million little sprites.

Profiler


So, I wanted to see how this code performs in a test build. My machine's specs: 3.7 GHz CPU (4 cores), 16 GB of RAM, Radeon RX 460. Here is what I got:


As you can see, everything is pretty fast. The call to Graphics.DrawMeshInstancedIndirect() shows 0 ms, though I'm not sure whether I should be worried about Gfx.PresentFrame.


Not so fast


Although the result is impressive, in a real game this code will be used differently. The most important missing piece is sprite sorting, and that will take up most of the CPU time. In addition, with moving sprites the ComputeBuffers will need to be updated every frame. There is still a lot of work left. I do not expect to reach a million in a real working framework, but if I can get something like 300,000 sprites in under 2 ms, that will be quite enough for me. DOTS will definitely help with this, but that is a topic for another article.
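
To give a rough idea of what that per-frame work might look like, here is a minimal sketch. It assumes the transforms array from Awake() is kept as a field and adds a hypothetical velocities array; the y-based sort is just one simple way to get back-to-front ordering, not the library's actual approach:

private void Update() {
    // Move the sprites on the CPU (placeholder logic; velocities is a
    // hypothetical float[] with one speed per sprite).
    for (int i = 0; i < this.count; ++i) {
        float4 t = this.transforms[i];
        t.y += this.velocities[i] * Time.deltaTime;
        this.transforms[i] = t;
    }

    // Back-to-front ordering by y. A real implementation would also have to
    // reorder the uv and color arrays the same way (or sort an index array
    // instead). This sort is where most of the CPU time mentioned above goes.
    System.Array.Sort(this.transforms, (a, b) => b.y.CompareTo(a.y));

    // Re-upload the changed data so the shader sees this frame's values.
    this.transformBuffer.SetData(this.transforms);

    Graphics.DrawMeshInstancedIndirect(this.mesh, 0, this.material, BOUNDS, this.argsBuffer);
}

Doing this kind of per-frame work fast enough is exactly where I expect DOTS to help.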
