Pixockets: how we wrote our own network library for the game server



Hello! This is Stanislav Yablonsky, Lead Server Developer at Pixonic.

When I first joined Pixonic, our game servers were applications built on the Photon Realtime SDK: a multifunctional but very heavy framework. In theory, this solution was supposed to simplify work with the server. And it did, up to a point.

Photon Realtime tied us to itself, because we had to use it to exchange data between players and the server, and it also tied us to Windows, since it runs only there. This restricted us both in terms of the runtime (many important settings of the .NET virtual machine could not be changed) and in terms of the operating system. We are used to working with Linux servers rather than Windows, and they also cost us less.

Photon also hurt performance on both the server and the client: profiling showed a considerable load on the garbage collector and a large amount of boxing/unboxing.

In short, Photon Realtime was far from optimal for us, and something had needed to be done about it for a long time. But there were always more urgent tasks, and nobody got around to the server problems.

Since I wanted not only to solve the problem but also to understand networking better, I decided to take the initiative and try writing a library myself. But home is for home and work is for work, so the only time left for developing the library was my commute. That did not stop the idea from coming to fruition.

Read on to see what came of it.

Library ideology


Since we develop online games, it is very important for us to run without pauses, so low overhead became the main requirement for the library. For us that means, above all, a low load on the garbage collector. To achieve it, I tried to avoid allocations, and where that was difficult or impossible, we used pools (for byte buffers, connection states, headers, and so on).
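The pooling idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not Pixockets' C# code: the `BufferPool` class, its method names and the `max_pooled` cap are all assumptions made for this example.

```python
class BufferPool:
    """Hypothetical pool that reuses fixed-size byte buffers
    instead of allocating a new one for every packet."""

    def __init__(self, buffer_size, max_pooled=64):
        self.buffer_size = buffer_size
        self.max_pooled = max_pooled
        self._free = []
        self.allocations = 0  # counts real allocations, for illustration only

    def get(self):
        # Reuse a returned buffer when one is available.
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self.buffer_size)

    def put(self, buf):
        # Buffers of the wrong size, or beyond the cap, are left to the GC.
        if len(buf) == self.buffer_size and len(self._free) < self.max_pooled:
            self._free.append(buf)


pool = BufferPool(buffer_size=1200)
first = pool.get()
pool.put(first)
second = pool.get()  # the same object comes back, no new allocation
```

After a warm-up period, a pool like this hands out the same buffers over and over, so steady-state traffic produces little or no new garbage.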

To keep support and builds simple, we decided to use only C# and system sockets. It was also important to fit into the per-frame time budget, because data from the server has to arrive on time. So I tried to reduce execution time, even at the cost of some non-optimality: in some places it was worth replacing fast but more complex algorithms and data structures with simpler and more predictable ones. For example, we did not use lock-free queues, since they put load on the garbage collector.

As is typical for multiplayer shooters, our data is sent over UDP. On top of it we added fragmentation and reassembly of packets, so that data larger than a single frame can be sent, as well as connection establishment and reliable delivery through retransmission.

The UDP frame in our library defaults to 1200 bytes. Packets of this size should pass through modern networks with a fairly low risk of IP-level fragmentation, since the MTU of most modern networks is higher than this value. At the same time, this is usually enough to fit the changes that need to be sent to a player after the next tick (state update) of the game.
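A minimal sketch of splitting a large message into frame-sized fragments and reassembling it, in Python rather than the library's C#. Only the 1200-byte frame size comes from the text; `HEADER_RESERVE`, the function names and the tuple layout are assumptions made for this example.

```python
FRAME_SIZE = 1200    # default frame size mentioned in the article
HEADER_RESERVE = 16  # assumed per-frame header budget, not Pixockets' real figure


def split_into_fragments(payload, frame_size=FRAME_SIZE,
                         header_reserve=HEADER_RESERVE):
    """Return a list of (index, count, chunk) tuples covering the payload."""
    chunk = frame_size - header_reserve
    if chunk <= 0:
        raise ValueError("frame size must exceed the header reserve")
    count = max(1, -(-len(payload) // chunk))  # ceiling division
    return [(i, count, payload[i * chunk:(i + 1) * chunk])
            for i in range(count)]


def reassemble(fragments):
    """Return the original payload, or None while fragments are missing."""
    if not fragments:
        return None
    ordered = sorted(fragments, key=lambda f: f[0])
    count = ordered[0][1]
    if [f[0] for f in ordered] != list(range(count)):
        return None
    return b"".join(f[2] for f in ordered)


message = bytes(range(256)) * 20  # 5120 bytes, larger than one frame
fragments = split_into_fragments(message)
restored = reassemble(list(reversed(fragments)))  # arrival order does not matter
incomplete = reassemble(fragments[:-1])           # one fragment still missing
```

With these assumed numbers each fragment carries up to 1184 bytes of payload, so the 5120-byte message is split into five fragments.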

Architecture


Our library uses a two-layer socket:

  • The first layer is responsible for working with system calls and provides a more convenient API to the next layer;
  • The second layer works directly with the session: fragmentation and reassembly of packets, their retransmission, and so on.



The class for working with a connection is, in turn, also divided into two levels:

  • The lower level (SockBase) is responsible for sending and receiving data over UDP. It is a thin wrapper over the system socket object.
  • The upper level (SmartSock) provides additional functionality on top of UDP. Splitting and gluing packets, resending data that did not arrive, and rejecting duplicates are all its responsibility.
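Resending data that did not arrive can be sketched as a table of unacknowledged packets with timers. This is an illustration in Python, not SmartSock's actual logic: the `ResendQueue` class, its method names and the 100 ms timeout are assumptions.

```python
class ResendQueue:
    """Hypothetical tracker of packets that still await acknowledgement."""

    def __init__(self, resend_after=0.1):
        self.resend_after = resend_after  # assumed timeout in seconds
        self._pending = {}                # sequence number -> (payload, last send time)

    def track(self, seq, payload, now):
        self._pending[seq] = (payload, now)

    def acknowledge(self, seq):
        # An acknowledged packet no longer needs to be resent.
        self._pending.pop(seq, None)

    def due(self, now):
        """Return (seq, payload) pairs whose timeout expired and restart their timers."""
        expired = [(seq, payload)
                   for seq, (payload, sent_at) in self._pending.items()
                   if now - sent_at >= self.resend_after]
        for seq, payload in expired:
            self._pending[seq] = (payload, now)
        return expired


resend = ResendQueue(resend_after=0.1)
resend.track(1, b"spawn", now=0.0)
resend.track(2, b"hit", now=0.0)
resend.acknowledge(1)                # the peer confirmed packet 1
to_resend = resend.due(now=0.15)     # only packet 2 is still outstanding
too_early = resend.due(now=0.2)      # its timer was just restarted
```

Time is passed in explicitly so that the caller can drive the queue from its own tick loop.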

The lower level has two implementations: BareSock and ThreadSock.

  • BareSock works in the same thread the call came from, sending and receiving data in non-blocking mode.
  • ThreadSock puts packets into queues and uses separate threads for sending and receiving data. The caller only ever performs one operation: adding data to a queue or taking data from it.

BareSock is usually used on the client, ThreadSock on the server.
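The single-threaded, non-blocking approach can be sketched with Python's standard `socket` module. The `NonBlockingUdp` class and its methods are hypothetical names for this illustration and do not mirror BareSock's API.

```python
import socket
import time


class NonBlockingUdp:
    """Thin wrapper over a UDP socket; every call runs in the caller's thread."""

    def __init__(self, bind_addr=("127.0.0.1", 0)):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(bind_addr)
        self._sock.setblocking(False)

    @property
    def address(self):
        return self._sock.getsockname()

    def send(self, data, addr):
        try:
            self._sock.sendto(data, addr)
            return True
        except BlockingIOError:
            return False  # send buffer is full; the caller may retry next tick

    def receive(self, max_size=1200):
        try:
            return self._sock.recvfrom(max_size)  # (data, sender address)
        except BlockingIOError:
            return None  # nothing has been queued by the OS yet

    def close(self):
        self._sock.close()


server = NonBlockingUdp()
client = NonBlockingUdp()
idle = server.receive()  # returns immediately with None
client.send(b"ping", server.address)

# Poll as a game loop would, giving up after one second.
received = None
deadline = time.monotonic() + 1.0
while received is None and time.monotonic() < deadline:
    received = server.receive()

client.close()
server.close()
```

Each `send` and `receive` here is a direct system call, which is the trade-off described above: no queues or extra buffers, but the call cost is paid by the calling thread.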

Features of work


I wrote two kinds of low-level sockets:

  • The first is synchronous and single-threaded. It has the lowest memory and CPU overhead, since no queues or additional buffers are needed, but system calls happen directly when the socket is accessed, so a call may take longer than taking an item from a queue.
  • The second is asynchronous, with separate threads for reading and writing. Here we pay additional overhead for the queues and synchronization, and sending and receiving take longer (within a few milliseconds), since the read or write thread may be paused at the moment the socket is accessed.
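The asynchronous variant can be sketched with two worker threads and two lock-based queues. This is a Python illustration under assumptions: `ThreadedUdp`, its method names and the 50 ms shutdown-polling timeout are hypothetical and not taken from ThreadSock.

```python
import queue
import socket
import threading
import time


class ThreadedUdp:
    """Callers only touch queues; worker threads make the system calls."""

    def __init__(self, bind_addr=("127.0.0.1", 0)):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(bind_addr)
        self._sock.settimeout(0.05)  # lets the receive thread notice shutdown
        self._outgoing = queue.Queue()
        self._incoming = queue.Queue()
        self._running = True
        self._threads = [
            threading.Thread(target=self._send_loop, daemon=True),
            threading.Thread(target=self._receive_loop, daemon=True),
        ]
        for thread in self._threads:
            thread.start()

    @property
    def address(self):
        return self._sock.getsockname()

    def send(self, data, addr):
        self._outgoing.put((data, addr))  # a queue operation, no system call

    def receive(self):
        try:
            return self._incoming.get_nowait()
        except queue.Empty:
            return None

    def _send_loop(self):
        while self._running:
            try:
                data, addr = self._outgoing.get(timeout=0.05)
            except queue.Empty:
                continue
            self._sock.sendto(data, addr)

    def _receive_loop(self):
        while self._running:
            try:
                self._incoming.put(self._sock.recvfrom(1200))
            except socket.timeout:
                continue
            except OSError:
                break  # the socket failed or was closed

    def close(self):
        self._running = False
        for thread in self._threads:
            thread.join()
        self._sock.close()


server = ThreadedUdp()
client = ThreadedUdp()
client.send(b"input", server.address)

received = None
deadline = time.monotonic() + 2.0
while received is None and time.monotonic() < deadline:
    received = server.receive()
    time.sleep(0.001)

client.close()
server.close()
```

The extra latency mentioned above is visible in this sketch: a datagram waits in a queue until the corresponding worker thread is scheduled.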

We also tried using SocketAsyncEventArgs, perhaps the most advanced networking API in .NET that I know of. But it turned out that it apparently does not work for UDP: TCP works fine through it, but with UDP we got errors about strangely truncated frames and even crashes inside .NET, as if memory in the native part of the virtual machine had been corrupted. I did not find any working examples of such a scheme.

Another important feature of our library is reduced data loss. We got the impression, later confirmed by our own experience, that many libraries get rid of duplicates by discarding every packet older than the last one received. Such an implementation is of course much simpler, because a single counter holding the number of the last received frame is enough, but it did not suit us. Pixockets therefore uses a circular buffer of the most recent frame numbers to filter out duplicates: newly arrived numbers overwrite the oldest ones, and duplicates are looked up among the last received frames.



Thus, if a packet was sent before the current frame but arrived after it, it will still be delivered. This can help a lot with position interpolation, for example, because we end up with a more complete history.
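The difference between the two duplicate-filtering schemes can be shown with a small Python sketch. Both classes are hypothetical illustrations, the buffer capacity is an assumption, and sequence-number wraparound is ignored for simplicity.

```python
class RingDuplicateFilter:
    """Remembers the last `capacity` sequence numbers in a circular buffer."""

    def __init__(self, capacity=32):
        self._recent = [None] * capacity
        self._next = 0

    def accept(self, seq):
        if seq in self._recent:  # linear scan over a small fixed window
            return False
        self._recent[self._next] = seq  # overwrite the oldest entry
        self._next = (self._next + 1) % len(self._recent)
        return True


class LastSeenFilter:
    """The simpler scheme: one counter, anything not newer is dropped."""

    def __init__(self):
        self._last = -1

    def accept(self, seq):
        if seq <= self._last:
            return False
        self._last = seq
        return True


arrivals = [1, 2, 4, 3, 4, 5]  # 3 arrives late, the second 4 is a duplicate
ring = RingDuplicateFilter(capacity=4)
counter = LastSeenFilter()
kept_by_ring = [s for s in arrivals if ring.accept(s)]
kept_by_counter = [s for s in arrivals if counter.accept(s)]
```

Both filters reject the duplicate 4, but only the circular buffer keeps the late packet 3. The price is a bounded memory window: a duplicate older than the buffer's capacity would be accepted again.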

Data packet structure


The data in the library is transmitted as follows:



The packet starts with a header:

  • It begins with the packet size, which is limited to 64 kilobytes.
  • The size is followed by a byte with flags. How the rest of the header is interpreted depends on which flags are set.
  • Next comes the session (connection) identifier.

Depending on the flags, the header continues:

  • If the sequence number flag is set, the packet's sequence number follows the session identifier.
  • After it, again only if the corresponding flag is set, come the number of acknowledged packets and their sequence numbers.

The header ends with information about the fragment:

  • the identifier of the fragment sequence, which is needed to tell fragments of different messages apart;
  • the sequence number of the fragment;
  • the total number of fragments in the message.

Fragment information is also present only when the corresponding flag is set.
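A flag-driven header of this shape can be sketched with Python's `struct` module. Only the field order and the 64-kilobyte size limit come from the description above; the field widths, the little-endian byte order, the flag bit values and all names are assumptions made for this example and are not Pixockets' wire format.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed flag bits; the real values are not given in the article.
FLAG_SEQ = 0x01
FLAG_ACKS = 0x02
FLAG_FRAG = 0x04


@dataclass
class Header:
    session_id: int
    seq: Optional[int] = None
    acks: List[int] = field(default_factory=list)
    fragment: Optional[Tuple[int, int, int]] = None  # (fragment set id, index, count)


def encode(header, payload):
    flags = 0
    optional = bytearray()
    if header.seq is not None:
        flags |= FLAG_SEQ
        optional += struct.pack("<H", header.seq)
    if header.acks:
        flags |= FLAG_ACKS
        optional += struct.pack("<B", len(header.acks))
        optional += struct.pack("<%dH" % len(header.acks), *header.acks)
    if header.fragment is not None:
        flags |= FLAG_FRAG
        optional += struct.pack("<HHH", *header.fragment)
    # A 16-bit size field (assumed) is what caps a packet at 64 KB.
    total = 5 + len(optional) + len(payload)
    fixed = struct.pack("<HBH", total, flags, header.session_id)
    return fixed + bytes(optional) + payload


def decode(packet):
    total, flags, session_id = struct.unpack_from("<HBH", packet, 0)
    offset = 5
    header = Header(session_id)
    if flags & FLAG_SEQ:
        (header.seq,) = struct.unpack_from("<H", packet, offset)
        offset += 2
    if flags & FLAG_ACKS:
        (count,) = struct.unpack_from("<B", packet, offset)
        offset += 1
        header.acks = list(struct.unpack_from("<%dH" % count, packet, offset))
        offset += 2 * count
    if flags & FLAG_FRAG:
        header.fragment = struct.unpack_from("<HHH", packet, offset)
        offset += 6
    return header, packet[offset:total]


sent = Header(session_id=7, seq=42, acks=[40, 41], fragment=(3, 0, 2))
packet = encode(sent, b"state")
received, payload = decode(packet)
```

Because optional fields are emitted only when their flag is set, a packet that carries nothing but a session identifier costs just the five fixed bytes in this sketch.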

The library is written. What's next?


To keep connection state accurately synchronized between both sides, we later introduced an explicit connection step. It helped us sort out situations where one side believes the connection is established and alive while the other believes it has been dropped.

The first version of Pixockets did not have this: the client did not need to call Connect(host, port), it simply started sending data to a known address and port. The server, in turn, called Listen(port) and began receiving data from a particular address. Session data was initialized when a packet was received or sent.

Now a "handshake" (an exchange of specially formed packets) is required to establish a connection, and the client must call Connect.

In addition, one of my colleagues forked the library, paying more attention to network security and adding some features, such as the ability to reconnect right inside the socket: for example, when switching between Wi-Fi and 4G, the connection is now restored automatically. But we will talk about that another time.

Testing


Of course, we wrote unit tests for the library: they cover all the main ways of establishing a connection, sending and receiving data, fragmentation and reassembly of packets, and various anomalies in sending and receiving, such as duplication, loss, and out-of-order delivery. For an initial check that everything works together, I wrote special test applications for integration testing: a ping client, a ping server, and an application that synchronizes the position, color and number of colored circles on the screen over the network.

After the test applications showed that the library worked, we started comparing it with other libraries: our old Photon Realtime and the UDP library LiteNetLib 0.7.

We tested a simplified version of a game server that simply collects input from players and sends back the "glued" result. We took 500 players in rooms of 6 people, with an update rate of 30 times per second.



The garbage collector load and CPU consumption turned out to be lower with Pixockets, as did the percentage of lost packets, apparently because, unlike the other UDP libraries, we do not discard late packets.

Once synthetic tests had confirmed the advantage of our solution, the next step was to run the library on a real project.

At that time, clients and game servers in the project we selected synchronized through Photon Server. I added Pixockets support to the client and the server, and made it possible to control the choice of protocol from the matchmaking server, the one to which clients send a request to enter the game.

For some time, clients played on both protocols simultaneously while we collected statistics on how they were doing. When the collection ended, it turned out that the results matched the synthetic tests: the load on the garbage collector and the CPU had decreased, and so had packet loss. Ping also became slightly lower. The next version of the game was therefore released entirely on Pixockets, without the Photon Realtime SDK.



Future plans


We now want to implement the following features in the library:

  • Simplified connection: it is currently not optimal, because after calling Connect on the client you need to keep calling Read until the connection status changes;
  • Explicit disconnection: at the moment, the other side detects a disconnect only by timeout;
  • Built-in pings to keep the connection alive;
  • Automatic detection of the optimal frame size (for now, a constant is used).

You can follow and contribute to the further development of Pixockets in the project repository.
