Why did asynchronous web servers appear?

Hello everyone, Vladislav Rodin here. I am currently the head of the High Load Architect course at OTUS, where I also teach courses on software architecture.

In addition to teaching, as you can see, I write original material for the OTUS blog on Habr, and I have timed today's article to coincide with the start of the «Linux Administrator» course, for which enrollment is now open.





Introduction


Why does a web application slow down and fail to hold the load? Developers who first ran into this question and investigated their systems came to a disappointing conclusion: optimizing business logic alone is not enough. The answer lies at a lower level, at the level of the operating system. For your application to hold the load, its architectural concept must be revised so that it works efficiently at this level. This is what led to the emergence of asynchronous web servers.

Unfortunately, I could not find a single source that traces all the cause-and-effect relationships in the evolution of web servers at once. So the idea of writing this article came up, which, I hope, will become such a source.

Linux OS Features


Before talking about web server models, let me recall some features of processes and threads in Linux. We will need them when analyzing the advantages and disadvantages of the models below.

Context switch


Most likely, any user who knows that only one program can execute on a processor core at a time will ask: “Why can 20 programs run at once on my 4-core processor?”

In fact, this is possible because of preemptive multitasking. The operating system allocates a certain time quantum (~50 µs) and schedules a program to run on the core for that time. When the quantum expires, the program is interrupted and a context switch occurs: the operating system simply schedules the next program to run. Since switching happens frequently, we get the impression that all programs run simultaneously. Note the high switching frequency; this will matter later.

Context switching was mentioned above. What does it include? When switching contexts, it is necessary to save the processor registers, flush the instruction pipeline, and switch the memory regions allocated to the process. In general, the operation is quite expensive: it takes ~0.5 µs, while executing a simple line of code takes ~1 ns. Moreover, as the number of processes per processor core grows, the overhead of context switching grows with it.
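As a rough illustration (not a rigorous benchmark), the cost of context switching can be felt with a small sketch: two processes ping-pong a byte over a pair of pipes, so every round trip forces the scheduler to switch between them at least twice. The round count and helper name are my own choices.

```python
import os
import time

def ping_pong_cost(rounds: int = 10_000) -> float:
    """Approximate cost of one round trip (two context switches), in µs."""
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child  -> parent
    pid = os.fork()
    if pid == 0:               # child: echo every byte back
        for _ in range(rounds):
            os.read(p2c_r, 1)
            os.write(c2p_w, b"x")
        os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")  # wake the child...
        os.read(c2p_r, 1)      # ...and block until it answers
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed / rounds * 1e6

if __name__ == "__main__":
    print(f"~{ping_pong_cost():.1f} µs per round trip")
```

On a typical Linux machine this prints a figure on the order of a few microseconds per round trip, consistent with the estimate above.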

Web Server Models


Currently, the following web server models exist:

  • worker
  • prefork
  • asynchronous
  • combined


Let's discuss each of them separately.

Worker and prefork


Historically, it all started with these models. The essence is very simple: a client arrives, and we assign it a dedicated handler, which processes that client from beginning to end. A handler can be either a process (prefork) or a thread (worker). An example of such a web server is the well-known Apache.
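The worker model described above can be sketched in a few lines of Python (a hypothetical echo server; the host, port, and function names are arbitrary assumptions, not Apache's actual code):

```python
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    """Serve one client from beginning to end, then close the connection."""
    with conn:
        while data := conn.recv(4096):   # blocks this thread until data arrives
            conn.sendall(data)

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    with socket.create_server((host, port)) as srv:
        while True:
            conn, _addr = srv.accept()   # blocks until a client connects
            # worker model: a dedicated thread per client;
            # prefork would call os.fork() here instead
            threading.Thread(target=handle_client,
                             args=(conn,), daemon=True).start()
```

The one-line difference between spawning a thread and calling `os.fork()` is exactly the difference between worker and prefork.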

I'll make a reservation right away: creating a new handler for each client is expensive. First, with a constant number of cores, an increase in the number of processes leads to an increase in latency (due to context switches). Second, the required amount of memory grows linearly with the number of clients: even if you use threads, which share memory, each thread still has its own stack. Thus, the number of clients processed simultaneously is limited by the size of the pool, which in turn depends on the number of processor cores. The problem is addressed with vertical scaling.

Another fundamental drawback of such servers is non-optimal resource usage. Processes (or threads) sit idle most of the time. Imagine the following situation: while processing a client, you need to read some data from disk, query a database, or write something to the network. Since reading from disk in Linux is a blocking operation, the process (or thread) will hang waiting for a response, yet it will still participate in the allocation of processor time.

Worker vs prefork


Worker and prefork do not have that many fundamental differences. Threads are somewhat more economical with memory because they share it; for the same reason, a context switch between threads is cheaper than between processes. However, with worker the code becomes multi-threaded, because the threads must be synchronized. As a result, we get all the "charms" of multi-threaded code: it becomes harder to write, read, test, and debug.

Asynchronous model


So, worker and prefork do not allow processing a large number of clients simultaneously due to the limited pool size, and they use resources sub-optimally due to context switches and blocking system calls. As you can see, the problem is multithreading and a heavy OS scheduler. This leads to the following idea: let's process clients in just one thread, but keep it loaded at 100%.

Such servers are based on an event loop and the reactor pattern (event machine). Client code, when initiating an I/O operation, registers a callback in a priority queue (the priority is the readiness time). The event loop polls the descriptors waiting for I/O and updates the priorities as data becomes available. It also pulls events out of the priority queue and invokes the callbacks, which return control to the event loop when they finish.
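A minimal sketch of such an event loop can be built on Python's selectors module (which uses epoll on Linux). Here readiness callbacks are stored as the selector's registration data rather than in an explicit priority queue, a simplification of the scheme described above; the echo behavior and port are illustrative assumptions:

```python
import selectors
import socket

sel = selectors.DefaultSelector()        # epoll on Linux

def accept(srv: socket.socket) -> None:
    conn, _addr = srv.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)  # callback for readiness

def echo(conn: socket.socket) -> None:
    data = conn.recv(4096)               # will not block: readiness was reported
    if data:
        conn.sendall(data)
    else:                                # client disconnected
        sel.unregister(conn)
        conn.close()

def run(host: str = "127.0.0.1", port: int = 8080) -> None:
    srv = socket.create_server((host, port))
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)
    while True:                          # the event loop
        for key, _mask in sel.select():  # poll descriptors for readiness
            key.data(key.fileobj)        # invoke the registered callback
```

One thread, no blocking calls: as long as every callback returns quickly, a single process can juggle thousands of connections.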

This model allows handling a large number of clients while avoiding the overhead of context switching. It is not perfect, however, and has several drawbacks. First, no more than one processor core is used, because there is only one process; this is remedied by the combined model discussed below. Second, all clients are tied to this one process. The code has to be written carefully: memory leaks and errors cause all clients to drop at once. Moreover, this process must never block, and a callback must not perform heavy computations, because that would block all clients. Third, asynchronous code is much more complex. You have to register an additional callback for the case when the data never arrives, solve the problem of branching correctly, and so on.

Combined model


This model is used in real servers: it has a pool of processes, each with a pool of threads, each of which in turn uses a processing model based on asynchronous I/O. Nginx implements a combined model.
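A simplified sketch of the combined idea: pre-forked worker processes share one listening socket, and each runs its own single-threaded event loop. This is a loose approximation of nginx's worker processes (nginx itself is written in C); the port, worker count, and canned HTTP response are arbitrary assumptions.

```python
import os
import selectors
import socket

def worker_loop(srv: socket.socket) -> None:
    """One worker process: a single-threaded event loop over the shared socket."""
    sel = selectors.DefaultSelector()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, None)
    while True:
        for _key, _mask in sel.select():
            try:
                conn, _addr = srv.accept()   # the kernel wakes the workers
            except BlockingIOError:
                continue                     # another worker won the race
            with conn:
                conn.setblocking(True)
                conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

def serve(port: int = 8080, workers: int = 0) -> None:
    srv = socket.create_server(("127.0.0.1", port))
    for _ in range(workers or os.cpu_count() or 2):
        if os.fork() == 0:        # child inherits the listening socket
            worker_loop(srv)      # never returns
    try:
        while True:
            os.wait()             # parent just supervises the workers
    except ChildProcessError:
        pass
```

One worker process per core gives the asynchronous model access to the whole CPU while keeping each loop single-threaded.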

Conclusion


Thus, turning to the basics of the operating system, we examined the conceptual differences between the web server models used in Apache and Nginx. Each of them has its own advantages and disadvantages, so their combination is often used in production.

The idea of asynchronous processing has evolved further: at the level of language runtimes, the concepts of green threads / fibers / goroutines arose, which "hide under the hood" the asynchrony of I/O, leaving the developer with clean, seemingly synchronous code. However, this concept deserves a separate article.



Learn more about the course.


