Asynchronous PHP

Ten years ago we had the classic LAMP stack: Linux, Apache, MySQL, and PHP running in the slow mod_php mode. The world has changed, and speed has become more important. PHP-FPM appeared and significantly improved the performance of PHP solutions, so there was no urgent need to rewrite them in something faster.

In parallel, the ReactPHP library was being developed. It uses the Event Loop concept to process signals from the OS and deliver the results of asynchronous operations. AMPHP develops the ideas of ReactPHP further. It uses the same Event Loop but, unlike ReactPHP, supports coroutines, which let you write asynchronous code that looks synchronous. It is perhaps the most up-to-date framework for developing asynchronous applications in PHP.



But ever more speed is required and the existing tools are no longer enough, so asynchronous programming in PHP is one of the ways to speed up request processing and make better use of resources.

This is what Anton Shabovta (zloyusr), a developer at Onliner, will talk about. He has more than 10 years of experience: he started with desktop applications in C/C++ and then switched to web development in PHP. He writes his "home" projects in C# and Python 3, and in PHP he experiments with DDD, CQRS, Event Sourcing, and Async Multitasking.

This article is based on a transcript of Anton's talk at PHP Russia 2019. In it we will look at blocking and non-blocking operations in PHP and study the structure of the Event Loop and of asynchronous primitives such as Promise and coroutines from the inside. Finally, we will find out what awaits us in ext-async, AMPHP 3, and PHP 8.


Let's introduce a couple of definitions. For a long time I tried to find an exact definition of asynchrony and asynchronous operations, but I did not find one and wrote my own.
Asynchrony is the ability of a software system not to block the main thread of execution.
An asynchronous operation is an operation that does not block the program's thread of execution while it is in progress.

It seems to be simple, but first you need to understand what operations block the execution flow.

Blocking operations


PHP is an interpreted language. It reads the code line by line, translates it into its own instructions, and executes them. On which line of the example below will the code block?

public function update(User $user)
{
    try {
        $sql = 'UPDATE users SET ...';
        return $this->connection->execute($sql, $user->data());
    } catch (\PDOException $error) {
        log($error->getMessage());
    }

    return 0;
}

If we connect to the database via PDO, the thread of execution will block on the line that sends the query to the SQL server: return $this->connection->execute($sql, $user->data());.

This is because PHP does not know how long the SQL server will take to process the query, or whether it will execute it at all. PHP waits for the server's response, and the program does nothing during all that time.

PHP also blocks the thread of execution on all I/O operations.

  • File system: fwrite, file_get_contents.
  • Databases: PDOConnection, RedisClient. Almost all database extensions work in blocking mode by default.
  • Processes: exec, system, proc_open. These are blocking operations, since all work with processes goes through system calls.
  • Working with stdin/stdout: readline, echo, print.

In addition, execution blocks on timers: sleep, usleep. With these operations we explicitly tell the thread to sleep for a while. PHP is idle during all that time.

Asynchronous SQL Client


But modern PHP is a general-purpose language, not just a language for the web like PHP/FI in 1997. Therefore, we can write an asynchronous SQL client from scratch. The task is not trivial, but it is solvable.

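public function execAsync(string $query, array $params = [])
{
    $socket = stream_socket_client('127.0.0.1:3306', ...);

    // Switch the stream to non-blocking mode so reads and writes return immediately
    stream_set_blocking($socket, false);

    $data = $this->packBinarySQL($query, $params);

    // stream_socket_client() returns a stream, so we write with fwrite();
    // socket_write() only works with sockets created by ext-sockets
    fwrite($socket, $data, strlen($data));
}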

What does such a client do? It connects to our SQL server, puts the socket in non-blocking mode, packs the request into a binary format that the SQL server understands, and writes the data to the socket.

Since the socket is in non-blocking mode, the write operation returns quickly on the PHP side.
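To see what non-blocking mode changes in practice, here is a minimal sketch. It assumes a Unix-like system and uses stream_socket_pair() as a stand-in for a real connection to a SQL server, so nothing in it is MySQL-specific.

```php
<?php
// A connected pair of local sockets stands in for the client/server connection.
[$client, $server] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

stream_set_blocking($client, false);

// Nothing has been sent yet. In blocking mode this read would hang;
// in non-blocking mode it returns immediately with no data.
$before = (string) fread($client, 1024);

// The "server" finally answers.
fwrite($server, 'OK');

// The same call now returns the data that has arrived.
$after = (string) fread($client, 1024);

var_dump($before, $after);

fclose($client);
fclose($server);
```

The first read comes back empty instead of waiting, which is exactly why the client needs some other way to learn when the answer is ready.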

But what should such an operation return? We do not know what the SQL server will respond. It may take a long time to complete the request, or it may not complete it at all. But something has to be returned. If we use PDO and send an UPDATE query to the SQL server, we get back the affected rows: the number of rows changed by the query. We cannot return that yet, so we only promise to return it.

Promise


Promise is a concept from the world of asynchronous programming.
A Promise is a wrapper object around the result of an asynchronous operation, a result that is not yet known to us.
Unfortunately, there is no single Promise standard, and the standards from the JavaScript world cannot be carried over to PHP directly.

How Promise Works


Since there is no result yet, we can only register some callbacks.

When the data becomes available, the onResolve callback is executed.

If an error occurs, the onReject callback is executed to handle it.

The Promise interface looks something like this.

interface Promise
{
    const
        STATUS_PENDING = 0,
        STATUS_RESOLVED = 1,
        STATUS_REJECTED = 2
    ;

    public function onResolve(callable $callback);
    public function onReject(callable $callback);
    public function resolve($data);
    public function reject(\Throwable $error);
}

A Promise has a status, methods for registering callbacks, and methods for filling it with data (resolve) or with an error (reject). But there are differences and variations. The methods may be named differently, or instead of separate methods for registering the resolve and reject callbacks there may be a single one, as in AMPHP, for example.

Often the methods that fill a Promise, resolve and reject, are moved into a separate object, Deferred, which stores the state of the asynchronous function. It can be thought of as a kind of factory for Promise. It is single-use: one Deferred makes one Promise.
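The split between Promise and Deferred can be sketched in a few lines. The class names SimplePromise and SimpleDeferred are hypothetical and match neither ReactPHP nor AMPHP; the sketch also assumes that callbacks are registered before the result arrives.

```php
<?php
// Hypothetical minimal classes for illustration only.
final class SimplePromise
{
    private $onResolve = [];
    private $onReject = [];

    public function onResolve(callable $callback): void { $this->onResolve[] = $callback; }
    public function onReject(callable $callback): void { $this->onReject[] = $callback; }

    /** Called only by SimpleDeferred; consumers of the promise never settle it. */
    public function settle(bool $ok, $value): void
    {
        foreach (($ok ? $this->onResolve : $this->onReject) as $callback) {
            $callback($value);
        }
    }
}

final class SimpleDeferred
{
    private $promise;
    private $settled = false;

    public function __construct() { $this->promise = new SimplePromise(); }

    public function promise(): SimplePromise { return $this->promise; }

    public function resolve($data): void { $this->settle(true, $data); }
    public function reject(\Throwable $error): void { $this->settle(false, $error); }

    private function settle(bool $ok, $value): void
    {
        // One Deferred makes one Promise, and that Promise is settled once.
        if ($this->settled) {
            throw new \LogicException('A Deferred can be settled only once');
        }
        $this->settled = true;
        $this->promise->settle($ok, $value);
    }
}

$deferred = new SimpleDeferred();
$promise = $deferred->promise();

$log = [];
$promise->onResolve(function ($rows) use (&$log) { $log[] = "Affected rows: {$rows}"; });
$promise->onReject(function (\Throwable $e) use (&$log) { $log[] = $e->getMessage(); });

$deferred->resolve(3);
print_r($log);
```

The code that starts the operation keeps the Deferred, while the caller only ever sees the Promise and cannot resolve it by accident.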



How to apply this in the SQL client if we decide to write it ourselves?

Asynchronous SQL Client


First we create a Deferred, then do all the work with the socket, write the data, and return a Promise. Everything is simple.

public function execAsync(string $query, array $params = [])
{
    $deferred = new Deferred;

    $socket = stream_socket_client('127.0.0.1:3306', ...);
    stream_set_blocking($socket, false);

    $data = $this->packBinarySQL($query, $params);
    // fwrite() is the stream counterpart of socket_write()
    fwrite($socket, $data, strlen($data));

    // The caller gets the Promise now; the result arrives later
    return $deferred->promise();
}

Once we have a Promise, we can, for example:

  • set a callback and get the same affected rows that PDOConnection returns to us;
  • handle an error and write it to the log;
  • retry the query if the SQL server responds with an error.

$promise = $this->execAsync($sql, $user->data());

$promise->onResolve(function (int $rows) {
    echo "Affected rows: {$rows}";
});

$promise->onReject(function (\Throwable $error) {
    log($error->getMessage());
});

The question remains: we have set the callbacks, but who will call resolve and reject?

Event loop


There is the concept of an Event Loop. It can process messages in an asynchronous environment. For asynchronous I/O, these are messages from the OS saying that a socket is ready for reading or writing.

Here is how it works.

  • The client tells the Event Loop that it is interested in a particular socket.
  • The Event Loop polls the OS through stream_select (a wrapper around the select system call): is the socket ready, has all the data been written, has data arrived from the other side.
  • If the OS reports that the socket is not ready, the Event Loop repeats the cycle.
  • When the OS reports that the socket is ready, the Event Loop returns control to the client and settles the Promise (resolve or reject).



Let's express this concept in code. Take the simplest case and strip out error handling and other nuances so that only one infinite loop remains. On each iteration it asks the OS which sockets are ready for reading or writing and calls the callback for each such socket.

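public static function run()
{
    while (true) {
        // stream_select() overwrites the arrays passed to it, so work on copies.
        // self::$readSockets and self::$writeSockets are assumed to hold the watched sockets.
        $readSockets = self::$readSockets;
        $writeSockets = self::$writeSockets;
        $except = null; // must be a variable: the argument is taken by reference

        stream_select($readSockets, $writeSockets, $except, 0);

        foreach ($readSockets as $i => $socket) {
            call_user_func(self::$readCallbacks[$i], $socket);
        }

        // Do same for write sockets
    }
}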

Now let's extend our SQL client. We tell the Event Loop that as soon as data from the SQL server arrives on the socket we are working with, it must move the Deferred to the “done” state and pass the data from the socket to the Promise.

public function execAsync(string $query, array $params = [])
{
    $deferred = new Deferred;
    ...
    Loop::onReadable($socket, function ($socket) use ($deferred) {
        // fread() needs a length; 8192 bytes is an arbitrary chunk size for this sketch
        $deferred->resolve(fread($socket, 8192));
    });

    return $deferred->promise();
}
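Putting the pieces together, the sketch below runs the whole round trip in one script. It is an illustration under several assumptions: MiniLoop and MiniDeferred are hypothetical names, a local socket pair (Unix-like systems only) plays the SQL server, and the loop stops once nobody is waiting instead of spinning forever.

```php
<?php
// Hypothetical names (MiniLoop, MiniDeferred); a sketch, not a production loop.
final class MiniDeferred
{
    private $callbacks = [];
    public function onResolve(callable $cb): void { $this->callbacks[] = $cb; }
    public function resolve($data): void
    {
        foreach ($this->callbacks as $cb) { $cb($data); }
    }
}

final class MiniLoop
{
    private static $readSockets = [];
    private static $readCallbacks = [];

    public static function onReadable($socket, callable $callback): void
    {
        $id = (int) $socket;            // the resource id serves as a unique key
        self::$readSockets[$id] = $socket;
        self::$readCallbacks[$id] = $callback;
    }

    public static function run(): void
    {
        // Stop when no watchers are left instead of looping forever.
        while (self::$readSockets) {
            $read = self::$readSockets; // stream_select() overwrites its arguments
            $write = $except = null;

            // Wait up to 1 second for at least one socket to become readable.
            $ready = stream_select($read, $write, $except, 1);
            if ($ready === false) {
                break;                  // select failed; give up in this sketch
            }

            foreach ($read as $socket) {
                $id = (int) $socket;
                $callback = self::$readCallbacks[$id];
                // One-shot watcher: forget the socket before invoking the callback.
                unset(self::$readSockets[$id], self::$readCallbacks[$id]);
                $callback($socket);
            }
        }
    }
}

function execAsync($socket, string $query): MiniDeferred
{
    $deferred = new MiniDeferred();
    stream_set_blocking($socket, false);
    fwrite($socket, $query);

    MiniLoop::onReadable($socket, function ($socket) use ($deferred) {
        $deferred->resolve(fread($socket, 8192));
    });

    return $deferred;
}

[$client, $server] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$result = null;
execAsync($client, 'UPDATE users SET ...')->onResolve(function ($data) use (&$result) {
    $result = $data;
});

// Play the server's part: read the query and answer with a fake row count.
fread($server, 8192);
fwrite($server, '1');

MiniLoop::run();
echo "Server answered: {$result}\n";
```

Note that execAsync() returns before any answer exists; the callback fires only inside MiniLoop::run(), when stream_select() reports the socket as readable.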

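The Event Loop can handle our I/O and works with sockets. What else can it do?

  • Timers, like setTimeout and setInterval in JavaScript: after N units of time, the Event Loop calls the callback.
  • The Event Loop can also handle OS signals, which is useful for process control.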
Event Loop


Writing your own Event Loop is not only possible but necessary. If you want to work with asynchronous PHP, it is important to write your own simple implementation to understand how it works. In production, of course, we will not use it; we will take a ready-made implementation that is stable, error-free, and proven in practice.

There are three main implementations.

ReactPHP. The oldest project, started back in the days of PHP 5.3. The minimum required PHP version is now 5.3.8. The project implements the Promises/A standard from the JavaScript world.

AMPHP. This is the implementation I prefer. The minimum requirement is PHP 7.0, and the next version will already require 7.3. It uses coroutines on top of Promise.

Swoole. An interesting Chinese framework whose developers are trying to port some concepts from the Go world to PHP. The English documentation is incomplete, and most of it is on GitHub in Chinese. If you know the language, go ahead, but for now I am wary of working with it.



ReactPHP


Let's see what a MySQL client looks like with ReactPHP.

$connection = (new ConnectionFactory)->createLazyConnection();

$promise = $connection->query('UPDATE users SET ...');
$promise->then(
    function (QueryResult $command) {
        echo count($command->resultRows) . ' row(s) in set.';
    },
    function (Exception $error) {
        echo 'Error: ' . $error->getMessage();
    });

Everything is almost the same as what we wrote: we create a connection and execute the query. We can set a callback to process the result (return the affected rows):

    function (QueryResult $command) {
        echo count($command->resultRows) . ' row(s) in set.';
    },

and callback for error handling:

    function (Exception $error) {
        echo 'Error: ' . $error->getMessage();
    });

You can build long chains from these callbacks, because every call to then in ReactPHP also returns a Promise.

$promise
    ->then(function ($data) {
        return new Promise(...);
    })
    ->then(function ($data) {
        ...
    }, function ($error) {
        log($error);
    })
    ...

This solves a problem called callback hell. Unfortunately, in the ReactPHP implementation it leads to a “Promise hell” problem, where 10-11 callbacks are required to connect to RabbitMQ correctly. Such code is hard to work with and hard to fix. I quickly realized that this was not for me and switched to AMPHP.

AMPHP


This project is younger than ReactPHP and promotes a different concept: coroutines. If you look at working with MySQL in AMPHP, you can see that it is almost the same as working with PDOConnection in PHP.

$pool = Mysql\pool("host=127.0.0.1 port=3306 db=test");

try {
    $result = yield $pool->query("UPDATE users SET ...");

    echo $result->affectedRows . ' row(s) in set.';
} catch (\Throwable $error) {
    echo 'Error: ' . $error->getMessage();
}

Here we create a pool, connect, and execute the query. We can handle errors with the usual try...catch; we do not need callbacks.

But a keyword appears here before the asynchronous call: yield.

Generators


The yield keyword turns our function into a generator.

function generator($counter = 1)
{
    yield $counter++;

    echo "A";

    yield $counter;

    echo "B";

    yield ++$counter;
}

As soon as the PHP interpreter encounters yield in the body of a function, it knows that this is a generator function. When the function is called, its body is not executed; instead, an object of the Generator class is created.

The Generator class implements the Iterator interface.

$generator = generator(1);

foreach ($generator as $value) {
    echo $value;
}

while ($generator->valid()) {
    echo $generator->current();

    $generator->next();
}

Accordingly, you can iterate over a generator with foreach, while, and other loops. More interestingly, the iterator has the methods current and next. Let's go through them step by step.

We call our function generator($counter = 1) and then the generator's current() method. It returns the value of the expression $counter++.

As soon as we call next() on the generator, execution moves on to the next yield inside it. The whole piece of code between the two yields is executed, and that's cool. If we keep advancing the generator, we get the rest of the results.
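The walk-through can be reproduced step by step. The sketch below reuses the generator function from the article and only adds a $seen array (an illustrative name) to record what each call to current() returns.

```php
<?php
function generator($counter = 1)
{
    yield $counter++;

    echo "A";

    yield $counter;

    echo "B";

    yield ++$counter;
}

$generator = generator(1);
$seen = [];

$seen[] = $generator->current(); // runs the body up to the first yield
$generator->next();              // prints "A" and stops at the second yield
$seen[] = $generator->current();
$generator->next();              // prints "B" and stops at the third yield
$seen[] = $generator->current();
$generator->next();              // nothing is left: the generator is finished

var_dump($seen, $generator->valid());
```

Each next() runs exactly the code between two yields, and the local variable $counter keeps its value across the pauses.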

Coroutines


But generators have an even more interesting feature: we can send data into a generator from the outside. In that case it is not quite a generator any more but a coroutine.

function printer() {
    while (true) {
        echo yield;
    }
}

$print = printer();
$print->send('Hello');
$print->send(' PHPRussia');
$print->send(' 2019');
$print->send('!');

The interesting thing about this piece of code is that while (true) does not block the thread of execution: each send() runs a single iteration. We sent data to the coroutine and got 'Hello' printed. We sent more and got ' PHPRussia'. The principle is clear.

Besides data, you can send errors into the generator and handle them from the inside, which is convenient.

function printer() {
    try {
        echo yield;
    } catch (\Throwable $e) {
        echo $e->getMessage();
    }
}

printer()->throw(new \Exception('Ooops...'));

To summarize. A coroutine is a component of a program that supports suspending and resuming execution while preserving its current state. A coroutine remembers its call stack and the data inside it and can use them later.
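State preservation is easy to observe with a coroutine that talks in both directions. In this sketch the accumulator function is a hypothetical example: it receives numbers through send() and yields the running total back.

```php
<?php
function accumulator(): \Generator
{
    $total = 0;

    while (true) {
        // Pause here, hand the current total to the caller,
        // and resume when the caller sends the next number.
        $number = yield $total;
        $total += $number;
    }
}

$sum = accumulator();

$start = $sum->current();  // runs to the first yield
$first = $sum->send(5);    // resumes, adds 5, pauses at the next yield
$second = $sum->send(10);  // resumes again with the remembered total

var_dump($start, $first, $second);
```

The value returned by send() is whatever the coroutine yields next, so the caller and the coroutine take turns, and $total survives between the turns.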

Generators and Promise


Let's look at the generator and Promise interfaces.

class Generator
{
    public function send($data);
    public function throw(\Throwable $error);
}

class Promise
{
    public function resolve($data);
    public function reject(\Throwable $error);
}

They look the same, except for different method names. We can send data and throw an error to both the generator and Promise.

How can this be used? Let's write a function.

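function recoil(\Generator $generator)
{
    // The coroutine is expected to yield a Promise
    $promise = $generator->current();

    // On success, pass the result into the coroutine and process its next yield
    $promise->onResolve(function($data) use ($generator) {
        $generator->send($data);
        recoil($generator);
    });

    // On failure, throw the error inside the coroutine instead
    $promise->onReject(function ($error) use ($generator) {
        $generator->throw($error);
        recoil($generator);
    });
}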

The function takes the current value of the generator: $promise = $generator->current();.

I have simplified things a little. We really should check that the current value returned to us is an instanceof Promise. If it is, we can set a callback on it. When the Promise succeeds, the callback sends the data back into the generator and recursively calls recoil.

    $promise->onResolve(function($data) use ($generator) {
        $generator->send($data);
        recoil($generator);
    });

The same can be done with errors. If the Promise failed, for example because the SQL server said “Too many connections”, we can throw the error inside the generator and move on to the next step.
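A runnable version of this idea, with the missing checks added, might look like the sketch below. TinyPromise is a hypothetical stand-in for a real Promise class, and the promises are settled by hand at the end of the script instead of by an Event Loop.

```php
<?php
// Hypothetical TinyPromise: callbacks fire when resolve() or reject() is called.
final class TinyPromise
{
    private $onResolve = [];
    private $onReject = [];
    public function onResolve(callable $cb): void { $this->onResolve[] = $cb; }
    public function onReject(callable $cb): void { $this->onReject[] = $cb; }
    public function resolve($data): void { foreach ($this->onResolve as $cb) { $cb($data); } }
    public function reject(\Throwable $e): void { foreach ($this->onReject as $cb) { $cb($e); } }
}

function recoil(\Generator $generator): void
{
    if (!$generator->valid()) {
        return; // the coroutine has finished, nothing left to wait for
    }

    $promise = $generator->current();
    if (!$promise instanceof TinyPromise) {
        throw new \UnexpectedValueException('Coroutines may only yield promises');
    }

    $promise->onResolve(function ($data) use ($generator) {
        $generator->send($data);   // resume the coroutine with the result
        recoil($generator);        // and subscribe to whatever it yields next
    });
    $promise->onReject(function (\Throwable $error) use ($generator) {
        $generator->throw($error); // resume the coroutine with an exception
        recoil($generator);
    });
}

$query = new TinyPromise();
$failing = new TinyPromise();
$log = [];

$coroutine = (function () use ($query, $failing, &$log) {
    $rows = yield $query;
    $log[] = "rows: {$rows}";

    try {
        yield $failing;
    } catch (\Throwable $e) {
        $log[] = 'error: ' . $e->getMessage();
    }
})();

recoil($coroutine);
$query->resolve(3);
$failing->reject(new \RuntimeException('Too many connections'));

print_r($log);
```

Inside the coroutine the code reads top to bottom with an ordinary try...catch, while all the callback plumbing stays hidden in recoil().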

All this brings us to the important concept of cooperative multitasking.

Cooperative multitasking


This is a type of multitasking in which the next task runs only after the current task explicitly declares that it is ready to give processor time to other tasks.

I rarely come across something as simple as working with a single database. Most often, when updating a user, you need to update the data in the database and in the search index, then clear or update the cache, and then send 15 more messages to RabbitMQ. In PHP, we perform these operations one after another: first the database, then the index, then the cache.

But by default PHP blocks on such I/O operations, so if you look closely at the timeline, most of it consists of segments where we were blocked. They take the most time.

If we work in asynchronous mode, these segments are gone, and the execution timeline becomes intermittent.

We can glue the remaining pieces together and execute them one after another.

What is all this for? Compare the lengths of the timelines: at first the work takes a lot of time, but once we glue the pieces together, the application speeds up.
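The gluing of pieces can be imitated with plain generators. In this sketch the task and runAll functions are hypothetical; each bare yield marks the point where a real task would wait for I/O, and a simple round-robin scheduler gives the other tasks a turn at that point.

```php
<?php
function task(string $name, int $steps, \ArrayObject $timeline): \Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        $timeline[] = "{$name}{$i}"; // a short piece of CPU work
        yield;                       // explicitly hand control back to the scheduler
    }
}

function runAll(array $tasks): void
{
    $queue = new \SplQueue();

    foreach ($tasks as $task) {
        $task->rewind();             // run each task up to its first yield
        if ($task->valid()) {
            $queue->enqueue($task);
        }
    }

    while (!$queue->isEmpty()) {
        $task = $queue->dequeue();
        $task->next();               // resume the task until its next yield
        if ($task->valid()) {
            $queue->enqueue($task);  // not finished: back to the end of the queue
        }
    }
}

$timeline = new \ArrayObject();

runAll([
    task('db', 2, $timeline),
    task('index', 2, $timeline),
    task('cache', 1, $timeline),
]);

echo implode(' ', $timeline->getArrayCopy()), "\n";
```

No task is ever interrupted from the outside: the pieces of the three tasks interleave only because each task gives up control voluntarily.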

The Event Loop concept and cooperative multitasking have long been used in various applications: Nginx, Node.js, Memcached, Redis. All of them use an Event Loop internally and are built on the same principle.

Since we have mentioned the Nginx and Node.js web servers, let's recall how PHP processes requests.

Request Processing in PHP


The browser sends a request, and it reaches the HTTP server, behind which there is a pool of FPM workers. One of the workers picks up the request, loads our code, and starts executing it.

When the next request arrives, another FPM worker picks it up, loads the code, and executes it.

This scheme has advantages.

  • Simple error handling. If something goes wrong and one of the requests fails, we do not need to do anything: the next request will come, and the failure will not affect it.
  • We do not think about memory. We do not need to clean up or monitor memory. All memory is released before the next request.

This is a cool scheme that has worked in PHP from the very beginning and still works successfully. But there are also disadvantages.

  • A limit on the number of processes. If we have 50 FPM workers on the server, then as soon as the 51st request arrives, it will wait until one of the workers becomes free.
  • The cost of context switches. The OS switches the processor between FPM workers. This processor-level operation is called a context switch. It is expensive and takes a huge number of CPU cycles. The OS has to save all the registers, the call stack, and everything else that is in the processor, then switch to another process, load its registers and its call stack, execute something there, switch again, save again... It takes a long time.

Let's approach the question differently: we will write an HTTP server in PHP itself.

Asynchronous HTTP Server




It can be done. We have already learned how to work with sockets in non-blocking mode, and an HTTP connection is just another socket. What will it look like, and how will it work?

This is an example of starting an HTTP server with the AMPHP framework.

Loop::run(function () {
    $app = new Application();
    $app->bootstrap();

    $sockets = [Socket\listen('0.0.0.0:80')];

    $server = new Server($sockets, new CallableRequestHandler(
        function (Request $request) use ($app) {
            $response = yield $app->dispatch($request);

            return new Response(Status::OK, [], $response);
        })
    );

    yield $server->start();
});

Everything is quite simple: we load the Application and create a pool of sockets (one or more).

Next we create our server and give it a handler that will be executed on each request and will pass the request to our Application in order to get a response.

The last thing to do is start the server: yield $server->start();.

In ReactPHP it will look approximately the same, except that there will be 150 callbacks for the different options, which is not very convenient.

Problems


There are several issues with asynchrony in PHP.

Lack of standards. Each framework, whether Swoole, ReactPHP, or AMPHP, implements its own Promise interface, and they are incompatible.

AMPHP could theoretically interoperate with Promise from ReactPHP, but there is a caveat. If the ReactPHP code is not very well written and implicitly calls or creates an Event Loop somewhere, two Event Loops will end up spinning inside the application.

JavaScript has the relatively good Promises/A+ standard, which Guzzle implements. It would be nice if the frameworks followed it, but so far they do not.

Memory leaks. When we work with PHP in the usual FPM mode, we may not think about memory. Even if the developers of some extension forgot to write good code, forgot to run it through Valgrind, and memory leaks somewhere inside, there is nothing to worry about: memory is cleared before the next request and everything starts over. In asynchronous mode you cannot afford this, because sooner or later the process will simply crash with an out-of-memory error.

It can be fixed, but it is difficult and painful. In some cases Xdebug helps; in others, strace helps to analyze the errors that led to running out of memory.

Blocking operations. It is vital not to block the Event Loop when we write asynchronous code. As soon as we block the thread of execution, the application slows down, and every one of our coroutines starts to run slower.

For AMPHP, the kelunik/loop-block package helps to find such operations. It sets a timer with a very small interval. If the timer does not fire on time, we are blocked somewhere. The package helps to find blocking places, but not always: blocking inside some extensions may go unnoticed.
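The general idea of catching a blocked loop can be illustrated by timing each tick. This is a loose sketch and not how kelunik/loop-block is implemented: findBlockingTicks is a hypothetical helper, the 10 ms threshold is arbitrary, and hrtime() requires PHP 7.3 or newer.

```php
<?php
/**
 * Runs each callable as one "tick" and reports the ticks that held
 * the thread longer than $thresholdMs. Hypothetical helper for illustration.
 */
function findBlockingTicks(array $ticks, float $thresholdMs): array
{
    $blocking = [];

    foreach ($ticks as $name => $tick) {
        $startedAt = hrtime(true);   // monotonic clock, in nanoseconds
        $tick();
        $elapsedMs = (hrtime(true) - $startedAt) / 1e6;

        if ($elapsedMs > $thresholdMs) {
            $blocking[$name] = $elapsedMs;
        }
    }

    return $blocking;
}

$report = findBlockingTicks([
    'fast callback' => function () { return 1 + 1; },
    'sleepy callback' => function () { usleep(50000); }, // blocks for about 50 ms
], 10.0);

foreach ($report as $name => $ms) {
    printf("%s blocked the loop for %.1f ms\n", $name, $ms);
}
```

While the sleepy callback runs, no other callback, timer, or coroutine can make progress, which is why a single blocking call slows down the whole application.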

Library support: Cassandra, Influx, ClickHouse. The main problem of all asynchronous PHP is library support. We cannot use the usual PDOConnection, RedisClient, and other familiar drivers: we need non-blocking implementations. They also have to be written in PHP in non-blocking mode, because C drivers rarely provide interfaces that can be integrated into asynchronous code.

The strangest experience I had was with the driver for the Cassandra database. It provides the operations ExecuteAsync, GetAsync, and others, but they return a Future object with a single method, get, which blocks. We can start something asynchronously, but to wait for the result we still block our entire Loop. There is no way to do it differently, for example through callbacks. I even wrote my own client for Cassandra, because we use it at work.

Type hints. This is a problem of AMPHP and coroutines.

class UserRepository
{
    public function find(int $id): \Generator
    {
        $data = yield $this->db->query('SELECT ...', $id);

        return User::fill($data);
    }
}

If yield occurs in a function, the function becomes a generator. At that point we can no longer declare the correct return type.
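The effect is easy to reproduce without any framework. In this sketch the find function and the strtoupper() call are placeholders for the repository method and User::fill(); the string sent into the generator plays the database's answer.

```php
<?php
// The declared type has to be Generator (or one of its supertypes),
// even though the value we care about is whatever the function returns at the end.
function find(int $id): \Generator
{
    $data = yield "SELECT ... WHERE id = {$id}";

    return strtoupper($data); // stands in for User::fill($data)
}

$coroutine = find(42);

$query = $coroutine->current();  // the yielded "query"
$coroutine->send('anton');       // the "database" answers; the body runs to its return
$user = $coroutine->getReturn(); // the real result, invisible to the type system

var_dump($query, $user);
```

Neither the signature nor static analysis can tell the caller what getReturn() will give back, which is the gap that a Promise with a type parameter would close.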

PHP 8


What awaits us in PHP 8? I will tell you about my assumptions or, rather, my wishes (editor's note: Dmitry Stogov knows what will actually appear in PHP 8).

Event Loop. There is a chance that it will appear, because work is under way to bring an Event Loop into the core in some form. If this happens, we will get an await keyword, as in JavaScript or C#, which will let us wait for the result of an asynchronous operation at a particular place. In that case we will not need any extensions; everything will work asynchronously at the core level.


class UserRepository
{
    public function find(int $id): Promise<User>
    {
        $data = await $this->db->query('SELECT ...', $id);

        return User::fill($data);
    }
}


Generics. Go is waiting for generics, we are waiting for generics, everyone is waiting for generics.

class UserRepository
{
    public function find(int $id): Promise<User>
    {
        $data = yield $this->db->query('SELECT ...', $id);

        return User::fill($data);
    }
}

We are waiting for generics not for collections, but to be able to state that the result of a Promise will be exactly a User object.

Why all this?

For speed and performance.
PHP is a language in which most operations are I/O bound. We rarely write code that is significantly tied to CPU computation. Most likely we work with sockets: we need to make a request to the database, read something, return a response, send a file. Asynchrony lets us speed up such code. If we look at the average response time, we get a speedup of about 8 times at 1000 requests and almost 6 times at 10,000 requests!

On May 13, 2020, we will gather for PHP Russia a second time to discuss the language, libraries and frameworks, ways to improve performance, and the pitfalls of hyped solutions. We have accepted the first 4 talks, but the Call for Papers is still open. Apply if you want to share your experience with the community.

Source: https://habr.com/ru/post/undefined/

