More about Coroutines in C ++

Hello colleagues.

As part of the development of the C ++ 20 theme, we at one time came across a rather old (September 2018) article from the Yandex’s hublog, which is called “ Getting Ready for C ++ 20. Coroutines TS with a Real Example ”. It ends with the following very expressive vote:



“Why not,” we decided and translated an article by David Pilarski under the title “Coroutines introduction”. The article was published a little over a year ago, but hopefully you will find it very interesting anyway.

So it happened. After much doubt, controversy, and preparation of this feature, WG21 came to a common opinion about what coroutines should look like in C ++ - and it is very likely that they will be included in C ++ 20. Since this is a major feature, I think it's time to prepare and study it already now (as you remember, there are still more modules, concepts, ranges to learn ...)

Many still oppose coroutine. Often they complain about the complexity of their development, a lot of points of customization and, possibly, suboptimal performance due to, possibly, under-optimized allocation of dynamic memory (maybe;)).

In parallel with the development of approved (officially published) technical specifications (TS), even attempts have been made to parallel development of another mechanism of corutin. Here we will talk about those coroutines that are described in TS ( technical specification ). An alternative approach, in turn, belongs to Google. As a result, it turned out that the Google approach suffers from numerous problems, the solution of which often requires strange additional features of C ++.

In the end, it was decided to adopt a version of Corutin developed by Microsoft (sponsored by TS). It is about such coroutines that will be discussed in this article. So, let's start with the question of ...

What are coroutines?


Coroutines already exist in many programming languages, for example, in Python or C #. Coroutines are another way to create asynchronous code. How they differ from flows, why coroutines should be implemented as a dedicated language feature and, finally, what is their use will be explained in this section.

There is a serious misunderstanding regarding what coroutines are. Depending on the environment in which they are used, they may be called:

  • Stackless Coroutines
  • Stack coroutines
  • Green streams
  • Fibers
  • Gorutins

The good news: stack corutins, green streams, fibers and gorutins are one and the same thing (but they are sometimes used in different ways). We will talk about them later in this article and we will call them fibers or stack coroutines. But the stackless coroutine has some features that need to be discussed separately.

To understand coroutines, including on an intuitive level, let's briefly get to know the functions and (let us put it this way) “their API”. The standard way to work with them is to call and wait until it finishes:

void foo(){
     return; //     
}	
foo(); //   / 

After calling the function, it is already impossible to pause, or resume its work. You can perform only two operations on functions: startand finish. When the function is launched, you must wait until it is completed. If the function is called again, its execution will go from the very beginning.

With coroutines, the situation is different. You can not only start and stop them, but also pause and resume them. They are still different from core flows, because the coroutines themselves are not crowding out (on the other hand, coroutines are usually flowing, and the flow is crowding out). To understand this, consider a generator defined in Python. Let such a thing be called a generator in Python, in C ++ it would be called coroutine. An example is taken from this site :

def generate_nums():
     num = 0
     while True:
          yield num
          num = num + 1	

nums = generate_nums()
	
for x in nums:
     print(x)
	
     if x > 9:
          break

Here's how this code works: a function call generate_numsleads to the creation of a coroutine object. At each step of enumerating a coroutine object, coroutine itself resumes work and pauses it only after a keyword yieldin the code; then the next integer from the sequence is returned (the for loop is syntactic sugar for calling a function next()that resumes coroutine). The code ends the loop by encountering a break statement. In this case, corutin never ends, but it is easy to imagine a situation in which corutin reaches the end and ends. As we can see, to a korutine applicable operations start, suspend, resumeand finally,finish. [Note: C ++ also provides creation and destruction operations, but they are not important in the context of an intuitive understanding of coroutine].

Coroutines as a library


So, now it’s approximately clear what coroutines are. You may know that there are libraries for creating fiber objects. The question is, why do we need coroutines in the form of a dedicated language feature, and not just a library that would work with coroutines.

Here we are trying to answer this question and demonstrate the difference between stacked and stackless coroutines. This difference is key to understanding corutin as part of the language.

Stack coroutines


So, let's first discuss what stack coroutines are, how they work, and why they can be implemented as a library. Explaining them is relatively simple, because they resemble streams in terms of design.

Fiber or stack corutin has a separate stack that can be used to handle function calls. To understand exactly how coroutines of this kind work, we briefly look at function frames and function calls from a low-level point of view. But first, let's talk about the properties of fibers.

  • They have their own stack,
  • The lifetime of the fibers does not depend on the code that calls them (usually they have a user-defined scheduler),
  • Fibers can be detached from one thread and attached to another,
  • Cooperative planning (the fiber must decide to switch to another fiber / scheduler),
  • Cannot work simultaneously in the same thread.

The following effects result from the above properties:

  • switching the context of the fibers should be carried out by the user of the fibers, and not the OS (in addition, the OS can release the fiber, releasing the thread in which it works),
  • There is no real data race between the two fibers, since at any given time only one of them can be active,
  • The fiber designer must be able to choose the right place and time, where and when it is appropriate to return computing power to a possible scheduler or caller.
  • The input / output operations in the fiber must be asynchronous, so that other fibers can perform their tasks without blocking each other.

Now let's take a closer look at the operation of the fibers and first explain how the stack participates in function calls.

So, the stack is a continuous block of memory needed to store local variables and function arguments. But, more importantly, after each function call (with a few exceptions), additional information is pushed onto the stack that tells the called function how to return to the caller and restore processor registers.

Some of these registers have special assignments, and when calling functions, they are stored on the stack. These are the registers (in the case of the ARM architecture):

SP - stack pointer
LR - communication register
PC - program counter

stack pointer(SP) is a register that contains the address of the beginning of the stack related to the current function call. Thanks to the existing value, you can easily refer to arguments and local variables stored on the stack.

The communication register (LR) is very important when calling functions. It stores the return address (the address of the calling party), where the code will be executed after the execution of the current function is completed. When the function is called, the PC is saved in LR. When the function returns, the PC is restored using LR.

Program Counter (PC) is the address of the currently executing instruction.
Whenever a function is called, the list of links is saved, so that the function knows where the program should return after it finishes.



The behavior of the PC and LR registers when calling and returning a function

When executing a stack coroutine, the called functions use the previously allocated stack to store its arguments and local variables. Since all information on each function called on the stack corutin is stored on the stack, the fiber can suspend any function within that corutin.



Let's see what happens in this picture. Firstly, each fiber and thread has its own separate stack. Green color indicates serial numbers indicating the sequence of actions.

  1. A regular function call inside a thread. Memory is allocated on the stack.
  2. . . , . . , .
  3. .
  4. . .
  5. .
  6. .
  7. . , , , .
  8. .
  9. .
  10. – , .
  11. , .
  12. . .
  13. . : , . , ( ) .
  14. , .
  15. .
  16. . . . , .
  17. .
  18. , , .

When working with stack coroutines, there is no need for a dedicated language feature that would ensure their use. Entire stack korutiny well be implemented using libraries, and libraries already exist that are designed specifically for this purpose:

swtch.com/libtask
code.google.com/archive/p/libconcurrency
www.boost.org Boost.Fiber
www.boost.org the Boost .Coroutine

Of all these libraries, only Boost is C ++, and all the rest are C.
For a detailed description of how these libraries work, see the documentation. But, in general, all these libraries allow you to create a separate stack for fiber and provide the opportunity to resume coroutine (at the initiative of the caller) and pause it (from the inside).

Consider an example Boost.Fiber:

#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
	
#include <boost/intrusive_ptr.hpp>
	
#include <boost/fiber/all.hpp>
	
inline
void fn( std::string const& str, int n) {
     for ( int i = 0; i < n; ++i) {
          std::cout << i << ": " << str << std::endl;
               boost::this_fiber::yield();
     }
}
	
int main() {
     try {
          boost::fibers::fiber f1( fn, "abc", 5);
          std::cerr << "f1 : " << f1.get_id() << std::endl;
          f1.join();
          std::cout << "done." << std::endl;
	
          return EXIT_SUCCESS;
     } catch ( std::exception const& e) {
          std::cerr << "exception: " << e.what() << std::endl;
     } catch (...) {
          std::cerr << "unhandled exception" << std::endl;
     }
     return EXIT_FAILURE;
}

In the case of Boost.Fiber , the library has a built-in scheduler for coroutine. All fibers run in the same thread. Since corutin planning is cooperative, the fiber must first decide when to return control to the scheduler. In this example, this happens when the yield function is called, which pauses the coroutine.

Since there is no other fiber, the fiber planner always decides to resume coroutine.

Stackless Coroutines


Stackless coroutines differ slightly in properties from stack ones. However, they have the same basic characteristics, since the non-stacked coroutines can also be started, and after their suspension can be resumed. Coroutines of this type we are likely to find in C ++ 20.

If we talk about the similar properties of corutin - coroutines can:

  • Corutin is closely connected with her caller: when a coroutine is called, execution is transferred to her, and the result of the coroutine is transferred back to the caller.
  • The life span of a stack corutin is equal to the life of its stack. The lifespan of a stackless coroutine is equal to the life of its object.

However, in the case of stackless coroutines, there is no need to allocate a whole stack. They consume much less memory than stack ones, but this is precisely due to some of their limitations.

To begin with, if they do not allocate memory for the stack, then how do they work? Where in their case all the data goes that should be stored on the stack when working with stack coroutines. Answer: on the stack of the caller.

The secret to stackless coroutines is that they can only suspend themselves from the topmost function. For all other functions, their data is located on the stack of the called side, so all functions called from corutin must be completed before the corutin's work is suspended. All data needed by coroutine to maintain its state is dynamically allocated on the heap. This usually requires a couple of local variables and arguments, which are much more compact than a whole stack that would have to be allocated in advance.

Take a look at how stackless corutins work:



Challenging a stackless corutin

As you can see, now there is only one stack - this is the main stack of the thread. Let's take a step-by-step look at what is shown in this picture (the coroutine activation frame here is two-color - black shows what is stored on the stack, and blue - what is stored on the heap).

  1. A regular function call whose frame is stored on the stack
  2. The function creates a coroutine . That is, it allocates an activation frame for it somewhere on the heap.
  3. Normal function call.
  4. Call Corutin . Corutin's body stands out in a regular stack. The program is executed in the same way as in the case of a regular function.
  5. A regular function call from coroutine. Again, everything still happens on the stack [Note: you cannot pause the coroutine from this point, since this is not the topmost function in the coroutine]
  6. [: .]
  7. – , , .
  8. – , + .
  9. 5.
  10. 6.
  11. . .

So, it is obvious that in the second case it is necessary to remember much less data for all operations of suspending and resuming the work of coroutine, however, coroutine can resume and suspend only itself, and only from the topmost function. All function calls and coroutine occur the same way, however, between calls it is required to save some additional data, and the function must be able to jump to the suspension point and restore the state of local variables. There are no other differences between the coroutine frame and the function frame.

Corutin can also cause other coroutines (not shown in this example). In the case of stackless coroutines, each call results in the allocation of a new space for new coroutine data (with a repeated call of coroutine, dynamic memory can also be allocated several times).

The reason why coroutines need to provide a dedicated language feature is because the compiler needs to decide which variables describe the state of coroutine and create stereotyped code to jump to the suspension points.

Practical use of corutin


Coroutines in C ++ can be used in the same ways as in other languages. Coroutines will simplify the spelling:

  • generators
  • asynchronous input / output code
  • lazy computing
  • event driven applications

Summary


I hope that by reading this article you will find out:

  • why in C ++ you need to implement coroutines as a dedicated language feature
  • What is the difference between stacked and stackless coroutines?
  • why coroutines are needed

All Articles