Stas Afanasyev. Juno. Pipelines based on io.Reader / io.Writer. Part 1

In this talk, we cover the concept of io.Reader / io.Writer, why they are needed, how to implement them correctly and what pitfalls exist, as well as how to build pipelines from standard and custom io.Reader / io.Writer implementations.



Stanislav Afanasyev (hereinafter SA): Good afternoon! My name is Stas. I came from Minsk, from the company Juno. Thank you for finding the strength to leave the house and come here on this rainy day.

Today I want to talk about building pipelines in Go based on the io.Reader / io.Writer interfaces. I will cover the concept of these interfaces in general, why they are needed, how to use them correctly and, most importantly, how to implement them correctly.

We will also talk about building pipelines from various implementations of these interfaces. We will go over the existing approaches and discuss their pros and cons. I will also point out various pitfalls (there will be plenty of them).

Before we begin, we must answer the question: why are these interfaces needed at all? Raise your hands if you work with Go closely (every day or every other day) ...



Great! So we do have a Go community. I think many of you have worked with these interfaces, or have at least heard of them. Even if you do not know them well, you have certainly heard something about them.

First of all, these interfaces are an abstraction over input/output operations in all their forms. Secondly, they are a very convenient API that lets you build pipelines like a construction set made of blocks, without really thinking about the internal details of each implementation. At least, that was the original intent.

io.Reader


This is a very simple interface. It consists of a single method, Read. Conceptually, an io.Reader implementation can be a network connection, for example, where there is no data yet but it may arrive later:



It can be an in-memory buffer where the data already exists and can be read in full. It can also be a file descriptor: if the file is very large, we can read it in pieces.

Conceptually, an io.Reader implementation provides access to some data. All the cases I listed are supported by the Read method. It takes a single argument: a byte slice.
One point to make here. For those who came to Go recently, or from a technology with no similar API (I am one of them), this signature is a bit confusing. The Read method looks as if it reads the slice. In fact, the opposite is true: the Reader implementation reads its own data and fills the slice with it.

The maximum amount of data a single Read call can return is the length of this slice. A typical implementation returns as much data as it has at the time of the call, or as much as fits into the slice. This means a Reader can be read in pieces: one byte at a time, ten at a time, whatever you like. The client calling the Reader then decides what to do next based on the values returned from Read.

The Read method returns two values:

  • the number of bytes read;
  • an error if it occurred.

These values determine the client's further behavior. The GIF on the slide illustrates the process I just described:





io.Reader - How to?


There are exactly two ways for your data to satisfy the Reader interface.



The first is the simplest. If you have a byte slice and want it to satisfy the Reader interface, you can take an implementation from the standard library that already satisfies it, for example Reader from the bytes package. On the slide above you can see the signature of the function that creates this Reader.

The harder way is to implement the Reader interface yourself. The documentation has about 30 lines of tricky rules and restrictions that must be followed. Before we go through them, I got curious: “In which cases are the standard library implementations not enough? When do we actually need to implement the Reader interface ourselves?”

To answer this question, I took the thousand most popular repositories on GitHub (by number of stars), pulled them and found all the implementations of the Reader interface in them. The slide shows some statistics, by category, on when people implement this interface.

  • The most popular category is connections. This covers both implementations of proprietary protocols and wrappers around existing types. For example, Brad Fitzpatrick's Camlistore project has statTrackingConn, which is essentially an ordinary wrapper around the Conn type from the net package (it adds metrics to that type).
  • The second most popular category is custom buffers. The example I liked most here is dataBuffer from the x/net package. Its peculiarity is that it stores data split into chunks, and when it is read, it walks through these chunks. When the data in one chunk runs out, it moves on to the next. It also takes into account the length, that is, the space it can fill in the slice passed to it.
  • Another category is all kinds of progress bars, which count the number of bytes read and send it to metrics ...

Based on this data, we can say that the need to implement the io.Reader interface arises quite often. So let's move on to the rules in the documentation.

Documentation Rules


As I said, the list of rules, and the documentation in general, is quite large. 30 lines is a lot for an interface that is only three lines long.

The first and most important rule concerns the number of bytes returned. It must be greater than or equal to zero and less than or equal to the length of the slice passed in. Why is this important?



Since this is a fairly strict contract, the client can trust the count that comes from the implementation. There are wrappers in the standard library (for example, bytes.Buffer and bufio), and the standard library is inconsistent here: some implementations trust the Readers they wrap and some do not (we will come back to this later).

bufio trusts nothing at all: it checks absolutely everything. bytes.Buffer trusts absolutely everything it receives. Now I will demonstrate what follows from this ...

We will now look at three cases, that is, three Reader implementations. They are quite synthetic, but useful for understanding. We will read all of them using the ReadAll helper. Its signature is shown at the top of the slide:



io.Reader # 1. Example 1


ReadAll is a helper that takes some kind of implementation of the Reader interface, reads all of it and returns the data that it read, as well as an error.

Our first example is a Reader that always returns -1 as the byte count and nil as the error, a kind of NegativeReader. Let's run it and see what happens:



As you know, panicking for no reason is a sign of foolishness. But who is the fool in this case, me or bytes.Buffer, depends on the point of view. Those who write and maintain this package have a different point of view.

What happened here? bytes.Buffer accepted a negative number of bytes without checking it and tried to reslice its internal buffer using the number it received as the upper bound, so we went out of the slice bounds.

There are two problems in this example. The first is that the signature does not forbid returning negative numbers, while the documentation does. If the signature used uint, we would get a classic overflow (a signed number interpreted as unsigned). That is a very tricky bug, which will certainly strike on a Friday night, when you are all packed up to go home. So a panic is the preferable outcome here.

The second point is that the stack trace does not make it clear what happened. It is clear that we went out of the slice bounds, but so what? When you have a multi-layer pipeline and such an error occurs, it is not immediately obvious what went wrong. bufio from the standard library also panics in this situation, but it does so more gracefully. It says right away: “I read a negative number of bytes. I will not do anything else, because I do not know what to do with it.”

bytes.Buffer, in contrast, panics as best it can. I filed an issue with the Go project asking for a human-readable error. For three days we discussed the prospects of this change. The reason turned out to be this: historically, different people at different times made different, uncoordinated decisions. So now in one case we do not trust the implementation at all (we check everything), and in the other we trust it completely and do not check what comes back. This is an unresolved issue, and we will come back to it.

io.Reader # 1. Example 2


The next situation: our Reader always returns 0 and nil. From the point of view of the contract, everything is legal here. The only caveat: the documentation says that implementations are discouraged from returning 0 and nil, except when the length of the slice passed in is zero.

In real life, such a Reader can cause a lot of trouble. So we return to the question: should we trust the Reader? bufio, for example, has a built-in check: it reads from the Reader up to 100 times in a row, and if this pair of values comes back 100 times, it simply returns NoProgress.

There is nothing like this in bytes.Buffer. If we run this example, we simply get an endless loop (ReadAll uses bytes.Buffer under the hood, not Reader itself):



io.Reader # 1. Example 3


One more example. It is also quite synthetic, but useful for understanding:



Here we always return 1 and nil. It would seem there are no problems here either: everything is legal from the point of view of the contract. But there is a nuance: if I run this example on my computer, it freezes after 30 seconds ...

This happens because the client reading this Reader (that is, bytes.Buffer) never gets a sign of the end of the data: it just keeps reading. On top of that, it gets one byte on every call. For the client this means that at some point the preallocated buffer runs out and it reallocates; the situation repeats, and the buffer keeps growing without bound until it blows up.

io.Reader # 2. Error return


We come to the second important rule for implementing the Reader interface: returning errors. The documentation mentions three errors that an implementation should return. The most important of them is EOF.

EOF is the very sign of the end of the data, which the implementation must return whenever it runs out of data. Conceptually it is not really an error, but it is implemented as one.

There is another error called UnexpectedEOF. The idea was that if a Reader suddenly cannot continue while reading, it returns UnexpectedEOF. In practice, according to what I found, this error is used in only one place in the standard library: the ReadAtLeast function.



Another error is NoProgress, which we have already talked about. The documentation says it plainly: it is a sign that the interface is implemented badly.

io.Reader # 3


The documentation spells out a set of cases for how to return an error correctly. Below you can see three possible cases:



We can return an error either together with the number of bytes read or separately. But if the data in your Reader suddenly runs out and you cannot return EOF [the end sign] right away (many implementations in the standard library work exactly like this), then you are expected to return EOF on the next consecutive call (that is, you must let the client go).

For the client, EOF means there is no more data: do not come to me anymore. If you return nil and the client needs more data, it should come to you again.

io.Reader. Mistakes


Those were the main rules for Reader. There is also a set of minor ones, but they are not as important and do not lead to a situation like this:



Before we wrap up everything related to Reader, we need to answer one question: does this matter, do errors often occur in custom implementations? To answer it, I went back to my pull of 1,000 repositories (which contained about 550 custom implementations). I looked through the first hundred by eye. Of course, this is not a rigorous analysis, but it is what it is ... I identified the two most popular errors:
  • never returns EOF;
  • too much trust in the wrapped Reader.

Again, this is a problem from my point of view. From the point of view of those who maintain the io package, it is not a problem. We will talk about this again.

I would like to return to one nuance. See:



The client should never interpret the pair 0 and nil as EOF. That is a mistake! For the Reader, this pair is just a way to let the client go. The two errors I mentioned may seem insignificant, but imagine that you have a multi-layer pipeline in production and a small, sneaky bug has crept into the middle of it: trouble will not be long in coming, guaranteed!

That is basically everything about Reader. These were the basic implementation rules.

io.Writer


At the other end of the pipeline we have io.Writer, which is where we usually write data. It is a very similar interface: it also consists of a single method (Write), and the signatures are similar. Semantically, the Writer interface is easier to understand: I would say it does exactly what it sounds like.



The Write method takes a byte slice and writes it in its entirety. It also has a set of rules that must be followed.

  1. The first concerns the returned number of bytes written. I would say it is not that strict, because I did not find a single example where violating it led to critical consequences, such as a panic. It is not very strict because of the following rule ...
  2. A Writer implementation is required to return an error whenever the amount of data written is less than what was passed in. In other words, partial writes are not supported. This means it is not very important how many bytes were written.
  3. One more rule: a Writer must never modify the slice passed in, because the client will continue to work with it.
  4. A Writer must not retain this slice (Reader has the same rule). If your implementation needs the data for some operation, just copy the slice, and that is it.



That's it for Reader and Writer.

Dendrogram


Especially for this talk, I generated a graph of implementations and laid it out as a dendrogram. Anyone who wants to can follow this QR code right now:



This dendrogram contains all implementations of all interfaces from the io package. Its purpose is simply to show what can be connected to what in a pipeline, what you can read from and what you can write to. I will keep referring to it during the talk, so please keep the QR code at hand.

Pipelines


We have talked about what io.Reader and io.Writer are. Now let's talk about the API the standard library offers for building pipelines. Let's start with the basics. Some of you may not even find this interesting, but it is very important.

We will read the data from the standard input stream (from Stdin):



Stdin is represented in Go by a global variable of type File from the os package. If you look at the dendrogram, you will notice that the File type implements both the Reader and Writer interfaces.

Right now we are interested in Reader. We will read Stdin using the same ReadAll helper that we already used.

One nuance about this helper is worth noting: ReadAll reads the Reader to the end, and it detects the end by EOF, the end-of-data sign we talked about.
We will now limit the amount of data we read from Stdin. For this, the standard library has the LimitedReader implementation:



Pay attention to how LimitedReader limits the number of bytes read. One might think that this wrapper reads everything from the Reader it wraps and then hands out as much as we asked for. But it works a little differently ...

LimitedReader trims the slice passed to it as an argument at the upper bound, and passes this trimmed slice to the Reader it wraps. This is a clear demonstration of how the amount of data read is controlled in io.Reader implementations.

Returning the end-of-file error


Another interesting point: note how this implementation returns the EOF error! Named return values are used here, and they are assigned the values we get from the wrapped Reader.

If the wrapped Reader has more data than we need, we assign the values from the wrapped Reader (for example, 10 bytes and nil), because it still has data. But the counter N, which is decremented (in the penultimate line), tells us that we have hit the “bottom”: the end of what we need.

On the next iteration the client has to come again, and the first condition will hand it EOF. This is the case I mentioned earlier.

To be continued very soon ...

