HighLoad++, Mikhail Raichenko (ManyChat): almost without magic, or how easy it is to distribute a terabit video stream

The next HighLoad++ conference will be held on April 6 and 7, 2020 in St. Petersburg. Details and tickets here. HighLoad++ Moscow 2018. Hall "Delhi + Calcutta". November 8, 2 p.m. Abstracts and presentation.



I work on the VKontakte team, developing the video broadcasting system.
In this talk I will share how the backend was built, how our system has evolved, and the technical solutions we arrived at:


  • how we built the video broadcast backend, and how it evolved;
  • the impact of business and operational requirements on the architecture;
  • why "wait" and "try again" will fail;
  • how the simplest tasks are complicated by the number of users;
  • how to reduce latency without UDP;
  • how we run stress tests twice a day, or how Clover helped us.

Mikhail Raichenko (hereinafter - MR): - A short introduction. I'll talk about the people who stream to us, about live broadcasts, about the platforms from which we receive the video stream and those to which we distribute it. At the end, I'll talk about the current live architecture, its limitations and capabilities, and how it survived an effect like Clover.



About live broadcasts. Report outline


  • First, I'll talk a little about live broadcasts and the streamers themselves - the people sending us the video content that we show to other viewers.
  • Then - about the platforms from which we receive the video stream and those to which we distribute it.
  • After that - about the current live architecture, its limitations and capabilities.
  • Finally - about how this architecture survived an effect like Clover.




All broadcast services look something like this:



We have a streamer, who sends us an RTMP stream, and we show it to the audience - nothing surprising, nothing supernatural.



Where does the video stream come from? A significant source of traffic for us is our mobile application VKlife. What's good about it? In it, we fully control how we encode the video, and we can make many client-side optimizations so that we can later show the stream to viewers with minimal delay.

On the minus side: mobile applications work over mobile networks. It can be 3G... In any case, it is almost always a mobile network, which introduces some lag, so we have to buffer data additionally on the application side to keep the stream as smooth as possible.

The second source is streamers using OBS, Wirecast or other desktop applications. This is a fairly large audience. Sometimes these are seminars, sometimes game streams (there are especially many of those from such applications). On the plus side: there are few such applications, so we can give our streamers good tuning recommendations. On the other hand, there are a lot of settings, and people sometimes send not quite the stream we want.

The third category is RTMP streams from media servers. These can be very small media servers, a home-format setup: a person streams a view of the street or something else. Or quite serious broadcasts from our partners: there can be anything here. There are not very many such streams, but they are mostly very important for us.
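For reference, here is roughly how a desktop streamer pushes an RTMP stream with ffmpeg (my own illustrative command, with a made-up ingest URL and stream key, not a VK-specific one):

    ffmpeg -re -i input.mp4 \
           -c:v libx264 -preset veryfast -b:v 2500k -g 60 \
           -c:a aac -b:a 128k \
           -f flv rtmp://ingest.example.com/live/STREAM_KEY

The -g option pins the keyframe interval, which matters later when the stream is cut into media segments of a predictable length.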

Who's watching?


Again, this is the mobile application - everything is clear here. The biggest problem is network latency. On the plus side: we can customize the player, which is convenient for us, but it doesn't work out 100% everywhere.

Web player at vk.com. Here, too, everything is simple: it is a regular web player that you open to watch. vk.com has a large enough audience, with a lot of viewers on broadcasts. Some broadcasts are featured in our "Video" section - they can gather tens of thousands of viewers (sometimes without any promotion), especially if the content is interesting.

Accordingly, viewers watching through the web player tend to have wide channels, so there is a lot of traffic, even for a single broadcast.

The third is the VKontakte web player on some third-party site. You can stream whatever you want and embed the VKontakte web player on your website. You can become our partner if you have interesting content: you can host it yourself, you can host it with us - as you wish. You can organize your broadcasts this way, and everything will work.

Comparison with video calls


In video calls, some image distortion will be forgiven. Video calls are simpler: we can significantly degrade the image, but at the same time we must maintain a very good latency. With a long delay, the service will be absolutely impossible to use.

With broadcasts it is somewhat the other way around: we must maintain high image quality, but we can allow a higher latency, for many reasons. For example, the player buffers the stream one way or another (a second or two, say, to survive network degradation), so sub-second or millisecond delays are mostly out of reach. You can strive for them, but they are not a prerequisite.



With regular video, the situation is exactly the opposite. We need very good quality, and at the same time we want to minimize the video size - the bitrate-to-quality ratio - so that we deliver the best result with the minimum stream. On the plus side, we are not limited in time: when a video is uploaded, we have plenty of time to optimize it, figure out how to compress it best, move it to caches if needed - in general, everything is fine.



In live, the situation is again the opposite: we have very little time for transcoding and few opportunities to optimize, but also lower expectations for the broadcast. The audience will forgive us if the stream stutters or the quality is not perfect.

Very first version


It’s quite expected:



Actually, it’s a little different:



"Streamer - media server - cache layer - viewers". In principle, this version scales quite well; I would say it should already withstand tens of thousands of viewers. But it has other drawbacks...

For example, if you look at this scheme (previous slide), you can see that it is not fault tolerant. We have to guess right with the media server in order to balance the audience well, and we cannot hang many caches off each media server - it is simply too expensive. So we looked at it, realized that it was simple and clear and had some room for scaling, but something was obviously missing... And we began to formulate requirements.

Infrastructure requirements


What's important?

  • Fault tolerance - the previous scheme was not fault tolerant.
  • Scalability.
  • Delivery to the regions. Also an important point! It makes no sense to drag all the video content from St. Petersburg or Moscow to somewhere like Novosibirsk or Yekaterinburg, or even from St. Petersburg to Moscow. This is not great, because the network delay will be long - there will be lags, everything will stutter, and that is not good. So our infrastructure must take into account that we deliver content to the regions.
  • Convenience of operation and monitoring. An important property. Since the system is large and has many viewers, it is important to alert administrators in time about any problems, which includes monitoring both product and technical metrics.



What does the broadcast infrastructure look like now?


As a result, we came to a fairly simple but effective infrastructure ...

In short, the chain is: the streamer sends an RTMP stream to a receiving server; from there the stream goes to a media server, then through an internal caching layer to the edge servers, from which viewers watch.



What is interesting here? Balancing! In this scheme we choose the balancing so as to send viewers who watch the same stream to the same edge server. Cache locality is very important here, because there can be many edge servers; and if we do not maintain both temporal locality and per-stream cache locality, we will overload the inner layer. We would not like that either.

So we balance in the following way: we select a certain edge server for the region, send viewers to it, and keep sending them there until we see that it has filled up and we should move on to another server. The scheme is simple and works very reliably. Naturally, for different streams we choose a different sequence of edge servers (the order in which we send out viewers). So the balancing works quite simply.

We also give the client a link to an available edge server. This is done so that, if an edge server fails, we can redirect the viewer to another one. That is, while the viewer is watching the broadcast, they receive a link to the right server once every few seconds. If the player sees that the link should now be different, it switches and continues watching the broadcast from another server.
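A minimal sketch of this balancing idea (my illustration, not VK's code; the edge names and the capacity threshold are made up): for each (stream, region) pair we derive a stable order of edge servers, fill them one by one, and overflow to the next when one is full.

    import hashlib

    EDGE_SERVERS = ["edge-1", "edge-2", "edge-3", "edge-4"]   # hypothetical edge names
    CAPACITY = 10_000                                         # assumed viewers per edge
    current_viewers = {edge: 0 for edge in EDGE_SERVERS}

    def edge_order(stream_id, region):
        # Stable per-(stream, region) ordering, so viewers of one stream land on the
        # same edges and their caches stay hot.
        def key(edge):
            return hashlib.md5(f"{stream_id}:{region}:{edge}".encode()).hexdigest()
        return sorted(EDGE_SERVERS, key=key)

    def pick_edge(stream_id, region):
        # Fill servers one by one in the stream-specific order; overflow to the next.
        order = edge_order(stream_id, region)
        for edge in order:
            if current_viewers[edge] < CAPACITY:
                current_viewers[edge] += 1
                return edge
        return order[0]  # everything is full: fall back to the first server

    print(pick_edge("stream42", "spb"))  # e.g. "edge-3"

In production the viewer counts would of course come from live server metrics rather than an in-memory dictionary; the point is only the per-stream ordering plus the fill-then-overflow rule.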

Another important role of the edge servers is content protection. Essentially all content protection happens there. We have our own nginx module for this; it is somewhat similar to secure link.
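Their module is in-house, but for reference, the standard nginx secure_link module that it resembles is configured roughly like this (the location, secret and parameter names here are made up for illustration):

    location /live/ {
        # the link carries an MD5 signature and an expiry timestamp
        secure_link $arg_md5,$arg_expires;
        secure_link_md5 "$secure_link_expires$uri my_secret";

        if ($secure_link = "") { return 403; }   # signature does not match
        if ($secure_link = "0") { return 410; }  # link has expired
    }

The backend that hands out playlist links signs each URL with the same secret, so a link cannot be reused past its lifetime or for another path.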

Scaling and balancing


Scaling essentially comes down to adding edge servers and including them in the rotation for a stream as the current ones fill up.



What protocols did we use?


One of the main protocols we used was RTMP (not only for input, but also for content distribution). Its main advantage is low latency: it can be half a second, a second, two seconds. In fact, the benefits end there...



This streaming protocol is difficult to monitor - it is closed in a sense, even though there is a specification. The Flash player no longer works (in effect, it is already dead). It needs support at the CDN level - special nginx modules or your own software to relay the stream properly. There are also difficulties with mobile clients: out of the box it is supported very poorly there, special work is needed, and all this is very hard.
The second protocol we used is HLS. It looks quite simple: it is a text file, the so-called index file. It contains links to index files for the different resolutions, and those files in turn contain links to the media segments.
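To make the structure concrete, an index (master) playlist and one of its media playlists look roughly like this (an illustrative example, not VK's actual files - the paths, bitrates and sequence numbers are made up):

    #EXTM3U
    # index (master) playlist: one entry per quality
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    720p/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
    480p/index.m3u8

    # 720p/index.m3u8 - media playlist: links to the media segments
    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:4
    #EXT-X-MEDIA-SEQUENCE:117
    #EXTINF:4.000,
    segment117.ts
    #EXTINF:4.000,
    segment118.ts

The player periodically re-downloads the media playlist and fetches whatever new segments have appeared at the end.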



What is good about HLS? It is very simple, despite being a bit old. It lets you use a CDN - you only need nginx to distribute HLS - and it is easy to understand in terms of monitoring.

Here are its advantages:

  • ease of operation - nginx as a proxy server;
  • it’s easy to monitor and take metrics (it’s enough for you to monitor about the same as what you monitor on your website);
  • Now this protocol is the main one for content delivery.

Significant minus:

  • high latency.



HLS latency is actually built into the protocol itself: a long buffering time is required, and the player is forced to wait at least until one chunk is loaded - but really it should wait until two chunks (two media segments) are loaded, otherwise any hiccup turns into buffering in the player, and that does not do the user experience any good.

The second thing that adds delay in HLS is caching. The playlist is cached both on the inner layer and on the edge servers. Even if we cache it for, relatively speaking, half a second or a second, that adds roughly 2-3 seconds of delay. In total it comes out to somewhere between 12 and 18 seconds - not very pleasant; clearly we can do better.

Improving HLS


With these thoughts, we began to improve HLS. We improved it in a rather simple way: we announce the last, not yet fully recorded media segment in the playlist a little earlier. That is, as soon as we start writing the last media segment, we immediately announce it in the playlist. What happens at this point? The buffering time in the players is reduced. The player believes it has already downloaded everything it needs and calmly starts downloading the segments. We "cheat" the player this way, but if we monitor stalls (buffering events in the player) well, it does not scare us: we can stop announcing the segment in advance and everything returns to normal.
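A rough sketch of this "announce the segment early" trick (my illustration, not VK's code): a normal HLS playlist lists only fully written segments, while here we also append the segment that is currently being recorded, because with a controlled input stream its duration is known in advance.

    def build_playlist(finished_segments, in_progress_segment, target_duration=4):
        # finished_segments: list of (sequence_number, duration) for fully written segments
        # in_progress_segment: sequence number of the segment being written right now
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{target_duration}",
            f"#EXT-X-MEDIA-SEQUENCE:{finished_segments[0][0]}",
        ]
        for seq, duration in finished_segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(f"segment{seq}.ts")
        if in_progress_segment is not None:
            lines.append(f"#EXTINF:{target_duration:.3f},")   # duration known up front
            lines.append(f"segment{in_progress_segment}.ts")  # still being written
        return "\n".join(lines) + "\n"

    print(build_playlist([(10, 4.0), (11, 4.0)], in_progress_segment=12))

The server must then be able to serve the in-progress segment as it is being written, which is exactly why the incoming stream has to be under control.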

The second point: we win about 5-8 seconds in total. Where do they come from? The segment duration is 2 to 4 seconds, plus the playlist caching time (another 2-3 seconds). The delay shrinks significantly: from 12-15 seconds down to 5-7.

What else is good about this approach? It is essentially free! We only need to check whether the players are compatible with it. Incompatible players are sent to the old URLs, and compatible players to the new ones. We do not need to upgrade old clients, which is also important: we do not have to modify and re-release the players in mobile clients, and we do not need to rework the web player. It looks good enough!



Among the minuses is the need to control the incoming video stream. When the stream comes from our mobile client we can do this fairly easily; otherwise we have to transcode it, because the player must know the duration of each media segment. And since we announce a segment before it is actually recorded, we need to know in advance how long it will be once we have recorded it.

Quality metrics


So, we have improved HLS. Now I want to tell you how we monitor it and what quality metrics we collect:



One of the main quality metrics is start time. Ideally, you scroll to the broadcast in the mobile client, press the button, and it starts immediately. It would be ideal if it started before you pressed the button, but unfortunately it only starts when you press.
The second is signal delay. We believe a few seconds is very good, 10 seconds is tolerable, 20-30 seconds is very bad. Why does it matter? For concerts and some public events this is a completely unimportant metric, there is no feedback; we just show the stream, and it is better not to lag than to have a slightly smaller delay. But for a stream where a conference is going on, or a person is talking and being asked questions, it starts to matter, because viewers ask a lot of questions in the comments and we want them to get feedback as early as possible.

Another important metric is buffering and lags. In fact, this metric matters not so much from the point of view of the client and quality as from the point of view of how far we can "stretch" HLS delivery and how much we can squeeze out of our servers. So we monitor both the average buffer time in the players and the stalls.

Quality selection in the players is also tracked: unexpected quality switches are always annoying, so this is an important metric as well.

Monitoring


We have many monitoring metrics, but here I picked the ones that always fire if something has gone wrong:



  • The number of incoming streams: if streamers suddenly stop streaming to us, an alert fires immediately.
  • Metrics from the edge servers: their load and availability.


Clover, or how we survived it


Now I'll tell you how we handled an application like Clover, when we used our infrastructure to broadcast a video stream with questions and answers.

Clover is an online quiz. The host talks and asks questions; the questions pop up and you answer. 12 questions, 10 minutes of play, and at the end some kind of prize. What's so complicated about that? The growth!

On the right is a graph: the peak is the load on the servers in terms of API requests at the moment Clover starts; the rest of the time is the usual flow of broadcasts. This cannot be equated with the number of viewers - rather, it is the number of requests and the load.

It is hard: at the peak, a million viewers came to us within 5 minutes. They start watching the broadcast, register, perform some action, request the video... It would seem to be a very simple game, but everything happens in a very short period of time (all the actions, including the final round), and this creates quite a high load.

What questions and challenges did we face?




  • Fast growth. Sometimes it was +50% per week. If, for example, you had 200 thousand people on Wednesday, then by Saturday or Sunday there could already be 300 thousand. That is a lot! Problems start to surface that were not visible before.
  • Games run twice a day, which in effect gave us a full stress test of the whole system twice a day.
  • At that scale, a plain "wait and try again" does not work: clients have to retry with exponential backoff - first after a second, then after 2, 4, 8 seconds - so that the retries themselves do not finish off the backend (a sketch of this policy follows right after this list).
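A minimal sketch of such a retry policy (my illustration; the function and parameter names are made up): instead of hammering the backend with immediate retries, the client waits exponentially longer before each new attempt, with a little jitter so clients do not retry in lockstep.

    import random
    import time

    def retry_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0):
        for attempt in range(max_attempts):
            try:
                return request()
            except Exception:
                if attempt == max_attempts - 1:
                    raise                                   # give up after the last attempt
                delay = min(base_delay * (2 ** attempt), max_delay)
                time.sleep(delay + random.uniform(0, delay / 2))  # jitter spreads the retries

    # usage (api_client here is hypothetical):
    # retry_with_backoff(lambda: api_client.get("/broadcast/42"))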

What did Clover teach us?



  • Scale changes everything: 10% of errors on an ordinary broadcast is a nuisance, while 10% of errors when hundreds of thousands of people are playing Clover at the same time is a serious incident.

What's next?


The architecture suits us completely, and I can safely recommend it. HLS will remain our main protocol, at least for the website, and at least a backup protocol for everything else. Perhaps we will switch to MPEG-DASH.

We abandoned RTMP. Even though it gives a lower delay, we decided to tune HLS instead. Perhaps we will consider other delivery mechanisms, including DASH, as an alternative.



We will improve inbound balancing. We also want to make failover for inbound balancing seamless, so that if one of the media servers has problems, the switch is completely invisible to the client.

Perhaps we will evolve the delivery path from the edge servers to the client. Most likely it will be some kind of UDP; which one exactly, we are still thinking about and researching.

Actually, that's all. Thanks to all!

Questions


Question from the audience (hereinafter - A): - Thank you very much for the talk! A speaker from Odnoklassniki spoke just before you, and he said they had to rewrite the streamer, the encoder... Do you use such in-house solutions, or do you use stock ones that are on the market (like Harmonic, etc.)?



MR: - Right now we do have third-party solutions running. Of the open-source solutions, we used nginx with the RTMP module for a long time. On the one hand, it pleased us because it worked and was quite simple; on the other, we fought for a very long time to make it work stably. Now we are moving from Nginx-RTMP to our own solution, and we are thinking about the transcoder now. The receiver, that is, the receiving part, has also been rewritten from Nginx-RTMP to our own solution.

A: - I wanted to ask a question about slicing RTMP into HLS, but as I understand it, you have already answered... Tell me, are your solutions open source?

MR: - We are considering releasing it as open source. We would like to, but the question is the timing of the release. Just putting the source on the Internet is not enough - you need to think it through and prepare some deployment examples. Why are you asking? Do you want to use it?

A: - Because I also ran into Nginx-RTMP! It has, to put it mildly, not been maintained for a very long time. The author doesn't really even answer questions...

MR: - If you want, you can write to us by email. Hand it over for your own use? We can agree on that. We really do plan to open it up.

A: - You also said that you may move from HLS to DASH. Doesn't HLS suit you?

MR: - It is a question of what we might do, not what we will do. It largely depends on what we arrive at while researching alternative delivery methods (i.e. UDP). If we find something really good, we probably won't touch HLS. If it turns out that MPEG-DASH is more convenient, maybe we will move to it. It does not look very difficult, but we are not sure whether it is necessary. Synchronization between streams, between qualities and so on is clearly better there - there are pluses.

A: - About alerts. You said that if streamers stop streaming, that immediately triggers an alert. Have you not run into things outside your control: a provider goes down, Megafon goes down, and people simply stop streaming?..

MR: - Let's just say that things outside our control are mostly all kinds of holidays and so on. Yes, that has happened. The administrators look, see that today is a holiday and the rest of the metrics are fine, and calm down.

A: - And about scaling. At what level does it scale? For example, if I request a stream from my phone, will I immediately receive a link to the right edge server?

MR: - The link will come immediately and, if necessary, you will be switched (if there are problems on a specific server).



A: - Who will switch?

MR: - Mobile player or web player.

A: - Will it restart the stream, or what?

MR: - A new link will come telling it where to get the live stream from. It will go there and re-request the stream.

A: - At what level do you have caches? Both playlists and chunks, or only one of them?..

MR: - Both playlists and chunks! They are cached a little differently, both in how long they are cached and in how they are served, but we cache both.

A: - About spinning up edge servers. Have you had a situation where 2 million viewers are watching one broadcast, you are running out of capacity, and you quickly bring up another edge server? Or do you provision everything in advance with a margin?

MR: - That has probably never happened. Firstly, there is always some reserve, small or large - it does not matter. Secondly, a broadcast does not suddenly become super popular. We can predict the number of viewers well: generally, for a lot of people to come, we have to make an effort. So we can adjust the expected audience of a broadcast depending on what effort is put in.

A: - You said that you measure delays instrumentally on the player side. How?

MR: - Very simple. In the chunks (the media segments) we have a timestamp in the name of the media segment, and the player simply reports it back to us.
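A minimal sketch of that idea (my illustration, assuming the segment name simply embeds a Unix timestamp, which is not necessarily VK's exact scheme):

    import re
    import time

    def broadcast_delay_seconds(segment_name):
        # e.g. "segment_1731067200.ts" -> how far behind real time the player is
        match = re.search(r"(\d{10})", segment_name)  # 10-digit Unix timestamp in the name
        if not match:
            return 0.0                                # no explicit timestamp: nothing to report
        return time.time() - int(match.group(1))

    print(round(broadcast_delay_seconds(f"segment_{int(time.time()) - 7}.ts")))  # ~7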

A: - I remember you tried running peer-to-peer delivery over WebRTC. Why did you drop it?

MR: - I cannot answer that question for you - it happened without me. I can't say why it was tried, and I can't say why it was abandoned.

A: - About the receiver and the media server. As I understand it, you used to have Nginx-RTMP, and now you have in-house solutions in both places. Essentially it is one media server that proxies to other media servers, and they in turn feed the cache and the edge.

MR: - Well, not exactly. They are in-house solutions, but they are different, both the media server and the receiver. Nginx-RTMP was a kind of all-in-one combine that could do both. The internals of our receiver and media server are very different - both in code and overall. The only thing they share is RTMP handling.

A: - Regarding the balancing between the edges. How does this happen?

MR: - We measure traffic and send viewers to the appropriate server. I did not quite understand the question...

A: - I'll explain: the user requests a playlist through the player - does it return relative paths to the manifests and chunks, or absolute paths, for example with an IP or domain?..

MR: - The paths are relative.

A: - So there is no balancing while a single user is watching a stream?

MR: - There is, but it is a bit tricky. We can use a 301 redirect when a server is overloaded: if we see that things are really bad there, we can send requests for segments to another server.

A: - But that has to be wired into the player's logic?

MR: - No, that particular part does not have to be wired in. It is a 301 redirect! The player just has to be able to follow a 301 link. From the server side, we can send it to another server at the moment of overload.

A: - So it asks the server, and the server redirects it?

MR: - Yes. This is not ideal, so the player also has logic for re-requesting links in case one of the servers fails - that part is already in the player's logic.

A: - And did you not try working with absolute paths instead of relative ones: when the player makes a request, do some magic, figure out where there are resources and where there are not, and hand out playlists pointing at a specific server?

MR: - Actually, that looks like a workable solution. If you had come by back then, we would have weighed both! But the current solution also works, so I do not really want to jump from one to the other, although yours does look workable too.

A: - Tell me, does this play nicely with multicast in any way? HLS, as I understand it, does not at all...

MR: - In the current implementation, we probably have nothing multicast-related in live; we do not build the concept of multicast into it. Perhaps somewhere in the depths of admin magic there is something inside the network, but it is hidden from absolutely everyone and nobody knows...



