How we accelerated video encoding eight times

Every day, millions of viewers watch videos on the Internet. But before a video becomes available, it must be not only uploaded to the server but also processed. The faster this happens, the better for the service and its users.

My name is Askar Kamalov. A year ago I joined the Yandex video technology team. Today I will briefly tell Habr readers how parallelizing the encoding process let us speed up video delivery to users several times over.

This post will primarily be of interest to those who have not previously thought about what is happening under the hood of video services. In the comments, you can ask questions and suggest topics for future posts.

A few words about the task itself. Yandex not only helps users find videos on other sites, but also stores videos for its own services. Whether it is an original show or a sports match in Ether, a movie on KinoPoisk, or videos in Zen and News, all of it is uploaded to our servers. Before users can watch a video, it has to be prepared: converted to the required format, given a preview, or even run through DeepHD technology. An unprepared file just takes up space. And this is not only about using hardware efficiently, but also about how quickly content reaches users. For example, a recording of the decisive moment of a hockey match can show up in search within a minute of the event itself.

Sequential encoding

So, the user's happiness largely depends on how quickly the video becomes available, and that is mainly determined by the speed of transcoding. When there are no strict requirements on how soon a video must appear, there is no problem: take a single, indivisible file, convert it, and upload it. At the beginning of our journey, we worked like this:

The client uploads the video to storage, the Analyzer component collects meta information and hands the video to the Worker component for conversion. All steps are performed sequentially. There can be many encoding servers, but only one of them works on any given video. It is a simple, transparent scheme, and that is where its merits end. Such a scheme scales only vertically (by buying more powerful servers).

Sequential encoding with intermediate result

To ease the painful wait, the industry came up with so-called quick encoding. The name is misleading, because the full encoding still runs sequentially and takes just as long, only now with an intermediate result. The idea is this: prepare and publish a low-resolution version of the video as soon as possible, and the higher-resolution versions only later.
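The idea can be sketched in a few lines of Python. This is only an illustration: the rendition ladder, the `encode_rendition` placeholder, and the file naming are hypothetical and are not taken from our production code.

```python
from typing import Iterator, List, Tuple

# Hypothetical rendition ladder: (name, frame height in pixels).
LADDER: List[Tuple[str, int]] = [
    ("1080p", 1080), ("240p", 240), ("720p", 720), ("480p", 480),
]

def encode_rendition(source: str, height: int) -> str:
    """Placeholder for a real encoder call; returns the output file name."""
    return f"{source}.{height}p.mp4"

def quick_encode(source: str,
                 ladder: List[Tuple[str, int]] = LADDER) -> Iterator[Tuple[str, str]]:
    """Encode renditions one after another, lowest resolution first.

    Each finished rendition is yielded immediately, so it can be published
    while the larger ones are still being encoded. The total work is the
    same as in plain sequential encoding.
    """
    for name, height in sorted(ladder, key=lambda rendition: rendition[1]):
        yield name, encode_rendition(source, height)
```

Iterating over `quick_encode("match")` yields the 240p version first and the 1080p version last, which is exactly why the first thing viewers see is the blurry picture.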

On the one hand, the video becomes available sooner, which is useful for important events. On the other hand, the picture is blurry, and this annoys viewers.

It turns out that you need not only to process the video quickly, but also to preserve its quality. This is what users now expect from a video service. It may seem enough to buy the most powerful servers (and regularly upgrade them all at once). But this is a dead end, because there will always be a video that slows down even the most powerful hardware.

Parallel encoding

It is much more efficient to divide a hard task into many simpler ones and solve them simultaneously on different servers. This is MapReduce for video. In this case we are not limited by the performance of a single server and can scale horizontally (by adding new machines).
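A toy sketch of the split, encode, combine pattern, under loud assumptions: a "video" here is just a list of frame labels, `encode` is a stand-in that upper-cases them, and a local thread pool plays the role of separate servers. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def split(video: List[str], fragment_len: int) -> List[List[str]]:
    """Cut the video into consecutive fragments of at most fragment_len frames."""
    return [video[i:i + fragment_len] for i in range(0, len(video), fragment_len)]

def encode(fragment: List[str]) -> List[str]:
    """Stand-in for real transcoding: each fragment is processed on its own."""
    return [frame.upper() for frame in fragment]

def combine(fragments: List[List[str]]) -> List[str]:
    """Glue the encoded fragments back together in their original order."""
    return [frame for fragment in fragments for frame in fragment]

def parallel_encode(video: List[str], fragment_len: int = 3, workers: int = 4) -> List[str]:
    fragments = split(video, fragment_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, even if the
        # fragments finish encoding in a different order.
        encoded = list(pool.map(encode, fragments))
    return combine(encoded)
```

For example, `parallel_encode(list("abcdefgh"))` returns the same frames in the same order, just "encoded". Adding workers shortens the map step without touching the split and combine logic, which is the whole point of scaling horizontally.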

By the way, the idea of splitting a video into small pieces, processing them in parallel, and gluing them back together is no secret. You can find many references to this approach (for example, on Habr I recommend the post about the DistVIDc project). But that does not make things much easier, because you cannot just take a ready-made solution and drop it into your own system. It has to be adapted to our infrastructure, our videos, and even our workload. In general, it is easier to write your own.

So, in the new architecture, we divided the monolithic Worker block with sequential encoding into three microservices: Segmenter, Tcoder, and Combiner.

Segmenter splits the video into fragments of about 10 seconds each. A fragment consists of one or more GOPs (groups of pictures). Each GOP is independent and encoded separately, so it can be decoded without reference to frames from other GOPs. That is, fragments can be played independently of each other. This segmentation reduces latency, because processing can start earlier.
Tcoder encodes each fragment.
Combiner assembles the result: it takes the fragments processed by Tcoder and glues them together.
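To make the Segmenter step more concrete, here is a minimal sketch of how GOPs might be grouped into fragments. The rule used (close a fragment as soon as it reaches the target length) is an assumption for illustration, as are the function name and the list-of-durations input; the only facts taken from the text are that a fragment lasts about 10 seconds and always consists of whole GOPs.

```python
from typing import List

def group_gops(gop_durations: List[float], target: float = 10.0) -> List[List[int]]:
    """Group consecutive GOPs into fragments of roughly `target` seconds.

    gop_durations[i] is the length of GOP number i in seconds.
    Returns a list of fragments, each a list of GOP indices.
    Cuts happen only on GOP boundaries, so every fragment can be
    decoded without frames from its neighbours.
    """
    fragments: List[List[int]] = []
    current: List[int] = []
    length = 0.0
    for index, duration in enumerate(gop_durations):
        current.append(index)
        length += duration
        if length >= target:          # fragment is long enough: close it
            fragments.append(current)
            current, length = [], 0.0
    if current:                       # leftover GOPs form a shorter last fragment
        fragments.append(current)
    return fragments
```

With six GOPs of 4, 4, 4, 4, 4, and 2 seconds, this produces two fragments, `[0, 1, 2]` and `[3, 4, 5]`. A single GOP longer than the target simply becomes a fragment of its own, since a GOP is never split.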

A few words about audio. The most popular audio codec, AAC, has a nasty feature: if you encode fragments separately, you cannot glue them together seamlessly, and the transitions will be audible. Video codecs have no such problem. In theory you could look for a complex technical solution, but it is simply not worth the effort (audio is much smaller than video). So we encode only the video in parallel, and the audio track is processed as a whole.
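In terms of task planning, this could look roughly like the sketch below. The `Task` type and both functions are hypothetical; they only illustrate the rule that video gets one task per fragment while audio gets a single task for the whole track.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Task:
    kind: str                 # "video" or "audio"
    fragment: Optional[int]   # fragment index; None means the whole audio track

def plan_tasks(n_fragments: int) -> List[Task]:
    """One video task per fragment, plus a single task for the entire audio track."""
    tasks = [Task("video", i) for i in range(n_fragments)]
    tasks.append(Task("audio", None))
    return tasks

def ready_to_combine(tasks: List[Task], done: Set[Task]) -> bool:
    """The final file can be assembled only when every task has finished."""
    return all(task in done for task in tasks)
```

For a video cut into three fragments, `plan_tasks(3)` yields four tasks. The audio task runs alongside the video ones, so processing it as a whole should rarely be the bottleneck.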

Results

Thanks to parallel video processing, we significantly reduced the delay between a video being uploaded to us and becoming available to users. For example, it used to take up to two hours to create several full versions of different quality for a FullHD movie an hour and a half long. Now it all takes 15 minutes. Moreover, with parallel processing we create the high-resolution version even faster than the old approach with an intermediate result produced the low-resolution one.
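The "eight times" in the title follows directly from these two numbers. A trivial check, using only the figures quoted above (the function name is ours):

```python
def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the new pipeline is than the old one."""
    return old_minutes / new_minutes

old_minutes = 2 * 60   # "two hours" with sequential encoding
new_minutes = 15       # with parallel encoding

print(speedup(old_minutes, new_minutes))  # 8.0
```

Put differently, the old scheme needed more time than the 90-minute movie itself lasts, while the new one needs a sixth of its running time.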

One more thing. With the old approach, servers were either in short supply or sat idle without tasks. Parallel encoding lets us raise hardware utilization. Now our cluster of more than a thousand servers is always busy with something.

In fact, there is still room for improvement. For example, we can save a lot of time if we start processing fragments of a video before it has been fully uploaded. As they say, there is more to come.

Tell us in the comments which video-related topics you would like to read about.
