Brotli Efficiency in the Real World

One of the most fundamental rules for building fast websites is to optimize their resources. For text resources, that is, code written in HTML, CSS and JavaScript, this means data compression. The de facto standard for compressing text resources on the web is Gzip: roughly 80% of the compressed resources fetched during page loads are compressed with Gzip. The remaining 20% are compressed with a much newer algorithm, Brotli.





Of course, those percentages cover only the resources that actually arrive in browsers compressed; they do not account for everything. Plenty of resources that could, and should, be compressed are still served uncompressed. More detailed compression metrics can be found in the Compression section of the Web Almanac.

The Gzip compression method is remarkably effective. The complete works of Shakespeare in plain text take up 5.3 MB; after Gzip compression at level 6 they occupy only 1.9 MB. That is a 2.8-fold reduction in file size, with no data lost. Great!

Better still, Gzip's effectiveness depends on how much repetition a file contains: the more repeated strings in the text, the better it compresses. This suits the web very well, since HTML, CSS and JS have a uniform syntax and contain a lot of repetition.

But although Gzip is very effective, it is also quite old: it appeared in 1992 (which, of course, helps explain how widespread it is). Twenty-one years later, in 2013, Google released Brotli, a new algorithm that promises even higher compression ratios than Gzip can achieve. The same complete works of Shakespeare compress with Brotli to 1.7 MB (at compression level 6), a 3.1-fold reduction in file size, which is even better than Gzip.

Using a tool that evaluates Gzip and Brotli compression, you will likely find that for some data Brotli is considerably more effective than Gzip. For example, the ReactDOM bundle is 27% smaller when compressed with Brotli at its maximum level (11) than with Gzip at its maximum level (9).
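If you want to run this kind of comparison yourself, here is a minimal sketch, assuming Node.js and its built-in zlib module (the input file name is just a placeholder):

```typescript
// compare-compression.ts - rough sketch of a Gzip vs. Brotli size comparison
// using Node's built-in zlib module. The input path is a placeholder.
import { readFileSync } from "node:fs";
import { gzipSync, brotliCompressSync, constants } from "node:zlib";

const raw = readFileSync("react-dom.production.min.js"); // hypothetical input file

// Gzip at its maximum level (9) and Brotli at its maximum quality (11).
const gzipped = gzipSync(raw, { level: 9 });
const brotlied = brotliCompressSync(raw, {
  params: { [constants.BROTLI_PARAM_QUALITY]: 11 },
});

const report = (label: string, size: number) =>
  console.log(
    `${label}: ${size} B (${(raw.length / size).toFixed(2)}x smaller, ` +
      `${(100 * (1 - size / raw.length)).toFixed(1)}% saved)`
  );

report("gzip -9", gzipped.length);
report("brotli -q11", brotlied.length);
```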

Here is how Brotli compares with Gzip at every compression level when processing ReactDOM:

Compression level   Size (bytes)   Compression ratio (vs. uncompressed)   Improvement over Gzip
1                   43456          2.73                                    3%
2                   39898          2.97                                    11%
3                   39416          3.08                                    15%
4                   38488          3.08                                    15%
5                   36323          3.27                                    19%
6                   36048          3.29                                    20%
7                   35804          3.31                                    20%
8                   35709          3.32                                    21%
9                   35659          3.33                                    21%
10                  33590          3.53                                    25%
11                  33036          3.59                                    27%

As you can see, Brotli outperforms Gzip at every compression level. At Brotli's maximum level it is 27% more effective than Gzip.

And from my own observations, moving one of my clients from Gzip to Brotli produced a median reduction in file sizes of 31%.

So for the past few years I, along with other performance engineers, have been encouraging clients to switch from Gzip to Brotli.

A few words about browser support for Gzip and Brotli. Gzip is so ubiquitous that CanIUse does not even display a support table; it simply says: "This HTTP header is supported in almost all browsers (starting with IE6+, Firefox 2+, Chrome 1+, and so on)." Brotli, at the time of writing, enjoys a very healthy 93.17% support, which is a lot. Still, if your site has any meaningful amount of traffic, serving uncompressed resources to more than 6% of your users may not sit well with you. But by adopting Brotli you lose nothing: clients negotiate compression progressively, so users whose browsers cannot accept Brotli simply fall back to Gzip. We will come back to this below.
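To illustrate that fallback model, here is a minimal sketch (not from the original article) of how a server might negotiate the encoding from the Accept-Encoding request header, assuming Node.js with its built-in http and zlib modules:

```typescript
// encoding-negotiation.ts - sketch of Accept-Encoding negotiation:
// prefer Brotli, fall back to Gzip, otherwise send the body uncompressed.
import { createServer } from "node:http";
import { gzipSync, brotliCompressSync } from "node:zlib";

const body = Buffer.from("<!doctype html><h1>Hello</h1>".repeat(100));

createServer((req, res) => {
  const accepted = (req.headers["accept-encoding"] ?? "").toString();

  if (/\bbr\b/.test(accepted)) {
    res.writeHead(200, { "content-encoding": "br", vary: "accept-encoding" });
    res.end(brotliCompressSync(body));
  } else if (/\bgzip\b/.test(accepted)) {
    res.writeHead(200, { "content-encoding": "gzip", vary: "accept-encoding" });
    res.end(gzipSync(body));
  } else {
    res.writeHead(200, { vary: "accept-encoding" }); // no common encoding: serve as-is
    res.end(body);
  }
}).listen(8080);
```

The Vary header is included so that caches keep the differently encoded variants apart.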

In general, especially if you use a CDN, turning on Brotli is a matter of seconds. At least that is the case with Cloudflare, which I use for the CSS Wizardry site. However, many of my clients over the past couple of years have not been so lucky. Some run their own infrastructure, and in practice installing and deploying Brotli there is not so simple. Others use CDN services that do not offer readily accessible support for the newer algorithm.

In the cases where we could not switch to Brotli, we were always left with the nagging question: "What if...?" So I finally decided to arm myself with numbers and answer the question of what moving to Brotli actually gives a site.

Less is not necessarily faster.


Usually, though, "smaller" does mean "faster": as a rule, if you reduce a file's size, it will be transferred from server to client more quickly. But making a file, say, 20% smaller does not mean it will arrive 20% sooner. File size is only one aspect of web performance; whatever the size of a file, delivering it to the browser involves many other factors and system constraints: network latency, packet loss, and so on. In other words, shrinking files lets us send the same data in fewer packets, but the journey of those packets between server and client is still governed by network latency, and that does not change.

TCP, packets, round-trip delay


To consider, very simplistically, how files travel from server to client, we have to look at the TCP protocol. When we receive a file from a server, it does not arrive in one go. TCP, on top of which HTTP operates, splits the file into segments called packets. These packets are sent to the client in ordered batches, and the client acknowledges receipt of each batch before the next one is sent. This continues until the client has collected all the packets, the server has none left to send, and the client can reassemble the packets into something recognizable as a file. A batch of packets only completes successfully once the server has sent it and the client has acknowledged it. The time it takes for data to reach the client and for the acknowledgement to come back is called the round-trip time (RTT).

A brand-new TCP connection has no way of knowing how much bandwidth is available or how reliable the connection is (that is, what the packet-loss rate is). If the server tried to push megabytes of data over a link capable of one megabit per second, it would overwhelm the link and cause congestion. Conversely, if it trickled a small amount of data over a very fast link, the connection would be underused and sit idle.

To solve this puzzle, TCP uses a mechanism called slow start, part of its congestion-window management strategy. Every new TCP connection is limited to sending only 10 packets in its first round (10 packets being the initial congestion window). Ten TCP segments amount to roughly 14 KB of data. If those packets are received successfully, the second round carries 20 packets, then 40, 80, 160, and so on. This exponential growth continues until one of two things happens:

  1. Packets are lost. At that point the server halves the number of packets in the next round and tries again.
  2. The limit of the available bandwidth is reached, and the connection can be used at full capacity.

This simple and elegant strategy balances caution against optimism, and it applies to every new TCP connection a web application establishes.
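As a rough illustration of the growth just described, here is a toy sketch under this article's simplified assumptions (a ~14 KB initial window that doubles on each successful round and halves on loss); it is not a real TCP implementation:

```typescript
// slow-start.ts - toy model of TCP slow start as described above:
// the window starts at ~14 KB and doubles each round until loss occurs,
// at which point it is halved. Real TCP is considerably more nuanced.
function simulateWindow(rounds: number, lossAtRound?: number): number[] {
  const windows: number[] = [];
  let windowKb = 14; // initial congestion window, ~10 packets
  for (let round = 1; round <= rounds; round++) {
    windows.push(windowKb);
    windowKb = round === lossAtRound ? windowKb / 2 : windowKb * 2;
  }
  return windows;
}

console.log(simulateWindow(5));    // [14, 28, 56, 112, 224]
console.log(simulateWindow(5, 3)); // [14, 28, 56, 28, 56] - loss on round 3
```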

Put simply: the initial congestion window of a new TCP connection holds only about 14 KB. That is roughly 11.8% of the uncompressed ReactDOM file, 36.94% of the Gzip-compressed version, or 42.38% of the Brotli-compressed version (at the maximum compression level).

But here we slow down. The jump from 11.8% to 36.94% is a very noticeable improvement, whereas the jump from 36.94% to 42.38% is far less impressive. What is going on?
Round trip   Data sent this round (KB)   Cumulative data sent (KB)   ReactDOM variant fully delivered
1            14                          14
2            28                          42                          Gzip (37.904 KB), Brotli (33.036 KB)
3            56                          98
4            112                         210                         Uncompressed (118.656 KB)
5            224                         434

It turns out that the Gzip-compressed and the Brotli-compressed data are delivered within the same number of rounds: either way, transferring the file takes two rounds of packets. If the RTT is fairly uniform across rounds, this means it takes the same time to deliver the Gzip version and the Brotli version. The uncompressed version, on the other hand, needs four rounds rather than two, and on high-latency connections that can add up to a noticeable difference in transfer time.
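Sticking with the same simplified model, the arithmetic behind the table looks roughly like this (the payload sizes are the ones quoted above):

```typescript
// rounds-needed.ts - how many slow-start rounds it takes to deliver a payload,
// using the simplified 14 KB doubling model from the tables above.
function roundsNeeded(payloadKb: number, initialWindowKb = 14): number {
  let sentKb = 0;
  let windowKb = initialWindowKb;
  let rounds = 0;
  while (sentKb < payloadKb) {
    sentKb += windowKb; // send a full window this round
    windowKb *= 2;      // window doubles for the next round
    rounds++;
  }
  return rounds;
}

console.log(roundsNeeded(33.036));  // Brotli-compressed ReactDOM  -> 2
console.log(roundsNeeded(37.904));  // Gzip-compressed ReactDOM    -> 2
console.log(roundsNeeded(118.656)); // uncompressed ReactDOM       -> 4
```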

What I am getting at is that transfer time does not depend on file size alone; it is also shaped by how TCP behaves. It is not enough to make files smaller: we need to make them smaller by enough that they fit into fewer rounds of packets.

In theory, then, for Brotli to be noticeably more effective than Gzip, it has to compress data aggressively enough that it fits into fewer rounds of packets than the Gzip version would need. And it is not obvious how often that will actually be the case...

It is worth noting that the model above is heavily simplified, and there are many other factors to consider. For example: is this a new or an already-open TCP connection? Is the connection being used for anything else? Are server-side traffic prioritization mechanisms pausing and resuming the transfer? Do HTTP/2 streams have exclusive access to the bandwidth? A proper investigation would be far more involved; treat this section as a good starting point for your own research. But do consider analyzing your data thoroughly with something like Wireshark, and read this material, which gives a deeper account of the "magic" first 14 KB.

Also, the above applies only to brand-new TCP connections: files transferred over an already-warm connection do not go through slow start. This leads us to two important conclusions:

  1. It probably does not need repeating, but I will repeat it anyway: self-host your static resources. This is a great way to avoid slow-start penalties, since packets leaving your own, already warmed-up origin travel over a connection whose window has already grown. Which brings me to the second conclusion.
  2. Because the congestion window grows exponentially, a warmed-up connection very quickly reaches enormous per-round transfer volumes, as the table below shows.

Round trip   Data sent this round (KB)   Cumulative data sent (KB)
1            14                          14
2            28                          42
3            56                          98
4            112                         210
5            224                         434
6            448                         882
7            896                         1778
8            1792                        3570
9            3584                        7154
10           7168                        14322
20           7340032                     14680050

By the end of the 10th round trip, 7168 KB can be sent in a single round, and 14322 KB have been transferred in total. That is more than enough for ordinary web browsing (which is to say, not for streaming Game of Thrones). In fact, we usually load an entire web page and all of its resources without ever reaching the limit of our bandwidth. In other words, a 1 Gbit/s fibre connection (that is, 0.125 GB/s) will not make ordinary browsing feel much faster than a slower link, because most of that capacity is simply never used.

And by the 20th round trip we could, in theory, be sending 7.34 GB of data in a single round of packets.

What about the real world?


So far, all of this has been theoretical. I started working on this article because I wanted to know what impact Brotli could have on real sites.

So far, the numbers have pointed to a huge difference between no compression and Gzip, and a rather modest gain from Brotli over Gzip. This suggests that moving from no compression to Gzip will yield a noticeable improvement, while moving from Gzip to Brotli will probably look far less impressive.

I selected, as examples, several sites, guided by the following considerations:

  • The site should be relatively well-known (it is more useful to test sites people can relate to).
  • The site should be suitable for the test: large enough to make its materials interesting to analyze, and not consisting mostly of content that cannot be compressed with Gzip/Brotli, as YouTube does, for example.
  • Not every site in the set should belong to a large corporation (it is worth analyzing some, shall we say, "ordinary" sites too).

Guided by these requirements, I put together a selection of sites and started testing.


I did not want to overcomplicate the tests, so I settled on just two indicators:

  • The amount of data transferred.
  • First contentful paint (FCP) time.

They were analyzed in the following situations:

  • No compression.
  • Gzip.
  • Brotli.

The FCP metric is close to the real world and universal enough to apply to any site, since it measures what people actually need from websites: their content. I also chose it because Paul Calvano, who knows this area well, said: "Experience tells me that using Brotli leads to improved FCP, especially when the critical CSS/JS resources are large."

Testing


I will let you in on a dirty little secret. Many web performance studies (not all, but many) are based not on measuring improvements, but on reasoning from the opposite: from degradation. For example, it is much easier for the BBC to state that "they lose 10% of users for every extra second the site takes to load" than to work out what happens after a one-second improvement. It is much easier to make a site slower than to make it faster, which, you suspect, is why so many people are so good at it.

With that in mind, I did not try to take sites that use Gzip and then somehow compress their content with Brotli offline. Instead, I found sites that already use Brotli and turned compression off. I went from Brotli to Gzip, and then from Gzip to no compression, measuring how the site performed at each step.

Although I cannot, say, log into LinkedIn's servers and disable Brotli, I can simply visit the site from a browser that does not support Brotli. And although I cannot disable Brotli support in Chrome, I can hide from the server the fact that my browser supports it. Browsers tell servers which compression algorithms they accept via the accept-encoding request header, and with WebPageTest I can set that header myself. So it is all very simple!


The advanced features of WebPageTest allow us to set custom request headers.

Here's how I set up the Custom Headers field:

  • Disabling compression entirely: accept-encoding: randomstring.
  • Disabling Brotli but keeping Gzip: accept-encoding: gzip.
  • Using Brotli where the site supports it (and the browser does too): leave the field empty.

You can confirm that this works as intended by checking for the presence (or absence) of the content-encoding header in the server's response.
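You can run the same check outside of WebPageTest too; here is a small sketch, assuming Node.js and its built-in https module, that requests a URL with different accept-encoding values and prints the content-encoding the server returns (the URL is a placeholder):

```typescript
// check-encoding.ts - ask a server for a resource with different
// accept-encoding values and report the content-encoding it returns.
import { get } from "node:https";

function checkEncoding(url: string, acceptEncoding: string): void {
  get(url, { headers: { "accept-encoding": acceptEncoding } }, (res) => {
    console.log(
      `accept-encoding: ${acceptEncoding || "(empty)"} -> ` +
        `content-encoding: ${res.headers["content-encoding"] ?? "(none)"}`
    );
    res.resume(); // discard the body; we only care about the headers
  });
}

const url = "https://example.com/"; // placeholder URL
checkEncoding(url, "randomstring"); // should come back uncompressed
checkEncoding(url, "gzip");         // should come back gzip-encoded
checkEncoding(url, "br, gzip");     // should come back br where supported
```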

Results


As expected, moving from no compression to Gzip brought a significant improvement, while moving from Gzip to Brotli looks far less impressive. The raw data from my experiments can be found here. The findings that interest me most are the following:

  • Reduction in file size with Gzip compared to no compression: 73%.
  • Improvement in FCP with Gzip compared to no compression: 23.305%.
  • Additional reduction in file size with Brotli compared to Gzip: 5.767%.
  • Additional improvement in FCP with Brotli compared to Gzip: 3.462%.

These are all median values. By "file sizes" I mean only HTML, CSS, and JavaScript.

Gzip reduced file sizes by 73% compared with their uncompressed versions, while Brotli shaved off only an additional 5.7%. In terms of FCP, Gzip brought a 23% improvement over no compression, and Brotli added only a further 3.5% on top of that.

Although these results seem to support the theory, there are several ways they could be improved. The first would be to test a much larger number of sites; the other two I would like to discuss in more detail.

First-party versus third-party resources


In my tests I disabled Brotli everywhere, not just on the servers hosting each site's own assets. That means I was measuring not only the benefit a site gets from its own use of Brotli, but potentially also the benefit it gets from Brotli on the third parties it relies on. This only matters to us if those third-party resources sit on the critical path of the sites under investigation, but it is worth keeping in mind.

Compression levels


When discussing compression, we usually talk about best-case results, meaning level 9 for Gzip and level 11 for Brotli. However, it is unlikely that any given server is configured in the most optimal way. For example, Apache uses Gzip level 6, while NGINX defaults to only level 1.

Disabling Brotli means falling back to Gzip, and given how I tested the sites, I could not change or otherwise influence that fallback configuration. This matters because the assets of two of the sites in the test actually got larger when Brotli was enabled. That suggests to me that their Gzip compression level was high enough to deliver stronger compression than their Brotli compression level.

Choosing a compression level is a trade-off. It would be nice to simply demand the highest level everywhere and consider the matter settled, but that is impractical: the extra time the server spends compressing data on the fly is very likely to cancel out the benefit of the higher compression level. To deal with this, you can do the following:

  1. Use a pragmatic compression level for dynamic compression, one that strikes the right balance between speed and compression ratio.
  2. Upload pre-compressed static resources to the server, compressed at a higher level than the one used for dynamic compression, and keep dynamic compression at the level described in the first point (a sketch of such a build step follows this list).
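Here is a minimal sketch of that second approach, assuming a Node.js build step and the built-in zlib module (asset paths are placeholders): static files are compressed once, at maximum levels, and written out as .gz and .br variants for the server to pick from at request time.

```typescript
// precompress.ts - write maximally-compressed .gz and .br variants of static
// assets at build time, so the server never has to compress them on the fly.
import { readFileSync, writeFileSync } from "node:fs";
import { gzipSync, brotliCompressSync, constants } from "node:zlib";

const assets = ["dist/app.js", "dist/app.css"]; // placeholder asset paths

for (const path of assets) {
  const raw = readFileSync(path);

  // Maximum levels are acceptable here because this runs once per build,
  // not once per request.
  writeFileSync(`${path}.gz`, gzipSync(raw, { level: 9 }));
  writeFileSync(
    `${path}.br`,
    brotliCompressSync(raw, {
      params: { [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY },
    })
  );
}
```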

Summary


On the whole, reasoning soberly, it is fair to conclude that Brotli's advantage over Gzip is modest.

If enabling Brotli support is a matter of a couple of clicks in your CDN's control panel, then go and turn it on right now. Support for the algorithm is wide enough, browsers that do not support Brotli fall back gracefully, and even a small improvement is better than none.

Where possible, upload static resources pre-compressed at the maximum compression level, and use moderate, rather than the lowest, levels for dynamic compression. If you use NGINX, make sure you are not running on its default Gzip compression level of 1.

However, if adopting Brotli would take weeks of development, testing and deployment work, do not panic: just make sure you apply Gzip to everything that can be compressed (which, besides text resources, includes .ico and .ttf files, if your project uses them).

I suppose a short version of this article might read: if you cannot, or should not, enable Brotli yet, you are not losing all that much.


