BFCache, or There and Back Again. A Yandex talk

People press the browser's Back button very often, perhaps more often than you think. So why throw the page out of the browser's memory immediately, only to spend time and traffic reopening it a second later? The back/forward cache (BFCache) was invented so that users can go back quickly, and it is worth keeping in mind when you develop interfaces. Victor Khomyakov (victor-homyakov) dug into this "there and back" caching and compiled a table of BFCache compatibility with various APIs.


- Hello, my name is Victor. I work in a fairly large team that is responsible for the search results page.



You have at least seen the same or a similar page on Google. My particular area is the loading speed of this page: making it render as quickly as possible on the server, and download and display as quickly as possible on the client. Why does it matter? The less time a user waits for your page to load, the less likely they are to give up and leave. The more likely they are to convert, the higher your Net Promoter Score: the user will happily tell everyone they know that this is a great page that loads fast and is convenient to use. And ultimately, the more money you can earn. Or your company can, and then it will give you a bonus.

I will give a few examples from well-known companies. Google ran an experiment: they deliberately added a delay to the search page and measured how it affected user behavior. It turned out that, on average, there were half a percent fewer searches per user. What is half a percent? Do the math: half a percent of hundreds of millions of Google users is a pretty large number.

Bing ran the same experiment. They did not take Google's word for it and decided to double-check. They got similar results: noticeably less revenue per user when the page slows down. Why a slowdown rather than a speedup? Because it is much easier to slow a page down by an exact number of milliseconds, in a way that is reproducible in production, than to speed it up by the same amount.

An example from AliExpress: they sped up their site by 36% and received significantly more orders from users. Orders are direct money. By now it is clear to everyone that speed matters: through a number of metrics, it affects the money you earn.

And one more factor. Image optimization has already been discussed today. By optimizing your traffic and reducing the number of downloads, you pay your hosting providers less for outgoing traffic. That is also money that stays in your pocket. What if I offered you a 10% discount on traffic at any hosting provider, with no restrictions or conditions? And what if I offered to make a share of your page views, say ten percent, load almost instantly for the user? Nobody would refuse.

The technology I will talk about today is one such solution. It imposes few restrictions on the stack and technologies you work with, yet it promises fairly significant gains.

To begin with, Google collects statistics on how its browsers are used. They published this number: of all page openings, of all navigations in mobile Chrome, about 19% are back and forward movements through history. Think about what that means. Rounded, 20% of navigations lead to a page where the user has just been.

For us as page authors, this means that even when the user leaves a page, there is a considerable chance that they are about to return. On the one hand, this may be a mobile-specific problem: everything is small, and it is easy to miss with your finger, tap a link, leave the page, and then say: "Oh hell! I want to go back." But on desktops the situation is about the same: the number is lower, but there is still a significant percentage of returns.

And what do we do at that moment? We waste the user's time and traffic. We load the same page all over again, parse it, rebuild the DOM, repaint everything on the screen, and load and execute JavaScript.

The browser is a pretty powerful thing. It tries to use caches wherever possible, and most of the resources may already be in its cache, so it will not wait for them from the network but take them straight from the cache. In the V8 engine, for example, the result of parsing JavaScript is cached as well. So in most cases the browser will not download and parse the JS again, but will start executing it right away. Still, rebuilding the DOM, reloading non-cached resources, and running JS takes a considerable amount of time.

The solution suggests itself. When the user leaves our page, we do not destroy it immediately. We simply save its state and hide it from the user, so that under the hood it remains at the browser's disposal.

And what do we do if the user decides to return? We just show the same saved page, and that can be done almost instantly.



This technology is called the back/forward cache, or bfcache for short.

Here is an example of how the same browser, the same build, behaves with bfcache off and on. The first page load is equally slow in both cases. But once we start moving back and forth through history, there is a noticeable pause on the left and none on the right. On the left, ordinary history navigation takes noticeable time. On the right, everything happens very, very fast.


A similar example from our search. On the left is regular Safari on macOS with bfcache disabled; on the right is the same Safari with default settings and bfcache enabled. This is a fairly common case: a person comes to a search engine, may not know exactly what they are looking for, and tries several variants of the query. The first query gives something that is not quite right. The second seems better. The third is worse, so they go back to the previous one. At that moment it would be very good not to make them wait: they have just seen those results, so show them right away.

Or a second scenario, when the results are paginated. The user pages through the results: goes to the second, third, fourth page, decides something is wrong there, and goes back. Again, we can show the previous pages almost instantly.

An important issue is security. While the page was visible, it could access various APIs that read the hardware state of your phone or computer. Here is a short list of what immediately came to mind: geolocation, the orientation and movement of the device in space, access to the camera, and sound from the microphone.

When the page is shown again, it is important that it does not gain access to the events that occurred while it was hidden. Otherwise an additional channel for spying on users would open up. The page must not receive the history of your movements for that whole period, or recordings from the microphone and camera. Browser developers must not forget about this either.

API and browser support


Closer to the topic. Suppose I have already persuaded you, and you say: "Yes, a good idea, we should work with this." What APIs do we have at our disposal if we decide to take bfcache into account, and how well do browsers support them?

Where does bfcache already exist, where can I see it?

- It has long been implemented in Firefox, in Safari (on both macOS and iOS), and in Internet Explorer 11 (!). We usually scold Microsoft developers, but here they made a real effort.

- It is implemented in UC Browser 11/12 for Android.

- Surprisingly, it is not in Chrome. In Chromium this feature is under development.



So once it is done in Chromium, almost all Chromium-based browsers (and this is not a complete list) will sooner or later get this functionality for free, without SMS and registration.

Is there any kind of API? I want to manage bfcache, to turn it on and off directly from JavaScript, to find out whether some page is in bfcache. Is there such an API? No, and this was a deliberate decision: a page should not be able to turn bfcache on or off for everyone or for itself, or to find out whether anything is in bfcache. This is all for security reasons.



But what do we have? Navigation types. There is the link prerender hint (link rel="prerender"), for when we want a page rendered in advance. It has its own navigation type: such a page is opened with the NavigationType "prerender". If we simply open a page in the browser, the type is "navigate". If we press the Reload button, it is "reload".

We are interested in the "back_forward" navigation type, which clearly indicates that the user moved back or forward through history. This is exactly the type of navigation that bfcache can serve.
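A minimal sketch of reading the navigation type from script. It assumes the Navigation Timing API (performance.getEntriesByType('navigation')) with a fallback to the older numeric performance.navigation.type; the helper name getNavigationType is made up for this example, and the performance object is passed in so the function can also be tried outside a browser.

```javascript
// Sketch: read the navigation type with both the Navigation Timing Level 2
// API and the older performance.navigation object.
// `perf` is passed in explicitly (hypothetical helper, not from the talk).
function getNavigationType(perf) {
  if (!perf) return 'unknown';

  // Level 2: the entry's `type` is already one of
  // 'navigate' | 'reload' | 'back_forward' | 'prerender'.
  if (typeof perf.getEntriesByType === 'function') {
    const entries = perf.getEntriesByType('navigation');
    if (entries.length > 0 && entries[0].type) return entries[0].type;
  }

  // Level 1 fallback: numeric constants 0, 1, 2 (255 means "reserved").
  const legacyNames = { 0: 'navigate', 1: 'reload', 2: 'back_forward' };
  if (perf.navigation && perf.navigation.type in legacyNames) {
    return legacyNames[perf.navigation.type];
  }
  return 'unknown';
}

// In a browser: did the user arrive here via the Back/Forward buttons?
if (typeof window !== 'undefined' && window.performance) {
  const cameFromHistory = getNavigationType(window.performance) === 'back_forward';
  console.log('history navigation:', cameFromHistory);
}
```

Note that "back_forward" only says how the user got to the page; by itself it does not say whether the page was served from bfcache or loaded again.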



Another API is the pageshow and pagehide events. They already existed in browsers: pageshow fires when your page is displayed to the user, and pagehide fires when the page is hidden for any reason. They were extended with the persisted field. If the page is being hidden and will be placed in bfcache, persisted is true. If the page is being displayed after being restored from bfcache, persisted is true as well.

Accordingly, if the browser is not going to cache the page, pagehide has persisted set to false. And if the browser displays the page after a normal load, or does not use bfcache at all, pageshow also has persisted set to false.
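The four combinations above can be sketched as a small function. describeTransition is a hypothetical name, and the returned strings are only illustrative; the function reads nothing but event.type and event.persisted.

```javascript
// Sketch: turn a pageshow/pagehide event into a human-readable description.
// Only `event.type` and `event.persisted` are read, so a plain object works too.
function describeTransition(event) {
  if (event.type === 'pagehide') {
    return event.persisted
      ? 'hidden: the browser intends to keep this page in bfcache'
      : 'hidden: the page will be unloaded';
  }
  if (event.type === 'pageshow') {
    return event.persisted
      ? 'shown: restored from bfcache'
      : 'shown: normal load';
  }
  return 'not a page transition event';
}

// In a browser, both events are dispatched on window.
if (typeof window !== 'undefined') {
  window.addEventListener('pageshow', (e) => console.log(describeTransition(e)));
  window.addEventListener('pagehide', (e) => console.log(describeTransition(e)));
}
```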



Event support is available in almost all browsers, even those that do not yet support bfcache.



The same goes for the persisted field. It already exists in Chrome, even though Chrome does not support bfcache yet. The field is always present there, but for now it is always false.

When I came across bfcache, I had to study it, poke at it from all sides, and watch how it works. The first thing I wanted was to see, on my own page, what value the persisted field has in my handlers.



It would seem that everything is quite simple: I wrote a handler and printed what I received with console.log(). But in some browsers, opening DevTools may suddenly turn bfcache off. I opened DevTools, went back and forth between pages, and persisted was always false: the page never got into bfcache. Okay, I have another powerful tool: alert.

But no. When a page is being unloaded, modern browsers simply ignore alert in the pagehide, beforeunload, and unload handlers; it just does not work there. Again, I cannot see what I want.



Okay, I have an even more powerful tool. I append the contents of my event as text to a block right on the page I am studying, and log everything that way. This method worked.
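A rough sketch of such on-page logging. The original code from the slide is not available, so the function names, the log format, and the element id bfcache-log are all assumptions made for illustration.

```javascript
// Sketch: format one log line for a page transition event.
function formatLogLine(event, now) {
  const time = new Date(now).toISOString();
  return time + ' ' + event.type + ' persisted=' + Boolean(event.persisted);
}

// Append a line to an on-page element instead of console.log() or alert().
// `doc` only needs createElement(); `container` only needs appendChild().
function logToPage(doc, container, text) {
  const line = doc.createElement('div');
  line.textContent = text;
  container.appendChild(line);
  return line;
}

if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  // 'bfcache-log' is a hypothetical id; the element must exist in the markup.
  const box = document.getElementById('bfcache-log');
  if (box) {
    ['pageshow', 'pagehide'].forEach((type) => {
      window.addEventListener(type, (e) => {
        logToPage(document, box, formatLogLine(e, Date.now()));
      });
    });
  }
}
```

Because the log lives in the DOM, lines written during pagehide are still on the page if it is later restored from bfcache, with DevTools closed the whole time.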



That's it, the code can be used. I debugged it, it works, and I can move on. Of course, I did not forget that an external static script is a better fit: the same inline code is not shipped with every page, and file caching kicks in.

I put this debugged code into an external script.



But no, the pageshow and pagehide handlers stopped working in Safari! For some reason they do not fire from an external script.

Okay, I already have a working version. I had to leave it like that.



I will briefly list the rakes I managed to step on in just one day. First, DevTools can disable caching. You probably remember the "Disable cache" checkbox on the Network tab of Chrome DevTools. It disables the network cache; it may or may not affect bfcache as well. The analogy is clear: if we have opened DevTools, we are developing, and we may not need caching. It may even get in our way.



The second one is alert. Firefox and Safari silently ignore it and continue executing the handlers as if there were no alert. Only good old Chrome prints a red error in the console: you have an alert here, I blocked it, just so you know!

Once again, a reminder that handlers from an external script may not be called in Safari, which is very strange.

And one more important point. If your page was cached, that is, you received a pagehide event with persisted set to true and the browser told you "Yes, I put it in the cache", this gives no guarantee that the page will ever be taken back out of the cache and shown to the user. The browser may decide to clear the cache if it runs low on memory. The user may close the browser without navigating anywhere. Keep this in mind.

Implementation Features


I dug further into the documentation to figure out how to live with this knowledge. Surprisingly, documentation exists: you can find descriptions of how bfcache works in browsers on the Internet. But the more I read, the more differences between browsers piled up.

In one browser it works one way, in another a different way. Something gets in the way in one browser but not in another. When browser developers do not know how to handle some API correctly while placing a page in bfcache, they say: okay, if the page uses this API, I just never put it in the cache under any circumstances. And this list differs from browser to browser; each does whatever suits it.

And then I began to combine what I learned into one table. I got something like the following:



I read the browser documentation for Firefox, Safari, and the Chromium family. Documentation for IE was available too, albeit outdated. We programmers do not like updating the documentation after changing the browser, do we? When I realized that the information was out of date, I began testing small pages of my own in browsers and checking which APIs work and which do not.

That was not enough either: I did not know which APIs to look at in the first place, and I could hardly go through every single one. So I had to look into the source code of the browser engines themselves; the code turned out to be the most accurate and reliable source of knowledge. At the moment this table (you see a piece of it in front of you, here is a link to the full version) is the most comprehensive collection of knowledge about which APIs allow or prevent bfcache from working in browsers.

APIs that do not interfere are marked with a checkmark and green; those that will definitely prevent the page from getting into bfcache are marked in red. White cells are gaps that are not documented anywhere.

Firefox


Here are some interesting details about specific browsers. I will start with Firefox, which was the first to implement all this.



The most important thing I learned from the Firefox sources is that, when working with bfcache, it can write to a text log on disk all the reasons why it cannot put a page in the cache.



I even managed to find out how to make it do so. There are two secret environment variables: one says what to log, the other says which file to write the log to. After that we launch the binary, and voila! We see roughly what was on the previous slide: lines of the form "this HTML cannot be cached for this reason, that one for another reason". We can read it right away, which is very convenient.



If you only want to experiment once, you can open the about:networking page in Firefox. Its "Logging" section has the same fields: in two of them we specify what to log and where, and buttons start and stop the log.



When does Firefox refuse to cache a page? If you have unfinished requests, including AJAX requests or requests for resources such as images, it refuses to put the page in bfcache. It considers the page incomplete: it has not finished loading, there is some suspicious activity. None of this applies to the favicon, though. If you forgot the favicon and it fails to load, Firefox decides that this is fine, that it is normal for your site.

If you have a long-running script, the browser asks: this script takes a long time and blocks the UI, shall I stop it? If you agree, the page is considered slightly broken and is not cached.

If you use IndexedDB... This is an instructive story. In the first implementation they checked: if the page has an unfinished IndexedDB transaction, it is not cached, because it is unclear how to handle it (we would be hiding the page right in the middle of a transaction). But later they simply lost this piece of code during refactoring. And as you might guess, they had no tests for it. There is even a bug in the bug tracker, already more than two years old. People wrote: "My bfcache does not work correctly with IndexedDB, what should I do?" The Firefox developers answered: yes, it really is broken, we simply lost this piece of code during refactoring; okay, let it be. Moral: write tests, even if you are writing Firefox, otherwise things can end badly.

One more interesting factor that keeps a page out of bfcache is explicitly allowed mixed content. What is that?



Suppose your page is opened over HTTPS, but you still load some resources, scripts in particular, over HTTP. That is, you have an insecure script that anyone can modify in transit.



By default Firefox, like other browsers, no longer executes such an insecure script. But if it is very important to you, and you went into the settings and allowed it to run, Firefox will not cache such a page. It says, in effect: fine, you told me to execute this code, but that is as far as I go!



Another setting is the size of bfcache itself. The default is minus one, which means Firefox tries to cache as many pages as its available memory allows. We can forcibly disable the cache by setting zero, or set an explicit number, for example to remember no more than five pages.

Warning: the next slide contains sample code in the scary C++ language, which can be dangerous at a front-end conference. Do not try to copy it and run it in the browser console. Your disk may get formatted, your screen may explode, or bitcoins may get mined. You have been warned.



So, the Gecko code. Anyone can open, read, and browse it on the Internet for free, and so I went digging. The most important method there is CanSavePresentation(), which answers the question: can this document be cached? It is the ultimate source of truth about what Firefox implements right now. It was also there that I learned about the log: there is a variable called gPageCacheLog, the log where everything is written. Such is the story of my excursion into C++.

So you open the link, look at the code, search it (the search is convenient and fast), and find out the actual implementation details of the latest browser version, the ones that are simply missing from the documentation.

Safari


The harshest thing Safari does when a page goes into bfcache: if you have pending AJAX requests, Safari just kills them.



Even if you have wrapped them in error handlers and try to check and fix everything, it is as if the request never existed. After the page is restored from bfcache you get no error, no "OK", nothing at all.
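One possible way to cope with silently dropped requests, sketched under assumptions: keep a registry of requests that have not settled yet and start them again when the page comes back with persisted set to true. The PendingRequests class is hypothetical and not something from the talk. Re-running is only safe for idempotent requests, and the promise returned for the original, killed request still never settles, so real code would need callbacks or some other way to deliver the retried result.

```javascript
// Sketch: remember requests that are still in flight so they can be
// re-issued if the page comes back from bfcache without any response.
class PendingRequests {
  constructor() {
    this.pending = new Map(); // id -> function that starts the request again
    this.nextId = 1;
  }

  // `start` must return a Promise (for example, a fetch() call).
  track(start) {
    const id = this.nextId++;
    this.pending.set(id, start);
    const forget = () => this.pending.delete(id);
    // Settled requests (success or failure) are no longer "pending".
    return start().then(
      (value) => { forget(); return value; },
      (error) => { forget(); throw error; }
    );
  }

  // Re-run everything that never settled; returns how many were restarted.
  retryAll() {
    const starters = Array.from(this.pending.values());
    this.pending.clear();
    starters.forEach((start) => { this.track(start).catch(() => {}); });
    return starters.length;
  }
}

if (typeof window !== 'undefined') {
  const requests = new PendingRequests();
  window.addEventListener('pageshow', (e) => {
    // Only a bfcache restore can leave "ghost" requests behind.
    if (e.persisted) requests.retryAll();
  });
}
```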



As I said, the pageshow and pagehide handlers are called in Safari only if they are written in a script inlined into the page. From an external script they may or may not work, depending on your luck. You have been warned.



Another interesting difference: beforeunload and unload handlers do not prevent the page from getting into bfcache. beforeunload is always called, whether the page goes into the cache or not. But unload is not called when the page goes into the cache. And here lies another rake: the page may go into the cache and never come back out, so if you put some important code in unload, it may never run, even though no error occurred on the page. The page correctly went into the cache, from there into oblivion, and your unload was never called.
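A common way around this rake, sketched here as an assumption rather than as the approach from the talk, is to do the important work in pagehide, which fires whenever the page is hidden, cached or not. The onPageLeaving helper and the /log endpoint are made-up names; navigator.sendBeacon is used only where it exists.

```javascript
// Sketch: run cleanup on pagehide, which fires whether or not the page
// goes into bfcache, instead of unload, which is skipped for cached pages.
function onPageLeaving(target, save) {
  const handler = (event) => {
    // persisted === true means the page may be restored later, or never.
    save({ mayBeRestored: Boolean(event.persisted) });
  };
  target.addEventListener('pagehide', handler);
  // Return a function that removes the handler again.
  return () => target.removeEventListener('pagehide', handler);
}

if (typeof window !== 'undefined' && typeof navigator !== 'undefined') {
  onPageLeaving(window, (info) => {
    // '/log' is a hypothetical endpoint used only for illustration.
    if (typeof navigator.sendBeacon === 'function') {
      navigator.sendBeacon('/log', JSON.stringify(info));
    }
  });
}
```

Given the Safari quirk mentioned above, such a handler is safer in an inline script than in an external one.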



Another interesting point. As you know, you can open several windows with window.open(), and these windows hold references to each other. They can reach into each other's window object and read or write various properties. Opening child windows this way does not interfere with bfcache. Most likely Safari is just as harsh here as with AJAX requests: all the connections are severed, and goodbye. Apple developers like it rough.

Another minute of C++! The WebKit sources are not secret either; they too can be opened, read, and studied.



They are on GitHub, and I have highlighted two important functions there:

- canCacheFrame() checks whether this frame can be cached.
- Various objects on the page, such as an HTML element or a font, have canSuspendForDocumentSuspension() methods. Through them each object reports whether it can be suspended and cached or not.

One more important aspect: WebKit has very convenient tests. Right in the LayoutTests/fast/history folder, as small, compact, concise HTML files, there are tests for how the various APIs implemented in Safari work with bfcache. They are living examples of how to write code with these APIs in Safari and of whether each one allows the page into bfcache. Very interesting to look at.



From there I learned that Safari also writes everything it knows about bfcache to a text log. Unfortunately, I never found out how to enable this logging or, if it is enabled, where to find the file on disk. If anyone knows, tell me, I will be very grateful.

Chromium




As I already said, work there is still in progress, and everything is hidden behind a flag. You can download a fresh Chrome Canary and go to the flags page. The setting is hidden there, and you can try playing with it. You might see something.



One interesting detail: I already talked about opening pages with window.open(). In Chromium such pages are not cached so far. They have not yet figured out how to handle all this correctly, and their conscience does not allow them to cut the connections shamelessly, as Safari does.

If DOMContentLoaded has not fired, the page will not be cached either.

There are many new APIs whose names start with "Web". They are also complicated and unclear, and so far all of them turn bfcache off by default. If the page uses something trendy and new, such as WebGL, WebVR, WebUSB, or WebBluetooth, it will not get into bfcache.

Service Worker. Such a page is not cached yet either, but the plan is to handle it correctly and hide it from the vigilant eyes of the Service Worker.

If geolocation is in use, the page is not cached yet either. It is simpler that way for now.

If cookies expired while the page was in the cache, Chromium assumes that some authorization has expired. Perhaps it was online banking or something similar. This means the page is no longer valid, and it is evicted from the cache.



The Google folks went even further. They proposed formalizing and unifying all of this across browsers: they proposed a Page Lifecycle specification covering all page states and suggested adding new events for the transitions between states. You can follow the link to see what they came up with.
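A minimal sketch of listening to such events. The event names freeze and resume come from the Page Lifecycle proposal as shipped in Chromium-based browsers, not from the talk itself; trackLifecycle and its callback are hypothetical names. In browsers that do not implement the proposal, the listeners simply never fire.

```javascript
// Sketch: record Page Lifecycle transitions reported through the
// freeze and resume events, which are dispatched on the document.
function trackLifecycle(doc, onChange) {
  let frozen = false;
  doc.addEventListener('freeze', () => { frozen = true; onChange('frozen'); });
  doc.addEventListener('resume', () => { frozen = false; onChange('resumed'); });
  return { isFrozen: () => frozen };
}

if (typeof document !== 'undefined') {
  trackLifecycle(document, (state) => console.log('page is now', state));
}
```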



Sources. As you know, the Chromium sources are available too. All of this lives in a class called BackForwardCacheImpl: very good naming, I hardly had to search. The main method that checks whether a document can be stored is called CanStoreDocument(). There is also a GetDisallowedFeatures() method, which simply lists all the new features and APIs that are not supported with bfcache. Very convenient: everything is concentrated in one place, so you read it and know what can and cannot be used at the moment.

Internet Explorer 11


An excursion into history for those who still have IE 11. For those whose life is hard.



It has bfcache, and that is the main problem, because you have to deal with it. The documentation says that bfcache supposedly works only over HTTP. In reality, in production, it works over HTTPS too for some reason. Moral: if you are a developer, please take care of your documentation. Otherwise others will suffer because of it later.

A beforeunload handler prevents the page from getting into the cache. The documentation says nothing about unload; perhaps they did not know or forgot about it.

If the page has not finished loading, it is not cached either. If it uses ActiveX components, it is not cached. And if DevTools is open, it is not cached either. This is an important point.



And where would we be without bugs? The persisted field was added, but sometimes it does not work. The page really gets into the cache and comes back from it, yet persisted is not set to true. What can we do?

We had beautiful code that determines whether we returned from the cache or reloaded from the server.



Now it had to be supplemented with a crutch for IE. We detect that the browser is IE and work out, in a roundabout way, that the page was nevertheless taken from the cache and that this was a history navigation (back_forward).



And how do we know the page came from the cache? We take its load time. If it loaded from the server in 50 milliseconds or less, that is simply impossible in IE, which means it came from the cache! :)



I already mentioned history navigation. If the navigation type is back_forward, we moved through history, and if the page also came from the cache, it must be bfcache; there are no other options in IE.
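The heuristic can be sketched roughly as follows. The code from the slides is not available, so this is a reconstruction under assumptions: the 50 ms threshold and the back_forward check come from the talk, while the function name and the shape of the info object are invented for the example.

```javascript
// Sketch of the IE workaround described above. The 50 ms threshold comes
// from the talk; the parameter names and the shape of `info` are assumptions.
const IE_CACHE_LOAD_THRESHOLD_MS = 50;

function wasRestoredFromBFCache(info) {
  // Well-behaved browsers: trust the persisted flag of the pageshow event.
  if (info.persisted) return true;

  // IE 11 sometimes leaves persisted === false, so fall back to heuristics:
  // a history navigation that "loaded" implausibly fast must be from bfcache.
  if (info.isIE) {
    return info.navigationType === 'back_forward'
      && info.loadTimeMs <= IE_CACHE_LOAD_THRESHOLD_MS;
  }
  return false;
}
```

How isIE and loadTimeMs are obtained is left open here; one option for the former would be checking document.documentMode, which only IE defines.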

What's next?


What should you do with this knowledge next? I would not want you to walk out and forget all this like a bad dream.

- First, here is the most valuable thing I discovered and what I want to nudge you toward: use the browsers' open source code! The source code of all the leading browsers our users run is openly available on the Internet right now. It is the most up-to-date documentation on what is supported, where, and how it works. It even includes tests written directly in HTML and JS. Use it, read it!

- Second, keep in mind that your existing applications can end up in bfcache. Tell your testers about it, so that when they check navigation they know that, during history navigation, a page can be opened either from the server or from bfcache. Here is a video of a real bug we fixed, caused by bfcache kicking in:


The user enters four queries and sees four result pages. Then they go back and see results 3 with query 4. One more step back shows results 2 with query 3. The page is in an inconsistent state: the content of the search box and the search results contradict each other. Keep this in mind in your applications.

- And third, if you are writing new code, think about whether you need bfcache. If you do, use the API compatibility table. If you do not, avoid it, but in case the page ends up in bfcache anyway, take into account the quirks of Safari and the other browsers I mentioned. Some things can break without any ceremony, and you will not be able to understand why.
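For the mismatched query and results from the second point, one possible fix is to reconcile the page state whenever it is restored from bfcache. This is only a sketch of the idea, not the fix that was actually shipped: syncQueryWithResults, the selectors, and the data-query attribute are all made-up names.

```javascript
// Sketch: after a bfcache restore, make the search box agree with the results
// that are actually on screen. `page` is a hypothetical object holding both.
function syncQueryWithResults(page) {
  if (page.input.value !== page.renderedQuery) {
    page.input.value = page.renderedQuery;
    return true; // something was out of sync and has been fixed
  }
  return false;
}

if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('pageshow', (event) => {
    if (!event.persisted) return; // normal load: the markup is already consistent
    // The selectors and the data attribute are made-up names for illustration.
    const input = document.querySelector('input[name="text"]');
    const results = document.querySelector('[data-query]');
    if (input && results) {
      syncQueryWithResults({ input, renderedQuery: results.getAttribute('data-query') });
    }
  });
}
```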

As promised, a link to the materials.
