How to fix route leaks

The following story is, in many ways, unique.

This is how it started. For about an hour, starting at 19:28 UTC yesterday, April 1, 2020, Rostelecom (AS12389), the largest Russian Internet service provider, announced network prefixes belonging to some of the largest Internet players: Akamai, Cloudflare, Hetzner, Digital Ocean, Amazon AWS and other well-known names. Until the problem was resolved, paths between the largest cloud providers on the planet were disrupted: the Internet "blinked".

The route leak propagated quite successfully through Rascom (AS20764), then through Cogent (AS174) and, a few minutes later, through Level3 (AS3356), after which it spread all over the world. The leak was so serious that almost all Tier-1 operators were affected by the anomaly.

It looked like this:

image

On top of this:

image

The route leak affected 8,870 network prefixes belonging to nearly 200 autonomous systems. There were many incorrect announcements, and none of them were discarded by the networks that received them. Filters would not have prevented the leak itself, but they would have limited how far it spread. To get a sense of how the incident unfolded, look at BGPlay for one of the Akamai prefixes: https://stat.ripe.net/widget/bgplay#w.resource=2.17.123.0/24
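To illustrate what a filter on the receiving side does, here is a minimal sketch of a per-neighbor prefix allow-list. It is not a description of how any of the networks named above actually filter. The function name accept_announcement, the ALLOWED_PREFIXES table, and the AS numbers and prefixes (taken from ranges reserved for documentation) are all assumptions made for this example.

```python
import ipaddress

# Hypothetical allow-list: the prefixes each neighbor AS is expected to announce.
# AS numbers and prefixes come from ranges reserved for documentation.
ALLOWED_PREFIXES = {
    64500: [ipaddress.ip_network("192.0.2.0/24")],
    64501: [ipaddress.ip_network("198.51.100.0/24")],
}


def accept_announcement(neighbor_asn, prefix):
    """Return True if the prefix falls inside the neighbor's allow-list."""
    net = ipaddress.ip_network(prefix)
    for allowed in ALLOWED_PREFIXES.get(neighbor_asn, []):
        # subnet_of() only works within one IP version, so check that first.
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    # Unknown neighbor or unexpected prefix: discard the announcement.
    return False


if __name__ == "__main__":
    updates = [
        (64500, "192.0.2.0/24"),     # expected announcement
        (64500, "198.51.100.0/24"),  # someone else's prefix: a leak or hijack
        (64999, "192.0.2.0/24"),     # neighbor with no allow-list entry
    ]
    for asn, prefix in updates:
        verdict = "accept" if accept_announcement(asn, prefix) else "discard"
        print(f"AS{asn} {prefix}: {verdict}")
```

A filter like this does not stop the originating network from leaking, but each neighbor that applies it stops the leaked routes from travelling any further.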

As we wrote yesterday, network engineers right now need to be fully certain that their actions are correct and leave no room for a critical error. Rostelecom's mistake perfectly illustrates how fragile IETF-standardized BGP routing is, especially in times as stressful as these, when traffic is growing.

What really distinguishes this incident from others is that Rostelecom received a warning from the Qrator.Radar real-time monitoring system and quickly contacted us for help in dealing with the consequences.

BGP errors are trivially easy to make, and even more so during the current coronavirus pandemic. With analytical data at hand, however, you can respond quickly to a changing situation, which is what happened here: the leak was stopped and normal routing was restored.

We strongly recommend that all other ISPs start thinking about monitoring BGP announcements right now, so that large-scale incidents can be nipped in the bud. And of course, RPKI Origin Validation is not science fiction: it is something you need to deploy now.
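For readers who have not looked at origin validation yet, the sketch below shows the classification logic described in RFC 6811: an announcement is valid, invalid or not-found depending on the ROAs that cover its prefix. It is a simplified illustration, not a replacement for a real validator. The ROA table, the validate_origin function and all AS numbers and prefixes (documentation ranges) are assumptions made for this example.

```python
import ipaddress
from typing import NamedTuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ROA(NamedTuple):
    prefix: Network
    max_length: int  # longest prefix length the ROA authorizes
    asn: int         # AS authorized to originate the prefix


# Hypothetical ROA table built from documentation prefixes and AS numbers.
ROAS = [
    ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
    ROA(ipaddress.ip_network("203.0.113.0/24"), 24, 64501),
]


def validate_origin(prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in ROAS:
        if net.version != roa.prefix.version or not net.subnet_of(roa.prefix):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if net.prefixlen <= roa.max_length and roa.asn == origin_asn:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    for prefix, asn in [
        ("192.0.2.0/24", 64500),     # matching ROA
        ("192.0.2.0/24", 64511),     # wrong origin AS
        ("203.0.113.0/25", 64501),   # more specific than max_length allows
        ("198.51.100.0/24", 64500),  # no covering ROA
    ]:
        print(prefix, f"AS{asn}", validate_origin(prefix, asn))
```

Keep in mind that origin validation looks only at the origin AS and the prefix length. A leak that keeps the legitimate origin and the original prefix length will still be classified as valid, which is one more reason to combine it with monitoring.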
