Consul + iptables = :3

In 2010, Wargaming had 50 servers and a simple network model: backend, frontend, and a firewall. The number of servers grew and the model became more complicated: staging, isolated VLANs with ACLs, then VPNs with VRFs; VLANs with ACLs on L2, VRFs with ACLs on L3. Head spinning? It gets more fun from here.

When the count reached 16,000 servers, working with so many heterogeneous segments without tears became impossible. So we came up with a different solution. We took the Netfilter stack, added Consul as a data source, and got a fast distributed firewall. It replaced the ACLs on the routers and serves as both an external and an internal firewall. To manage this tooling dynamically, we developed the BEFW system, which is used everywhere: from controlling user access to the product network to isolating network segments from each other.



How it all works and why you should take a closer look at this system is explained by Ivan Agarkov (annmuor), head of the infrastructure security group of the Maintenance unit at the company's Minsk development center. Ivan is a SELinux fan, loves Perl, and writes code. As head of the information security group, he regularly works with logs, backups, and R&D to protect Wargaming from hackers and keep all of the company's game servers running.


A bit of history


Before explaining how we did it, I'll tell you how we got here and why it was needed. Let's go back 9 years: it's 2010, World of Tanks has just appeared, and Wargaming has roughly 50 servers.


Growth graph of company servers.

We had a network model that was optimal for its time.


The network model in 2010.

On the frontend sit the bad guys who want to break in, but it has a firewall. The backend has no firewall, but there are only 50 servers there and we know them all. Everything works fine.

Over 4 years, the server fleet grew 100 times, up to 5,000. The first isolated networks appeared: staging. They must not reach production, and things that could be dangerous were often running there.


Network model in 2014.

By inertia, the same hardware was used, and everything ran on isolated VLANs: an ACL is written for each VLAN, allowing or denying connections.

In 2016, the number of servers reached 8,000. Wargaming absorbed other studios, and partner networks appeared. They are sort of ours, but not quite: VLANs often don't work for partners, so you have to use VPNs with VRFs, and isolation gets more complicated. The mix of ACL-based isolation kept growing.


The network model in 2016.

By the beginning of 2018, the machine fleet had grown to 16,000. There were 6 segments we tracked, plus others we didn't count, including closed ones storing financial data. Container networks appeared (Kubernetes), DevOps, and cloud networks connected via VPN, for example from IVS. There were a lot of rules, and it hurt.


Network model and isolation methods in 2018.

For isolation we used VLANs with ACLs on L2, VRFs with ACLs on L3, VPNs, and much more. Too much.

Problems


Everyone lives with ACLs and VLANs, so what's actually wrong with them? Harold, hiding his pain, will answer this question.



There were many problems, but five were massive.

  • The cost of new rules grows geometrically. Each new rule took longer to add than the previous one, because you first had to check whether such a rule already existed.
  • No firewall inside the segments. The segments were somehow separated from each other, but inside them there were no longer enough resources for filtering.
  • Rules took a long time to apply. Operators could write one local rule by hand in an hour; a global one took several days.
  • Auditing the rules was hard. More precisely, it was impossible: the first rules were written back in 2010, and most of their authors no longer worked for the company.
  • Low level of control over the infrastructure. This was the main problem: we didn't really know what was going on in our own network.

This is what the network engineer looked like in 2018 when he heard: “We need some more ACLs.”



Solutions


At the beginning of 2018, it was decided to do something about it.

The price of each integration keeps growing. The trigger was that large data centers could no longer support isolated VLANs and ACLs because the devices ran out of memory.

Solution: remove the human factor and automate the provisioning of access as much as possible.

New rules take too long to apply. Solution: speed up rule application, make it distributed and parallel. For that you need a distributed system, so that rules are delivered on their own rather than rsync'd or SFTP'd to a thousand systems.

No firewall inside the segments. The need for a firewall inside the segments arose when different services appeared within the same network. Solution: use a host-based firewall. We run Linux almost everywhere, and iptables is everywhere, so this is not a problem.

Auditing the rules is hard. Solution: keep all the rules in a single place for review and management, so everything can be audited.

Low level of control over the infrastructure. Solution: take an inventory of all services and the accesses between them.

This is more of an administrative process than a technical one. We sometimes have 200-300 new releases per week, especially during promotions and holidays, and that's just for one of our DevOps teams. With that many releases it's impossible to track which ports, IPs, and integrations are needed. So we needed specially trained service managers who interviewed the teams: "What do you have there and why did you spin it up?"

After everything we rolled out, here is what the network engineer looked like in 2019.



Consul


We decided to put everything the service managers found into Consul, and generate iptables rules from it.

How did we decide to do this?

  • Collect all the services, networks, and users.
  • Generate iptables rules based on them.
  • Automate control.
  • ...
  • PROFIT.

Consul is not a remote API: it can run on every node and write to iptables. All that remains is to come up with automatic controls that clean out the excess, and most of the problems are solved! We'll polish the rest as we go.

Why Consul?


Well established. We have used it since 2014-15 as the backend for Vault, where we store passwords.

Does not lose data. In all the time we've used it, Consul has never lost data in any incident. This is a huge plus for a firewall management system.

P2P communication speeds up the propagation of changes. With P2P, all changes arrive quickly; there is no need to wait for hours.

Convenient REST API. We also considered Apache ZooKeeper, but it has no REST API, so you would have to build workarounds.

It works both as a key-value store (KV) and as a directory (Service Discovery). You can store services, catalogs, and data centers right away. This is convenient not only for us but also for neighboring teams, because when building a global service we think big.

Written in Go, which is part of the Wargaming stack. We love this language, we have many Go developers.

Powerful ACL system. In Consul, you can use ACLs to manage who can write what. We guarantee that the firewall rules will not overlap with anything else, and we have no problems with this.

But Consul has its drawbacks.

  • It does not scale within a data center unless you have the enterprise version; it scales only via federation.
  • It is very dependent on network quality and server load. Consul will not work properly in server mode on a loaded machine or when the network lags, for example with uneven speed. This is due to the P2P connections and the update distribution model.
  • Availability is hard to monitor. Consul's status can report that everything is fine when it has in fact long been dead.

We solved most of these problems while operating Consul, which is why we chose it. The company has plans for an alternative backend, but we have learned to deal with the problems, and for now we live with Consul.

How Consul Works


In a given data center we install three to five servers. One or two will not do: they cannot form a quorum and decide who is right and who is wrong when the data diverges. More than five makes no sense; performance will drop.



Clients connect to the servers in any order: they are the same agents, only with the flag server = false.



After that, clients receive a list of P2P connections and build connections between themselves.



At the global level we interconnect several data centers. They also connect over P2P and communicate.



When we want to collect data from another data center, the request goes from server to server. Such a scheme is called the Serf protocol . The Serf protocol, like Consul, is developed by HashiCorp.

A Few Important Facts About Consul


Consul has documentation describing how it works; I'll only give selected facts worth knowing.

Consul servers elect a master from among the voters. Consul picks one master from the list of servers in each data center, and all requests go only to it, regardless of how many servers there are. A hung master does not trigger re-election, and if no master is elected, no one serves requests.
Do you want horizontal scaling? Sorry, but no.
A request to another data center goes from master to master, no matter which server it arrived at. The elected master takes 100% of the load, apart from the load of forwarded requests. All servers in a data center hold an up-to-date copy of the data, but only one answers.
The only way to scale is to enable stale mode on the client.
In stale mode you can answer without a quorum. In this mode we give up data consistency, but reads are a bit faster than usual and any server responds. Naturally, writes still go only through the master.
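A stale read is just a query-string flag on the standard Consul HTTP API; the key name below is only illustrative:

    # Any server may answer; consistency is traded for speed.
    curl -s 'http://127.0.0.1:8500/v1/kv/befw/example-key?stale'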

Consul does not replicate data between data centers. When you assemble a federation, each server holds only its own data; for everything else it always asks someone else.

Atomicity of operations is not guaranteed outside a transaction. Remember that you are not the only one who can change something. If you need it otherwise, do a transaction with a lock.

Locking operations do not guarantee a lock. The request goes from master to master rather than directly, so there is no guarantee the lock will hold when you lock, for example, something in another data center.

ACLs do not guarantee access either (in many cases). The ACL may not work because ACLs are stored in one data center of the federation, the ACL data center (primary DC). If that DC does not answer, the ACL will not work.

One hung master will freeze the entire federation. For example, the federation has 10 data centers, one of them has a bad network, and one master goes down. Everyone who communicates with it gets stuck in a loop: a request is sent, there is no answer, the thread hangs. There is no way to know in advance when this will happen; an hour or two later the whole federation falls over. There is nothing you can do about it.

Status, quorum, and elections are handled in a separate thread. Re-election will not happen and the status will not show anything. You think you have a live Consul, you ask it something, and nothing happens; there is no answer. Meanwhile the status shows that everything is fine.

We ran into this problem and had to rebuild specific parts of data centers to avoid it.

The commercial version, Consul Enterprise, lacks some of the drawbacks above. It has many useful features: voting, distribution, scaling. There is just one "but": the licensing model for a distributed system is very expensive.

Life hack: rm -rf /var/lib/consul is a cure for all the agent's diseases. If something is not working, just delete the data; it will be pulled again from a replica, and most likely Consul will then work.

BEFW


Now let's talk about what we added to Consul.

BEFW is an acronym for BackEnd FireWall. I had to name the product somehow when I created the repository to hold the first test commits, and the name stuck.

Rules Templates


Rules are written in iptables syntax.

    -N BEFW
    -P INPUT DROP
    -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    -A INPUT -i lo -j ACCEPT
    -A INPUT -j BEFW

We send everything into the BEFW chain, except ESTABLISHED/RELATED traffic and localhost. The template can be anything; this is just an example.

What is BEFW useful for?

Services


A service always has a port and the node it runs on. From that node we can ask the local agent and find out what services we have. You can also attach tags.



Any service that is running and registered in Consul turns into an iptables rule. We have SSH, so port 22 is opened. The bash script is simple: curl and iptables, nothing else is needed.
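A minimal sketch of such a script (not the real BEFW daemon, which is written in Go and does much more), assuming the standard local agent API on port 8500 and the jq utility:

    #!/bin/bash
    # Ask the local Consul agent which services are registered on this node
    # and open their ports in the BEFW chain.
    curl -s http://127.0.0.1:8500/v1/agent/services \
      | jq -r '.[].Port' \
      | sort -u \
      | while read -r port; do
          iptables -A BEFW -p tcp --dport "$port" -j ACCEPT
        done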

Clients


How do we open access selectively rather than to everyone? By service name, we add IP lists to the KV store.



For example, we want everyone from the 10.0.0.0/8 network to be able to reach the ssh_tcp_22 service. Add one small TTL field, and now we have temporary permissions, for example, for a day.
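In KV terms this might look roughly as follows; the key layout is only an illustration, not BEFW's exact schema. The key identifies the service and the client network, and the value is the expiry timestamp implementing the TTL.

    # Allow 10.0.0.0/8 to reach ssh_tcp_22 for one day (illustrative key layout).
    consul kv put 'befw/ssh_tcp_22/10.0.0.0_8' "$(date -d '+1 day' +%s)"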

Accesses


We connect services and clients: we have a service, and for each one the KV storage is ready. Now we grant access selectively rather than to everyone.



Groups


If we have to write thousands of IPs for every access, we will get tired. Let's introduce groupings: a separate subset in KV. We will call it Alias (or groups) and store groups there on the same principle.



Connect them: now we can open SSH not to a specific IP but to a whole group or several groups. There is a TTL here too: you can be added to a group or removed from a group temporarily.



Integration


Our problem is the human factor and automation. So far we have solved it like this.



We work with Puppet and move everything related to the system (application code) into it. PuppetDB (backed by regular PostgreSQL) stores the list of services running there; you can find them by resource type and see who connects where. We also have a pull request / merge request system on top of this.

We wrote befw-sync, the simplest possible solution for transferring the data. First, the sync goes to PuppetDB over its HTTP API: we ask what services exist and what needs to be done. Then it makes a request to Consul.
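A very rough sketch of that flow, assuming PuppetDB's standard v4 query API and a hypothetical resource type and key layout (the real befw-sync is more involved):

    # 1. Ask PuppetDB which BEFW-related resources exist (resource type is hypothetical).
    curl -s -G http://puppetdb.example.local:8080/pdb/query/v4/resources \
         --data-urlencode 'query=["=", "type", "Befw::Service"]' > resources.json
    # 2. Turn the answer into Consul KV entries for BEFW to pick up (key layout illustrative).
    consul kv put 'befw/ssh_tcp_22/10.0.0.0_8' "$(date -d '+1 day' +%s)"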

Is there integration? Yes: you write the rules and are allowed to accept the pull request. Need to open a port or add a host to a group? Pull request, review, done. No more "find 200 other ACLs and try to do something about them".

Optimization


Localhost ping with an empty rule chain takes 0.075 ms.



Add 10,000 iptables rules to this chain. As a result, ping time increases 5 times: iptables is completely linear, and processing each address takes time.



For a firewall into which we migrate thousands of ACLs, we end up with many rules, and that introduces latency. For gaming protocols, this is bad.

But if we put the 10,000 addresses into an ipset, ping time even decreases.



The point is that the algorithmic complexity ("big O") of an ipset lookup is always O(1), no matter how many entries there are. True, there is a limitation: there cannot be more than 65535 entries. For now we live with this: you can combine them, expand them, make two ipsets out of one.

Storage


The logical continuation of this approach is storing a service's client information in an ipset.



Now, for the same SSH, we no longer write 100 IPs into the rules directly; instead we specify the name of the ipset to match against, followed by a DROP rule. It reads as "whoever is not in the set gets dropped", which is clearer.

Now we have rules and sets. The main thing is to create the set before writing the rule, because otherwise iptables will refuse to add the rule.
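A minimal sketch of the resulting per-service rules (names are illustrative): the ipset is created first, then a single match-set rule replaces hundreds of per-IP rules, with a DROP right after it.

    ipset create ssh_tcp_22 hash:net -exist        # the set must exist before the rule
    ipset add ssh_tcp_22 10.0.0.0/8 -exist
    iptables -A BEFW -p tcp --dport 22 -m set --match-set ssh_tcp_22 src -j ACCEPT
    iptables -A BEFW -p tcp --dport 22 -j DROP     # whoever is not in the set is dropped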

General scheme


As a diagram, everything I have described looks like this.



A commit goes to Puppet, everything is shipped to the host, a service appears here, the ipset appears there, and anyone not registered in it is not let in.

Allow & deny


To quickly save the world or quickly cut someone off, at the beginning of all the chains we added two ipsets: rules_allow and rules_deny. How does it work?

For example, someone's bots are putting load on our Web. Previously you had to find the IP in the logs and hand it to the network engineers so they could find the traffic source and ban it. Now it looks different.



We push it to Consul, wait 2.5 seconds, and we're done. Since Consul distributes changes quickly over P2P, it works everywhere, anywhere in the world.

Once I completely stopped WOT by making a mistake with the firewall. rules_allow is our insurance against such cases: if we've made a firewall mistake somewhere and something is blocked, we can always push a conditional 0.0.0.0/0 to quickly bring everything back up. Then we fix everything by hand.
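Banning a bot, for example, looks roughly like this (the KV path is illustrative); a few seconds later every host's rules_deny ipset contains the address:

    # Push the offender to Consul once...
    consul kv put 'befw/$ipset$/rules_deny/203.0.113.7' ''
    # ...and each host ends up with the equivalent of:
    ipset add rules_deny 203.0.113.7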

Other sets


You can add any other sets in the $IPSETS$ namespace.



What for? Sometimes someone needs an ipset, for example, to emulate shutting down part of a cluster. Anyone can bring their own sets, name them, and they will be pulled from Consul. A set can either participate in the iptables rules or act as a NOOP: its consistency will still be maintained by the daemon.

Users


It used to be like this: a user connected to the network and got parameters through the domain. Before next-generation firewalls appeared, Cisco could not figure out where the user was and what their IP was, so access was granted only to machines by hostname.

What did we do? We hook in at the moment the address is issued. Usually that's dot1x, Wi-Fi, or VPN, and everything goes through RADIUS. For each user we create a group named after the user and put their IP into it with a TTL equal to the dhcp.lease: as soon as the lease expires, the rule disappears.
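Conceptually, the hook that fires on RADIUS accounting does something like this (the group name, key layout, and lease length are all illustrative):

    # Put the user's current IP into a per-user group, expiring with the DHCP lease
    # (86400 s here); group name and key layout are only for illustration.
    consul kv put 'befw/$alias$/user_ivan/10.20.30.40' "$(( $(date +%s) + 86400 ))"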



Now we can open access to services, and to other groups, by username. We got rid of the pain of changing hostnames and took the load off the network engineers, because Cisco is no longer needed for this. Now the engineers themselves configure access to their servers.

Isolation


In parallel, we started dismantling the old isolation. The service managers took an inventory and we analyzed all our networks. We broke them down into the same kinds of groups, and on the relevant servers those groups were added, for example, to deny. Now the staging network lands in production's rules_deny, so staging cannot get into production itself.



The scheme is quick and simple: we remove the ACLs, offload the hardware, and reduce the number of isolated VLANs.

Integrity control


Previously we had a special trigger that reported when someone changed a firewall rule by hand. I wrote a huge firewall rule checker; it was painful. Now BEFW controls integrity itself. It zealously makes sure that the rules it creates do not change. If someone changes the firewall rules, it reverts everything. "I just quickly set up a proxy here to work from home" is no longer an option.

BEFW controls the ipsets for the services and the list in befw.conf, and the service rules in the BEFW chain. But it does not watch other chains, other rules, or other ipsets.

Failure protection


BEFW always saves the last known-good state in the binary file state.bin. If something goes wrong, it rolls back to that state.bin.



This is insurance against Consul being flaky, not sending data, or someone making a mistake and pushing rules that cannot be applied. So that we are never left without a firewall, BEFW rolls back to the last good state if an error occurs at any stage.

In critical situations, this guarantees that we are left with a working firewall. We open all the gray networks in the hope that an admin will come and fix things. Someday I will move this into the config, but for now we just have the three gray networks: 10/8, 172.16/12, and 192.168/16. Given our Consul, this is an important feature that lets us keep developing.

That is all of BEFW. The code is available on GitHub.


I’ll tell you about the bugs we encountered.

ipset add set 0.0.0.0/0. What happens if you add 0.0.0.0/0 to an ipset? Will all IPs be added? Will access be opened to the whole Internet?

No, we get a bug that cost us two hours of downtime. Moreover, the bug has been open since 2016; it lives in Red Hat Bugzilla as #1297092, and we only found it by accident, from a developer's report.

Now BEFW has a strict rule that turns 0.0.0.0/0 into two prefixes: 0.0.0.0/1 and 128.0.0.0/1.
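So what actually lands in the set is the safe equivalent:

    # instead of "ipset add rules_allow 0.0.0.0/0", which trips the bug:
    ipset add rules_allow 0.0.0.0/1
    ipset add rules_allow 128.0.0.0/1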

ipset restore < file. What does ipset do when you tell it to restore? Do you think it works just like iptables and restores the data?

Nothing of the kind: it does a merge, the old addresses do not disappear, and you have not closed the access.

We found this bug while testing isolation. Now there is a rather convoluted scheme: instead of a plain restore we do create temp, then restore flush temp and restore temp, and at the end a swap. The swap is for atomicity, because if you flush first and a packet arrives at that moment, it gets dropped and something breaks. So there is a bit of black magic here.
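The black magic boils down to the standard atomic-swap pattern (set names are illustrative):

    ipset create ssh_tcp_22_tmp hash:net -exist    # 1. temporary set
    ipset flush ssh_tcp_22_tmp
    ipset add ssh_tcp_22_tmp 10.0.0.0/8            # 2. fill it with the desired state
    ipset swap ssh_tcp_22_tmp ssh_tcp_22           # 3. atomic swap: no window with an empty set
    ipset destroy ssh_tcp_22_tmp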

consul kv get -datacenter=other. As I said, when requesting data from another data center we expect either the data or an error. We can do this through the local Consul, but in this case both will simply hang.

The local Consul client is a wrapper over the HTTP API, but it just hangs and does not respond to Ctrl+C or Ctrl+Z or anything else, only to a kill -9 from another console. We ran into this while building a large cluster. We still have no solution; we are preparing a fix for this bug in Consul.

Consul leader is not responding. The master in our data center is not responding, and we think: "Surely the re-election algorithm will kick in now?"

No, it will not, and monitoring will not show anything: Consul will report that there is a commit index and a leader has been found, everything is fine.

How do we fight this? service consul restart in cron every hour. If you have 50 servers, it's no big deal. When you have 16,000, you will understand how it feels.
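The workaround itself is a single crontab entry (file name is illustrative):

    # /etc/cron.d/consul-restart
    0 * * * * root service consul restart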

Conclusion


As a result, we got the following advantages:

  • 100% coverage of all Linux machines.
  • Speed.
  • Automation.
  • Freed the hardware and the network engineers from slavery.
  • Almost endless integration possibilities: with Kubernetes, with Ansible, with Python, with anything.

Cons: Consul, which we now have to live with, and the very high cost of a mistake. For example, once at 6 pm (prime time in Russia) I was editing something in the network lists. We were building isolation in BEFW at the time. I made a mistake somewhere, it seems I specified the wrong mask, and everything went down within two seconds. Monitoring lights up, the on-duty engineer runs in: "Everything is down!" The head of the department went gray explaining to the business why this happened.

The cost of a mistake is so high that we came up with our own elaborate prevention procedure. If you deploy this in a large production, do not hand out the Consul master token to everyone. It will end badly.

Cost. I spent 400 hours writing the code alone. Supporting it takes my team of 4 people about 10 hours a month in total. Compared to the price of any next-generation firewall, it's free.

Plans. The long-term plan is to find an alternative transport to replace or complement Consul. Perhaps it will be Kafka or something similar. But for the next few years we will live with Consul.

Immediate plans: integration with Fail2ban, with monitoring, with nftables, possibly with other distributions, metrics, advanced monitoring, optimization. Kubernetes support is also somewhere in the plans, because right now we have several clusters and the desire to do it.

Also in the plans:

  • search for traffic anomalies;
  • network map management;
  • Kubernetes support;
  • assembly of packages for all systems;
  • Web UI

We are constantly working on expanding the configuration, increasing metrics and optimizing.

Join the project. The project turned out well, but unfortunately it is still a one-person project. Come to GitHub and try something: commit, test, suggest something, give your assessment.

