Orchestrator for MySQL: why you can't build a fault-tolerant project without it

Every major project started out with a pair of servers. First there was a single DB server, then slaves were added to it to scale reads. And then: stop! There is one master but many slaves; if one of the slaves fails, everything will still be fine, but if the master fails, things get bad: downtime, and admins scrambling to bring a server back up. What can you do? Reserve the master. My colleague Pavel has already written an article about that, so I won't repeat it. Instead, I'll tell you why you definitely need Orchestrator for MySQL!

Let's start with the main question: "How will we switch the code over to a new machine when the master fails?"

  • I like the VIP (Virtual IP) scheme the most; we'll talk about it below. It is the simplest and most obvious approach, though it has an obvious limitation: the master we are reserving must be in the same L2 segment as the replacement machine, so you can forget about a second DC. Moreover, if you follow the rule that a large L2 is evil (L2 only within a rack, L3 between racks), this scheme has even more restrictions.
  • You can put a DNS name in the code and resolve it through /etc/hosts. In practice no real DNS resolution happens. The advantage of this scheme is that it doesn't have the first method's limitation, so a cross-DC setup is possible. But then the obvious question arises: how quickly will Puppet or Ansible deliver the change to /etc/hosts?
  • You can modify the second method a bit: install a caching DNS server on all web servers, through which the code will reach the master database, and set a TTL of 60 seconds on that DNS record. With a proper implementation, the method looks good (see the connection sketch after this list).
  • A scheme with service discovery, using Consul or etcd.
  • An interesting option with ProxySQL: wrap all MySQL traffic through ProxySQL, which can itself determine who the master currently is. By the way, you can read about one way of using this product in my article.
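
For schemes 2 and 3, the key point is that the application connects to the master by name, never by IP. Below is a minimal sketch of what that looks like, assuming the pymysql client; the hostname and credentials are hypothetical and would be maintained by your Puppet/Ansible or DNS tooling.

```python
# A minimal sketch of schemes 2 and 3: the application always connects to the
# master by DNS name, never by a hardcoded IP.
import pymysql

def connect_to_master():
    # With /etc/hosts the name is resolved locally; with a caching DNS and a
    # TTL of ~60 seconds, new connections reach the new master within a minute
    # of the record being updated after a failover.
    return pymysql.connect(
        host="db-master.internal",   # hypothetical name kept up to date by tooling
        user="app",
        password="secret",
        database="production",
        connect_timeout=3,           # fail fast if the master is dead
    )

conn = connect_to_master()
```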

The author of Orchestrator first implemented the VIP scheme while working at GitHub, and later reworked it using Consul.

Typical infrastructure scheme:

[image: typical infrastructure scheme]

I will immediately describe the obvious situations to consider:

  • The VIP is not assigned to anyone automatically. The point is that when Orchestrator performs a failover, it must itself move the VIP to the new master; otherwise the address keeps pointing at the dead machine. This has to be handled by failover hooks.
  • On the old master you need to do an ifdown of the VIP, and on the new master an ifup of the VIP. If this step goes wrong during a failover, two machines end up holding the same address, which means splitbrain.
  • Even so, when Orchestrator moves the VIP and brings it up on the new master, it must send a gratuitous ARP (arping) so that the network equipment learns where the VIP now lives.
  • All slaves must have read_only = 1, and as soon as you promote a slave to master it must get read_only = 0 (a small sketch of this discipline follows the list).
  • Do not forget that any slave we have marked as a candidate can become the master (Orchestrator has a whole preference mechanism for deciding which slave should be considered first as a candidate for the new master, which one second, and which slaves must never be chosen as master under any circumstances). If a slave becomes the master, the slave load stays on it and the master load is added on top; this must be taken into account.
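
Here is a minimal sketch of the read_only discipline from the list above. This is tooling around Orchestrator, not Orchestrator's own code; the hosts and credentials are hypothetical, and pymysql is just one possible client.

```python
# Keep every slave read-only; make only the promoted master writable.
import pymysql

def set_read_only(host, enabled):
    conn = pymysql.connect(host=host, user="admin", password="secret")
    with conn.cursor() as cur:
        # super_read_only additionally blocks users with SUPER privilege;
        # see the section on errant GTIDs below.
        cur.execute("SET GLOBAL read_only = %s", (1 if enabled else 0,))
    conn.close()

for host in ["db2.internal", "db3.internal", "db4.internal"]:
    set_read_only(host, True)
set_read_only("db2.internal", False)  # db2 has just been promoted to master
```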

Why do you absolutely need Orchestrator if you don't have one?

  • Orchestrator has a very user-friendly graphical interface that displays the entire topology (see screenshot below).
  • Orchestrator can track which slaves are lagging and where replication is broken altogether (we have hooked scripts up to Orchestrator that send SMS alerts); see the API sketch after this list.
  • Orchestrator tells you which slaves have an errant GTID.
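
The same topology information is also available programmatically. Below is a small sketch against Orchestrator's HTTP API; the endpoints are part of Orchestrator's documented API, but the host, port, and exact JSON field names here are assumptions that may differ between versions.

```python
# List every cluster Orchestrator knows about and print replication lag
# per instance; alerting (e.g. SMS) can hang off output like this.
import requests

ORC = "http://orchestrator.internal:3000"  # hypothetical host/port

for cluster in requests.get(f"{ORC}/api/clusters", timeout=5).json():
    for inst in requests.get(f"{ORC}/api/cluster/{cluster}", timeout=5).json():
        host = inst["Key"]["Hostname"]
        # Field names vary by Orchestrator version, hence the defensive .get()
        lag = inst.get("ReplicationLagSeconds", {}).get("Int64")
        print(f"{host}: lag={lag}s")
```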

Orchestrator Interface:

[image: the Orchestrator interface]

What is an errant GTID?

There are two basic requirements for Orchestrator to work:

  • Pseudo-GTID must be enabled on all machines in the MySQL cluster; in our case GTID itself is enabled.
  • The binlog format must be the same everywhere; even statement will do (a quick audit sketch follows this list). We had a configuration in which Row was used on the master and on most of the slaves, while two slaves historically remained in Mixed mode. As a result, Orchestrator simply refused to connect those slaves to the new master.
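
A quick audit for the second requirement might look like this; a sketch with hypothetical hosts and credentials.

```python
# Check that binlog_format matches on every server in the cluster.
import pymysql

formats = {}
for host in ["db1.internal", "db2.internal", "db3.internal"]:
    conn = pymysql.connect(host=host, user="admin", password="secret")
    with conn.cursor() as cur:
        cur.execute("SELECT @@GLOBAL.binlog_format")
        (formats[host],) = cur.fetchone()
    conn.close()

if len(set(formats.values())) > 1:
    # e.g. ROW on the master, MIXED on two old slaves, as described above
    print("binlog_format mismatch:", formats)
```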

Remember that the most important property of a production slave is its consistency with the master! If you have GTID (Global Transaction ID) enabled on both the master and the slave, you can use the gtid_subset function to find out whether the same data-change transactions have actually been executed on both machines. Read more about this here.
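
A sketch of that consistency check, assuming GTID is enabled on both servers; the hosts and credentials are hypothetical.

```python
# Compare gtid_executed on master and slave using MySQL's GTID functions.
import pymysql

def query_one(host, sql, args=None):
    conn = pymysql.connect(host=host, user="admin", password="secret")
    with conn.cursor() as cur:
        cur.execute(sql, args)
        (value,) = cur.fetchone()
    conn.close()
    return value

master_gtids = query_one("db1.internal", "SELECT @@GLOBAL.gtid_executed")
slave_gtids = query_one("db2.internal", "SELECT @@GLOBAL.gtid_executed")

# GTID_SUBSET(a, b) is 1 when every transaction in a is also in b;
# GTID_SUBTRACT shows exactly which transactions the master is missing.
consistent = query_one("db1.internal", "SELECT GTID_SUBSET(%s, %s)",
                       (slave_gtids, master_gtids))
errant = query_one("db1.internal", "SELECT GTID_SUBTRACT(%s, %s)",
                   (slave_gtids, master_gtids))
print("consistent" if consistent else f"errant transactions: {errant}")
```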

So, via an errant GTID warning, Orchestrator shows you that there are transactions on the slave that are not on the master. Why does this happen?

  • read_only = 1 was not enabled on the slave, and someone connected and executed a data-change query.
  • super_read_only = 1 was not enabled on the slave, and an administrator, mixing up the servers, went in and executed a query there.
  • If you took both of the previous points into account, there is one more trick: in MySQL, a flush binlogs command is itself written to the binlog, so the first time you flush, an errant GTID will appear on the master and on all slaves. How can you avoid this? In Percona Server 5.7.25-28 the binlog_skip_flush_commands = 1 setting appeared, which forbids writing flush commands to the binlogs. There is a corresponding bug report on mysql.com. (A sketch of one common cleanup for an errant GTID follows this list.)
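
One common cleanup for an errant GTID, which I'll add here as an illustration rather than as something Orchestrator does for you: inject an empty transaction with the same GTID on the master, so that the GTID sets converge and the warning disappears. Host and GTID below are hypothetical.

```python
# Inject an empty transaction on the master carrying the errant GTID.
import pymysql

def inject_empty_transaction(master_host, errant_gtid):
    # errant_gtid is a single GTID, e.g. "3e11fa47-...-c80aa9429562:5"
    conn = pymysql.connect(host=master_host, user="admin", password="secret")
    with conn.cursor() as cur:
        cur.execute("SET GTID_NEXT = %s", (errant_gtid,))
        cur.execute("BEGIN")
        cur.execute("COMMIT")   # empty transaction, records only the GTID
        cur.execute("SET GTID_NEXT = 'AUTOMATIC'")
    conn.close()
```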

To sum up all of the above: if you are not ready to use Orchestrator in failover mode yet, run it in observation mode. Then you will always have before your eyes a map of how your MySQL machines interact, along with clear information about the replication type on each machine, whether the slaves are lagging, and, most importantly, how consistent they are with the master!

The obvious question is: "How does Orchestrator actually do its job?" It must select a new master from among the current slaves and then repoint all the other slaves to it (this is what GTID is for; with the old mechanism of binlog_name and binlog_pos, switching a slave from the current master to a new one is simply impossible!). Before Orchestrator came to us, I once had to do all of this manually. The old master had died because of a buggy Adaptec controller, and it had about 10 slaves. I needed to move the VIP from the master to one of the slaves and repoint all the other slaves to it. How many consoles I had to open, how many simultaneous commands I had to enter... I had to wait until 3 a.m., remove the load from all the slaves except two, promote the first of those two machines to master, immediately attach the second machine to it, then repoint all the remaining slaves to the new master and return the load. Sheer horror... (A sketch of the GTID-based repointing follows.)
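
To see why GTID saves the night here, consider how a single slave is repointed with MASTER_AUTO_POSITION: the slave finds its own position in the new master's binlogs, and no binlog_name/binlog_pos arithmetic is needed. A sketch with hypothetical hosts and replication credentials (MySQL 5.7 syntax):

```python
# Repoint a slave to a newly promoted master using GTID auto-positioning.
import pymysql

def repoint_slave(slave_host, new_master):
    conn = pymysql.connect(host=slave_host, user="admin", password="secret")
    with conn.cursor() as cur:
        cur.execute("STOP SLAVE")
        cur.execute(
            "CHANGE MASTER TO MASTER_HOST = %s, MASTER_PORT = 3306, "
            "MASTER_USER = 'repl', MASTER_PASSWORD = 'repl_secret', "
            "MASTER_AUTO_POSITION = 1",
            (new_master,),
        )
        cur.execute("START SLAVE")
    conn.close()

for slave in ["db3.internal", "db4.internal"]:
    repoint_slave(slave, "db2.internal")  # db2 is the newly promoted master
```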

How does Orchestrator behave when it performs a failover? This is easiest to show using the example of a situation where we want to make the master a more powerful, more modern machine than the current one.

[image: the switchover in progress]

The figure shows the middle of the process. What has already happened by this point? We told Orchestrator that we want some slave to become the new master, and it has started reconnecting all the other slaves to that machine, which acts as a transit master for the moment. With this scheme no errors occur: all the slaves keep working, Orchestrator removes the VIP from the old master, moves it to the new one, sets read_only = 0, and forgets about the old master. Done! The downtime of our service equals the VIP transfer time, about 2-3 seconds. (A sketch of triggering such a planned takeover follows.)
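
A planned promotion like this can also be requested explicitly through Orchestrator's graceful-master-takeover endpoint. A sketch follows; the cluster alias, host, and port are hypothetical, so check your Orchestrator version's API documentation.

```python
# Ask Orchestrator to gracefully promote db2.internal as the new master
# of the cluster known to it under the alias "mycluster".
import requests

ORC = "http://orchestrator.internal:3000"

resp = requests.get(
    f"{ORC}/api/graceful-master-takeover/mycluster/db2.internal/3306",
    timeout=60,  # the takeover is quick; the VIP move adds 2-3 seconds
)
print(resp.json())
```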

That's all for today, thank you all. A second article about Orchestrator will follow soon. In the famous Soviet film "The Garage," one character said: "I wouldn't go on a reconnaissance mission with him!" Well, Orchestrator, I would gladly go scouting with you!
