How much marketing is at ACID?

Hello everyone. In touch Vladislav Rodin. Currently, I am the head of the High Load Architect course at OTUS, and I also teach courses on software architecture.

In addition to teaching, as you can see, I have been writing for a blog copyright material OTUS Habré and today's article I want to coincide with the start of the course "Databases" , which now is open the set.




Foreword


Transactions appeared in the 70s and were presented as a database tool to solve the problems of fault tolerance and data access in a competitive environment. Then a number of properties were formulated that a transaction must possess in order to fulfill the tasks assigned to it, and the capital letters of these properties, set in the right order, made up the beautiful acronym ACID.

The period of time during which these events took place was characterized by the absence of high loads, the Internet and performance problems, which could be solved only by vertical scaling methods. Subsequently, in the early 2000s, a trend appeared on NoSQL databases, the abbreviation BASE appeared, which was actually opposed to the classical ACID (ACID - acid, BASE - alkali). Now there is a reverse trend for ACID. Even NoSQL's MongoDB is now supporting ACID.

Let's look at what this abbreviation means and how much marketing it has.

ACID represents 4 properties:

A = atomicity
C = consistency
I = isolation
D = durability

Now let's talk about each property separately.

A = atomicity


Atomicity is an overloaded term and in the context of transactions in a database can be formulated as the principle of "all or nothing." If your transaction contains 10 insert operations, then all 10 will be executed (commit transactions will be carried out), or none of them (rollback transactions will be carried out).

How is this property provided? The fact is that when a transaction containing the same 10 inserts comes to the database, the data does not begin to change. Before a transaction, the transaction is written directly to the data in a log, in which the changes made by it are recorded. This journal can also be used to replicate data, or it can be completely unrelated to it, as I talked about, for example, here. Thanks to this journal, transactions can be committed directly to the data, or, in which case, rollback it. The rule that a transaction should be closed as soon as possible follows directly from an understanding of the principles for implementing this property: databases are often forbidden to clean this log while a transaction is open, so it can become clogged, which, in turn, leads to very unpleasant consequences.

C = consistency (consistency or integrity)


In terms of ACID, consistency does not mean the same as in terms of the CAP theorem (in the theory of distributed systems, there are many degrees of this consistency). By consistency we mean the following: some predefined invariants of the system must be executed both before and after the commit transaction. Examples of system invariants can be represented: the debit converges with the loan, the total salary of employees does not exceed the budget, the number of employees in the company is equal to the number of previously opened vacancies, etc. In the database language, this means only the fulfillment of all constraints.

The question about the need for consistency at the database level is quite controversial, because the presence of constraints can indicate that part of the business logic has moved from the application to the database level, which is not universally recognized as good practice. In the end, the application itself may decide not to commit invalid data. There is an opinion that consistency was added only to make the abbreviation beautiful (marketing).

I = isolation


Isolation is a database property that allows parallel transactions to be executed as sequential. After all, no one forbids the database to perform several transactions at the same time, it is necessary to make sure that they do not affect each other, so that anomalies of the race condition type do not occur. In fact, it is isolation that solves the problem of data access in a competitive environment.

The disclosure of the concept of isolation because of its volume deserves a separate article., because it is precisely isolation that can be called the heart of ACID. The price of transactionality often comes down to the price of ensuring isolation, which is why isolation has different levels, each of which provides its own level of protection against race conditions and carries one or another overhead. Isolation is fully ensured only at the serializable level, which is very difficult to implement and which is rarely needed.

D = durability


Durability tells us that if a transaction has been applied, then by no means should it disappear. In fact, this means the following: if the database responded that a transaction was committed, the transaction was committed to non-volatile memory. This means that the fsync system call occurred, that is, the buffers were flushed to the hard drive, and it answered ok.

Durability is also a marketing term, because it cannot be fully provided. Even if we discard the artificial scenarios of “burning the Earth by aliens”, after which there will be no databases or transactions at all, more likely but extreme scenarios of physical destruction of a particular hard drive on which the transaction was recorded, we can recall that the fsync system call guarantees hitting a hard drive in the controller, which, in turn, still contains a volatile buffer. The time spent in it is short, but not equal to 0. As a result, if you turn off the electricity exactly at the “right” moment, the transaction may still be lost!

Although the problem with fsync has been solved in an expensive Oracle database, nothing can protect us from the problems associated with the physical destruction of the machine. We can only increase guarantees using backups and replication .

findings


Despite the fact that ACID provides quite interesting properties, such as atomicity and isolation, some of these properties are just marketing, and even known for its rigor compared to BASE, ACID does not fully ensure the absence of transaction loss opportunities, as well as the impact of results performing simultaneous transactions on each other (isolation must still be configured!).


We invite everyone to a free lesson on the topic: "A model for working with data in PostgreSQL . "



All Articles