Data compression in Apache Ignite. Sberbank Experience

When working with large volumes of data, the problem of insufficient disk space can become acute. One way to address it is compression, which lets you store more data on the same hardware. In this article, we will look at how data compression works in Apache Ignite. The article covers only the disk compression methods implemented in the product; other kinds of data compression (over the network, in memory), whether implemented or not, remain out of scope.

So, when persistence mode is enabled, Ignite writes two kinds of data to disk as the data in caches changes:

  1. Cache contents
  2. Write-ahead log (hereinafter referred to as WAL)

A mechanism for compressing the WAL, called WAL compaction, has existed for a long time. The recently released Apache Ignite 2.8 introduced two more mechanisms for compressing data on disk: disk page compression for compressing the contents of caches and WAL page snapshot compression for compressing some WAL records. More on all three mechanisms below.

Disk page compression


How it works


To begin with, let us dwell very briefly on how Ignite stores data. Storage is built on page memory. The page size is set at node start and cannot be changed later; it must be a power of two and a multiple of the file system block size. Pages are loaded into RAM from disk as needed, and the size of data on disk may exceed the amount of allocated RAM. If there is not enough room in RAM to load pages from disk, old, unused pages are evicted from RAM.

Data is stored on disk in the following form: a separate file is created for each partition of each cache group, and pages are laid out in this file one after another in ascending index order. The full page identifier contains the cache group identifier, the partition number, and the page index within the file. Thus, from the full page identifier we can uniquely determine the file and the offset in the file for each page. You can read more about page memory in an article on the Apache Ignite Wiki: Ignite Persistent Store - under the hood.
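To make the addressing concrete, here is a minimal sketch (in Java; these are not Ignite's actual classes, and the file-name pattern is purely illustrative) of how a full page identifier resolves to a partition file and an offset inside it:

    // Illustration only: full page id -> (file, offset). Not Ignite source code.
    public final class PageLocator {
        // The directory/file naming below is an assumption for illustration.
        public static String partitionFile(int cacheGroupId, int partitionId) {
            return "cacheGroup-" + cacheGroupId + "/part-" + partitionId + ".bin";
        }

        // Pages are laid out back to back, so the offset is simply index * pageSize.
        public static long pageOffset(long pageIndex, int pageSize) {
            return pageIndex * (long) pageSize;
        }
    }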

The disk page compression mechanism, as the name suggests, works at the page level. When it is enabled, data in RAM is handled as is, without any compression; pages are compressed only at the moment they are saved from RAM to disk.

But compressing each page individually does not by itself solve the problem: we also need to somehow reduce the size of the resulting data files. If the page size is no longer fixed, we can no longer write pages to the file one after another, as this gives rise to a number of problems:

  • We cannot use the page index to calculate the offset at which it is located in the file.
  • It is unclear how much space to reserve for a page: the next time it is saved it may compress to a larger or smaller size, and pages would have to be relocated within the file.
  • Pages no longer fall on file system block boundaries, which hurts I/O efficiency.

To avoid solving these problems at its own level, disk page compression in Apache Ignite relies on a file system mechanism called sparse files. A sparse file is a file in which some zero-filled regions can be marked as holes. No file system blocks are allocated to store these holes, which is what saves disk space.

It follows that to free a file system block, the hole must be at least one file system block in size, which imposes an additional restriction on the Apache Ignite page size: for compression to have any effect at all, the page size must be strictly larger than the file system block size. If the page size equals the block size, we can never free a single block, because that would require the compressed page to occupy 0 bytes. If the page size equals 2 or 4 blocks, we can free at least one block whenever the page compresses to at most 50% or 75% of its original size, respectively.

Thus, the final description of the mechanism: when a page is written to disk, an attempt is made to compress it. If the size of the compressed page allows one or more file system blocks to be freed, the page is written in compressed form and a "hole" is punched in place of the freed blocks (a fallocate() system call with the "punch hole" flag). If the compressed size does not allow any blocks to be freed, the page is saved as is, in uncompressed form. Page offsets are computed exactly as without compression, by multiplying the page index by the page size, so no relocation of pages is required, and page offsets, just as without compression, fall on file system block boundaries.
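As a back-of-the-envelope illustration of this rule (assumed logic based on the description above, not Ignite's source code), the number of file system blocks a compressed page can free is easy to compute:

    // How many FS blocks a compressed page frees; 0 means the page is written as is.
    public final class PunchHoleMath {
        public static int blocksFreed(int pageSize, int compressedSize, int fsBlockSize) {
            int totalBlocks = pageSize / fsBlockSize;
            int usedBlocks = (compressedSize + fsBlockSize - 1) / fsBlockSize; // round up
            return totalBlocks - usedBlocks;
        }

        public static void main(String[] args) {
            // 8 KB page, 4 KB FS blocks: compressing to 3 KB frees one block,
            // compressing to 5 KB frees nothing, so the page stays uncompressed.
            System.out.println(blocksFreed(8192, 3 * 1024, 4096)); // 1
            System.out.println(blocksFreed(8192, 5 * 1024, 4096)); // 0
        }
    }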



In the current implementation, Ignite can work with sparse files only under Linux, so disk page compression can be enabled only when Ignite runs on this operating system.

The compression algorithms available for disk page compression are ZSTD, LZ4, and Snappy. In addition, there is a SKIP_GARBAGE mode, which simply discards unused space within the page without compressing the remaining data, reducing the CPU load compared to the algorithms listed above.

Performance impact


Unfortunately, I did not measure performance on real test environments, since we do not plan to use this mechanism in production, but we can reason theoretically about where we will lose and where we will gain.

To do this, we need to recall how pages are read and written when they are accessed:

  • On a read, the page is first looked up in RAM; if it is not found there, it is loaded from disk into RAM by the same thread that performs the read.
  • On a write, the page in RAM is marked dirty; the page is not physically saved to disk immediately in the writing thread. All dirty pages are saved to disk later, by the checkpoint process, in separate threads.

Thus, the effect on read operations:

  • Positive (disk IO), due to the reduced amount of data read from disk.
  • Negative (CPU), due to the overhead of working with sparse files; the extra IO cost of sparse files themselves is small (for hole regions the file system returns zeros without touching the disk).
  • Negative (CPU), due to the need to decompress pages.
  • No effect at all on operations whose pages are already in RAM.

And the effect on write operations (performed by the checkpoint process):

  • Positive (disk IO), due to the reduced amount of data written to disk.
  • Negative (CPU, disk IO), due to the extra work with sparse files (punching holes).
  • Negative (CPU), due to the need to compress pages.

Which side of the scale outweighs? It depends heavily on the environment, but I am inclined to believe that on most systems disk page compression is more likely to degrade performance. Moreover, tests of other DBMSs that use a similar approach with sparse files show a performance drop when compression is enabled.

How to enable and configure


As mentioned above, the minimum Apache Ignite version that supports disk page compression is 2.8, and only the Linux operating system is supported. Enabling and configuring it is done as follows (a configuration sketch follows the list):

  • The ignite-compression module must be on the class-path. By default, it sits in the libs/optional directory of the Apache Ignite distribution and is not included in the class-path. You can simply move the directory one level up, to libs, and it will be picked up automatically when launching via ignite.sh.
  • Persistence must be enabled (DataRegionConfiguration.setPersistenceEnabled(true)).
  • The page size must be larger than the file system block size; it is set via DataStorageConfiguration.setPageSize().
  • The compression algorithm must be set for the desired cache(s) via CacheConfiguration.setDiskPageCompression(); optionally, the compression level can be set via CacheConfiguration.setDiskPageCompressionLevel().
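Putting these steps together, here is a minimal configuration sketch (the cache name, page size, algorithm, and level are illustrative choices, not recommendations):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.DiskPageCompression;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class DiskPageCompressionExample {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Persistence is required; the page size must exceed the FS block size (usually 4 KB).
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.setPageSize(8 * 1024);
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
            cfg.setDataStorageConfiguration(storageCfg);

            // Compression is enabled per cache.
            CacheConfiguration<Integer, byte[]> cacheCfg = new CacheConfiguration<>("compressedCache");
            cacheCfg.setDiskPageCompression(DiskPageCompression.ZSTD);
            cacheCfg.setDiskPageCompressionLevel(3); // optional, algorithm-specific
            cfg.setCacheConfiguration(cacheCfg);

            Ignition.start(cfg);
        }
    }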

WAL compaction



What is the WAL and why is it needed? Very briefly: it is a log that records every event that changes the page store. It is needed primarily for recovery after a crash. Before returning control to the user, any operation must first record its event in the WAL, so that after a crash the node can replay the log and restore all operations for which the user received a successful response, even if those operations did not make it to the page store on disk (as described above, the actual write to the page store happens with some delay, in separate threads, in a process called a checkpoint).

Entries in the WAL are divided into logical and physical. Logical records are the keys and values themselves; physical records reflect page changes in the page store. While logical records can be useful in other scenarios, physical records are needed only for crash recovery, and only those written since the last successful checkpoint. We will not go into why it works this way here; anyone interested can refer to the already mentioned article on the Apache Ignite Wiki: Ignite Persistent Store - under the hood.

One logical record often corresponds to several physical records: a single cache put, for example, touches several pages in page memory (a page with the data itself, index pages, free-list pages). In some synthetic tests, physical records occupied up to 90% of the WAL file. At the same time, they are needed only for a very short while (by default, the interval between checkpoints is 3 minutes). It is only logical to get rid of this data once it loses relevance, and that is exactly what the WAL compaction mechanism does: it drops the physical records and compresses the remaining logical records with zip, shrinking the file very significantly (sometimes tens of times).

Physically, the WAL consists of several segments (10 by default) of fixed size (64 MB by default) that are overwritten in a round-robin fashion. As soon as the current segment fills up, the next segment becomes current, and the filled segment is copied to the archive by a separate thread. WAL compaction works on the archived segments: in a separate thread, it watches for checkpoint completion and compacts those archived segments whose physical records are no longer needed.



Performance impact


Since WAL compaction runs in a separate thread, it should have no direct effect on the operations being performed. But it does add background load on the CPU (compression) and on the disk (reading each WAL segment from the archive and writing the compressed segments), so if the system is already running at its limit, it will also degrade performance.

How to enable and configure


You can enable WAL compaction with the WalCompactionEnabled property of DataStorageConfiguration (DataStorageConfiguration.setWalCompactionEnabled(true)). Also, the DataStorageConfiguration.setWalCompactionLevel() method lets you set the compression level if you are not satisfied with the default value (BEST_SPEED).
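For example, a minimal configuration sketch (the chosen level is only an illustration):

    import java.util.zip.Deflater;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalCompactionExample {
        public static IgniteConfiguration configure() {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            // Enable WAL compaction; the level defaults to BEST_SPEED.
            storageCfg.setWalCompactionEnabled(true);
            storageCfg.setWalCompactionLevel(Deflater.BEST_COMPRESSION);

            return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
        }
    }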

WAL page snapshot compression


How it works


We have already established that WAL entries are divided into logical and physical. A physical WAL record is generated for every change of every page in page memory. Physical records, in turn, come in two kinds: page snapshot records and delta records. Every time we change something on a page and bring it from the clean state to the dirty one, a full copy of the page is saved to the WAL (a page snapshot record): even if we changed only one byte, a record slightly larger than the page size is written. If we change something on an already dirty page, a delta record is written to the WAL, reflecting only the changes relative to the previous state of the page rather than the whole page. Since pages are reset from dirty to clean during a checkpoint, right after a checkpoint starts almost all physical records are page snapshots (all pages are clean immediately after the checkpoint begins); then, as the next checkpoint approaches, the share of delta records grows, and it drops again at the start of the next checkpoint. Measurements on some synthetic tests showed that page snapshots reach 90% of the total volume of physical records.

The idea behind WAL page snapshot compression is to compress page snapshots using the already available page compression machinery (see disk page compression). WAL records are written sequentially, in append-only mode, and there is no need to align records to file system block boundaries, so, unlike the disk page compression mechanism, we do not need sparse files at all, and this mechanism works on operating systems other than Linux. In addition, it no longer matters how well a page compressed: even freeing a single byte is already a gain, and the compressed data can be stored in the WAL, unlike disk page compression, where a page is stored compressed only if at least one file system block is freed.
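A minimal sketch of this decision, assuming the rule described above (this is not Ignite's actual code):

    // Keep the compressed form of a page snapshot whenever it is smaller at all.
    public final class WalSnapshotPayload {
        public static byte[] choose(byte[] page, byte[] compressed) {
            return (compressed != null && compressed.length < page.length) ? compressed : page;
        }
    }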

Pages compress well, and their share in the total WAL volume is very high, so without changing the WAL file format we get a significant reduction in its size. Compressing logical records, by contrast, would require a format change and a loss of compatibility, for example for external consumers who may be interested in logical records, without significantly reducing the file size.

As with disk page compression, WAL page snapshot compression can use the ZSTD, LZ4, and Snappy algorithms, as well as the SKIP_GARBAGE mode.

Performance impact


It is easy to see that enabling WAL page snapshot compression directly affects only the threads that write data to page memory, that is, the threads that modify data in the caches. Physical records are read from the WAL only once, when a node starts up after a crash (and only if the crash happened during a checkpoint).

The effect on the data-modifying threads is as follows: a negative effect (CPU), because every page must be compressed before being written to disk, and a positive effect (disk IO), because less data is written. Accordingly, it is simple: if the system is CPU-bound, we get a slight degradation; if it is bound by disk I/O, we get an improvement.

Indirectly, the reduced WAL size also (positively) affects the threads that copy WAL segments to the archive and the WAL compaction threads.

Real performance tests in our environment on synthetic data showed a modest improvement (throughput increased by 10-15%, latency decreased by 10-15%).

How to enable and configure


The minimum Apache Ignite version is 2.8. Enabling and configuring it is done as follows (a configuration sketch follows the list):

  • The ignite-compression module must be on the class-path. By default, it sits in the libs/optional directory of the Apache Ignite distribution and is not included in the class-path. You can simply move the directory one level up, to libs, and it will be picked up automatically when launching via ignite.sh.
  • Persistence must be enabled (DataRegionConfiguration.setPersistenceEnabled(true)).
  • The compression algorithm must be set via DataStorageConfiguration.setWalPageCompression() (the default is DISABLED).
  • Optionally, the compression level can be set via DataStorageConfiguration.setWalPageCompressionLevel(); the acceptable values for each algorithm are described in the javadoc.
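A minimal configuration sketch for this mechanism (the algorithm choice is illustrative):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.DiskPageCompression;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalPageCompressionExample {
        public static IgniteConfiguration configure() {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            // Compress page snapshot records in the WAL (DISABLED by default).
            storageCfg.setWalPageCompression(DiskPageCompression.LZ4);
            // storageCfg.setWalPageCompressionLevel(1); // optional, algorithm-specific

            return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
        }
    }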


The data compression mechanisms in Apache Ignite discussed here can be used independently of each other, and any combination of them is also valid. Understanding how they work lets you decide how well they fit your tasks in your environment and what you will have to sacrifice to use them. Disk page compression is designed to compress the main storage and can provide a medium compression ratio. WAL page snapshot compression gives a medium compression ratio for WAL files and is likely even to improve performance. WAL compaction will not improve performance, but it minimizes the size of WAL files by deleting physical records.
