Caching. Part 2: 60 days before release

Hello! I have already written about how to push initiatives through in a large company - more precisely, how this (sometimes) succeeds and what difficulties can arise along the way: A rake retrospective. How a self-made solution turned out to be cooler than a paid one and How we chose a caching system. Part 1.

Today I want to continue and talk about the psychologically most stressful moment of the project covered in those first two articles - the moment when the outcome was determined not so much by the team's technical skill as by confidence in our own calculations and the willingness to see things through to the end.

I have to say upfront: I believe that letting a project reach such an intense moment is a far bigger mistake than any heroism spent dragging the project out of that hole is a virtue...
Still, I do not hide this experience and share it willingly, because I believe that:

  • it is precisely the problem areas that are the growth points
  • the biggest problems arrive precisely from where you do not expect them

The combination of these two points simply obliges me to share the wonderful experience of "how to land in trouble out of the blue." It should be noted, though, that such a situation is exceptional for Sportmaster. Could it happen again? Planning and the assignment of responsibility are now on a completely different level.

So, that seems to be enough introduction. If you are ready - welcome under the cut.



June 2017. We are reworking the admin panel. The admin panel is not just a set of forms and tables in a web interface: the entered values have to be merged with dozens of other data sets that we receive from third-party systems, then transformed somehow and, ultimately, sent to consumers (the main one being the ElasticSearch behind the Sportmaster website).

The main difficulty is exactly that: transform and send. Namely:

  1. the data must be delivered as JSON documents weighing about 100 KB each, with some reaching 10 MB (a breakdown of availability and delivery criteria of goods per store)
  2. some of the JSON has a structure with recursive attachments of arbitrary nesting depth (for example, a menu inside a menu item, which again contains menu items, and so on)
  3. the final requirements are not approved and keep changing (for example, working with goods by Model is replaced by an approach where we work by Color Model). "Constantly" means several times a week, with a peak rate of twice a day for a whole week.

If the first two points are purely technical and dictated by the task itself, the third one, of course, needs to be dealt with organizationally. But the real world is far from ideal, so we work with what we have.

Specifically, we had figured out how to churn out web forms and their server-side objects quickly.

One person on the team was assigned the role of professional "form stamper" and, using prepared web components, rolled out UI demos faster than the analysts could update the mockups of that UI.

But changing the transformation scheme - that is where the complexity appeared.

At first we went the usual way: perform the transformation in an SQL query against Oracle. There was a database specialist on the team. This lasted until the query grew to two pages of solid SQL text. I could have lived with that, but when changes came in from the analysts, the hardest part, objectively, was finding the place where the change had to be made.

The analysts expressed the rules as diagrams which, although drawn in tools detached from the code (Visio / draw.io / Gliffy), looked very much like the boxes and arrows of ETL systems (for example, Pentaho Kettle, which at the time was used to feed data to the Sportmaster website). Now, if only we had an ETL scheme instead of an SQL query! Then the requirements and the solution would be expressed in topologically identical form, which means editing the code could take about as long as editing the requirements!

But ETL systems bring another difficulty. The same Pentaho Kettle is great when you need to create a new index in ElasticSearch and write into it all the data glued together from several sources (a remark: in fact, even that Pentaho Kettle does not do very well, because the JavaScript it uses in transformations is not tied to the Java classes through which the consumer accesses the data - so you can write out something that cannot be turned into the required POJO objects; but that is a separate topic, away from the main course of the article).

But what do you do when a user has corrected one field in one document in the admin panel? To deliver that change to the site's ElasticSearch, you surely should not build a whole new index and refill it with every document of that type, including the updated one!

We wanted that when one object in the input data changes, an update is sent to the site's ElasticSearch only for the corresponding output documents.

Fine, the changed input document itself is easy, but according to the transformation scheme it may be joined to documents of other types! So you need to analyze the transformation scheme and compute which output documents are affected by a change in the source data.

The search for boxed products that solve this problem led nowhere. Nothing was found.
And once we despaired of finding one, we started thinking: how should such a thing work inside, and could we build it ourselves?

The idea arose right away.

If the final ETL can be broken down into constituent parts, each of a certain type from a finite set (for example, filter, join, etc.), then perhaps it is enough to create the same finite set of special nodes that correspond to the original ones, with the difference that they work not with the data itself but with its changes?

I want to cover our solution in great detail, with examples and the key implementation points, in a separate article. Getting comfortable with the underlying ideas requires serious immersion, the ability to think abstractly and to rely on what has not yet materialized. It is interesting mainly from a mathematical point of view, and only to those readers who care about the technical details.
Here I will only say that we built a mathematical model in which we described 7 node types and showed that this system is complete - that is, using these 7 node types and the connections between them, any data transformation scheme can be expressed. The implementation relies heavily on reading and writing data by key (strictly by key, with no additional conditions).
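To give a rough flavor of the idea before that article appears, here is a minimal, purely illustrative sketch in Java. The class and node names below are invented for illustration and do not reproduce the actual 7 node types of the model - only the general principle that a node consumes and emits changes rather than full data sets.

```java
// Purely illustrative sketch: a node operates on changes (deltas), not on full data sets.
// Delta, DeltaNode and FilterNode are hypothetical names, not the model's actual node types.
import java.util.List;
import java.util.function.Predicate;

// A change to a single keyed record: an upsert (newValue != null) or a delete (newValue == null).
record Delta<K, V>(K key, V oldValue, V newValue) {}

// A node of the transformation scheme, rewritten to consume and emit deltas.
interface DeltaNode<K, V, K2, V2> {
    List<Delta<K2, V2>> apply(Delta<K, V> input);
}

// A filter node: the change passes through only if the new value satisfies the predicate;
// if the old value matched but the new one does not, the node emits a delete downstream.
class FilterNode<K, V> implements DeltaNode<K, V, K, V> {
    private final Predicate<V> predicate;

    FilterNode(Predicate<V> predicate) { this.predicate = predicate; }

    @Override
    public List<Delta<K, V>> apply(Delta<K, V> input) {
        boolean wasIn = input.oldValue() != null && predicate.test(input.oldValue());
        boolean isIn  = input.newValue() != null && predicate.test(input.newValue());
        if (isIn)  return List.of(new Delta<>(input.key(), input.oldValue(), input.newValue()));
        if (wasIn) return List.of(new Delta<>(input.key(), input.oldValue(), null)); // delete
        return List.of(); // the change is invisible downstream
    }
}
```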

Thus, our solution had an answer to each of the initial difficulties:

  1. the data must be delivered as JSON -> we work with POJOs (plain old Java objects, in case anyone missed the times when that term was in use), which are easy to convert to JSON
  2. some JSON has a structure with recursive embeddings of arbitrary nesting depth -> again, POJOs (the main thing is that there are no cycles; the number of nesting levels does not matter, it is easy to handle in Java with recursion)
  3. the final requirements keep changing -> excellent, since we change the transformation scheme faster than the analysts can draw up (in their diagrams) wishes for experiments

Of the risky points there was only one: we were writing the solution from scratch, on our own.

Actually, the traps were not long in coming.

Special moment N1. A trap. "It extrapolated well"


Another surprise of an organizational nature was that, in parallel with our development, the main master data store was migrating to a new version, and the format in which that store provides data was changing. And it would be nice if our system worked with the new store right away, not the old one. But the new store is not ready yet. However, its data structures are known, and we can be given a demo stand with a small amount of coherent data loaded into it. Deal?

In the product approach, when you work with a value delivery stream, one warning is unambiguously hammered into every optimist: there is a blocker -> the task is not being worked on, period.

But back then such a dependency did not even arouse suspicion. Indeed, we were euphoric from the success of the Delta processor prototype - a system that processes data as deltas (the implementation of the mathematical model in which changes in the output data are computed, via the transformation scheme, as a response to changes in the input data).

Among all the transformation schemes, one was the most important. Besides being the largest and most complex, it also came with a strict requirement: a time limit for executing the transformation on the full volume of data.

Namely, the transformation had to complete in 15 minutes and not a second longer. The main input is a table with 5.5 million records. At the development stage the table is not yet populated; more precisely, it is filled with a small test data set of 10 thousand rows.

Well, let's get started. In the first implementation the Delta processor used a HashMap as the Key-Value store (remember, we need to read and write a lot of objects by key). Of course, at production volumes all the intermediate objects will not fit in memory, so instead of the HashMap we switch to Hazelcast.
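Conceptually the swap was painless, because both options expose a map-like interface. A minimal sketch of how such a swap might look (the KeyValueStore abstraction and the map name "delta-cache" are illustrative assumptions, not our actual code):

```java
import java.util.HashMap;
import java.util.Map;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// A thin abstraction over "read/write an object by key"; the name KeyValueStore
// and the map name "delta-cache" are illustrative, not taken from the project.
interface KeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
}

class MapBackedStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> map;

    MapBackedStore(Map<K, V> map) { this.map = map; }

    @Override public V get(K key)             { return map.get(key); }
    @Override public void put(K key, V value) { map.put(key, value); }
}

class Stores {
    // First implementation: everything lives in the JVM heap.
    static <K, V> KeyValueStore<K, V> inMemory() {
        return new MapBackedStore<>(new HashMap<>());
    }

    // Second implementation: Hazelcast's IMap also implements java.util.Map,
    // so the same wrapper works for the distributed case.
    static <K, V> KeyValueStore<K, V> hazelcast() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        return new MapBackedStore<>(hz.<K, V>getMap("delta-cache"));
    }
}
```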

Why Hazelcast specifically? Because the product was familiar: it was already used in the backend of the Sportmaster website. Besides, it is a distributed system, and it seemed to us that if performance suddenly fell short, we would add a couple more instances on a couple of machines and the issue would be resolved. In the worst case, a dozen machines. Horizontal scaling and all that.

And so, we launch our Delta processor on the target transformation. It finishes almost instantly. Understandable: there are only 10 thousand records instead of 5.5 million. So we multiply the measured time by 550 and get the result: roughly 2 minutes. Great! In fact - a victory!

This was at the very beginning of the project, exactly when you need to settle on an architecture, confirm the hypotheses (run the tests that confirm them) and integrate a pilot solution vertically.

Since the tests showed an excellent result - that is, we had confirmed all the hypotheses - we quickly put together the pilot: a vertically integrated "skeleton" for a small piece of functionality. And then we started the main coding - putting "meat on the skeleton".

Which we did, successfully and energetically. Until that beautiful day when a complete data set was loaded into the master store.

We run the test on this set.

It did not finish in 2 minutes. It did not finish in 5, 10 or 15 minutes either. That is, we did not fit into the required limit. Well, these things happen; we would just have to tweak some details and squeeze in.

But the test had not finished an hour later either. Even after 2 hours there was hope that it would finish, and we would then look for what to tighten up. Some remnants of hope were still there after 5 hours. But after 10 hours, when we went home and the test still had not finished, there was no hope left.

The trouble was that the next day, when we came to the office, the test was still diligently running. In the end it ran for 30 hours; we did not wait any longer and killed it.
A catastrophe!

The problem was localized quickly enough.

Hazelcast, when working on a small amount of data, actually kept everything in memory. But when it had to spill data to disk, performance dropped by a factor of thousands.

Programming would be a boring and tasteless occupation if not for management and the obligation to deliver a finished product. So literally a day after receiving the complete data set, we had to go to management with a report on how the test at production volumes had gone.

This is a very serious and difficult choice:

  1. say it "as is" = abandon the project
  2. say it "as we would like it to be" = take a risk, not knowing whether we can fix the problem

To understand the feelings that arise in such a moment, you have to have invested yourself fully in the idea, spent half a year bringing the plan to life, creating a product that will help colleagues solve a huge layer of problems.

And giving up your beloved creation is very difficult.
This is characteristic of all people: we love what we have put a lot of effort into. That is why criticism is hard to hear - you have to make a conscious effort to take feedback adequately.

In the end we decided that there are still very, very many systems that can serve as a Key-Value store, and if Hazelcast does not fit, something else surely will. That is, we decided to take the risk. In our defense it must be said that this was not yet a "bloody deadline": overall there was still a margin of time to "move over" to a backup solution.

At that meeting with the higher-ups, our manager reported that "the test showed that the system works stably at production volumes; it does not crash." And indeed, the system did work stably.

60 days to release.

Special moment N2. Not a trap, but not a revelation either. "Less is more"


To find a replacement for Hazelcast in the role of Key-Value store, we compiled a list of all the candidates - 31 products. That was everything I managed to google or learn from acquaintances; beyond that, Google started returning downright indecent options, like somebody's student term paper.

To test the candidates faster, we prepared a small test that showed, within a few minutes of launching, the performance at the target volumes. And we parallelized the work: each of us took the next system off the list, configured it, ran the test, took the next one.
We worked quickly, knocking out several systems a day.
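For a rough idea, a screening test of this kind might look something like the sketch below. This is a reconstruction under assumptions, not our actual harness; the record count and value size in it are placeholders.

```java
import java.util.Random;
import java.util.function.BiConsumer;
import java.util.function.Function;

// A crude screening benchmark: write N random values by key, then read them back,
// and report the time taken. The candidate store is passed in as two lambdas so the
// same test can run against any Key-Value product. The numbers are placeholders.
class KvScreeningTest {
    static void run(String name,
                    BiConsumer<byte[], byte[]> put,
                    Function<byte[], byte[]> get) {
        final int records = 1_000_000;  // placeholder volume
        final int valueSize = 10_240;   // ~10 KB per value, placeholder
        Random random = new Random(42);

        long startWrite = System.nanoTime();
        for (int i = 0; i < records; i++) {
            byte[] key = String.valueOf(i).getBytes();
            byte[] value = new byte[valueSize];
            random.nextBytes(value);
            put.accept(key, value);
        }
        long writeMs = (System.nanoTime() - startWrite) / 1_000_000;

        long startRead = System.nanoTime();
        for (int i = 0; i < records; i++) {
            get.apply(String.valueOf(i).getBytes());
        }
        long readMs = (System.nanoTime() - startRead) / 1_000_000;

        System.out.printf("%s: write %d ms, read %d ms%n", name, writeMs, readMs);
    }
}
```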

By the 18th system it became clear that this was pointless: none of them was tuned for our load profile. They have plenty of frills and flourishes to make them convenient to use, plenty of beautiful approaches to horizontal scaling - but none of that gives us any benefit.

We need a system that saves an object to disk by key _fast_ and reads it back by key just as fast.

If that is the case, we sketch out an algorithm for how this could be implemented ourselves. On the whole it looks quite feasible, provided that we a) sacrifice the amount of disk space the data will occupy, and b) have approximate estimates of the volume and typical record sizes in each table.
Something in the style of: allocate space (on disk) for objects with a margin, in chunks of a fixed maximum size; then, using index tables... and so on.
Fortunately, it never came to that.

Salvation came in the form of RocksDB.
It is a product from Facebook designed for fast reading and writing of byte arrays on disk. Access to the files is provided through an interface resembling a Key-Value store: in essence, the key is a byte array and the value is a byte array, optimized to do exactly that quickly and reliably. That's all. If you need something prettier and higher-level, bolt it on top yourself.
Exactly what we need!
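For those who have not met it, the Java binding really is that spartan. A minimal usage sketch (the path and the key below are made up for illustration):

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Minimal RocksDB usage: byte[] in, byte[] out, nothing else.
// The path "/data/delta-processor" and the key are made up for illustration.
public class RocksDbExample {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary(); // load the native library once per process

        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/data/delta-processor")) {

            byte[] key = "model:12345".getBytes();
            byte[] value = "{\"id\":12345}".getBytes();

            db.put(key, value);           // write a byte array by key
            byte[] stored = db.get(key);  // read it back; null if absent

            System.out.println(new String(stored));
        }
    }
}
```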

RocksDB, bolted on in the role of Key-Value store, brought the target test down to about 5 hours. That was far from 15 minutes, but the main thing had been accomplished. The main thing was to understand what was happening, to know that writing to disk was as fast as it could possibly get. On an SSD, in isolated tests, RocksDB squeezed out 400 MB/s, and that was enough for our task. The delays were somewhere on our side, in the glue code.

In our code - which means we can handle it. We will have to take it apart, but we can handle it.

Special moment N3. A point of support. "The theoretical calculation"


We have the algorithm and the input data. We take the range of input data and calculate how many actions the system has to perform, what these actions cost at JVM run time (assigning a value to a variable, entering a method, creating an object, copying a byte array, etc.), plus how many RocksDB calls have to be made.

According to the calculation, it should come to about 2 minutes (roughly what the HashMap test showed at the very beginning, but that is just a coincidence - the algorithm has changed since then).
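The estimate itself is elementary arithmetic; its shape is sketched below. Every number in the sketch is a deliberately made-up placeholder, not the values we actually used - only the structure of the calculation reflects the approach.

```java
// Back-of-the-envelope cost model. Every number here is a made-up placeholder;
// only the shape of the calculation matters.
public class CostEstimate {
    public static void main(String[] args) {
        long inputRecords  = 5_500_000L; // main input table size (from the article)
        long jvmOpsPerRec  = 2_000L;     // placeholder: assignments, calls, allocations...
        double nsPerJvmOp  = 5.0;        // placeholder: rough cost of one such operation
        long ioCallsPerRec = 4L;         // placeholder: RocksDB get/put calls per record
        double microsPerIo = 3.0;        // placeholder: cost of one key-value call

        double jvmSeconds = inputRecords * jvmOpsPerRec * nsPerJvmOp / 1e9;
        double ioSeconds  = inputRecords * ioCallsPerRec * microsPerIo / 1e6;

        System.out.printf("JVM work: %.0f s, IO work: %.0f s, total: %.1f min%n",
                jvmSeconds, ioSeconds, (jvmSeconds + ioSeconds) / 60);
    }
}
```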

And yet, the test runs for 5 hours.

And now there are 30 days before the release.

This is a special date: from this point on it is impossible to back out - we would no longer have time to switch to the backup option.
Of course, on this day the project manager is summoned upstairs. The question is the same: are we on time, is everything all right?



The best way to describe this situation is the extended cover picture of this article. That is, the higher-ups are shown the part of the picture that made it into the title. The reality, however, looks like the rest of it.

Although in reality, of course, we did not find it funny at all. Saying "Everything is cool!" in that state is possible only for a person with a very strong grip on self-control.
Huge, enormous respect to our manager for believing in and trusting the developers.

Real, actually existing code shows 5 hours. A theoretical calculation shows 2 minutes. How can you believe that?

You can, if: the model is formulated clearly, it is clear how to count, and it is clear which values to plug in. Then the fact that execution takes longer in reality means that, in reality, it is not quite the code we expect that is being executed.

The central task is to find the "ballast" in the code - the actions that are performed in addition to the main flow of producing the final data.

Off we went. Unit tests, composing functions, splitting functions apart and localizing the places with a disproportionate share of the execution time. A lot got done.
Along the way we noted the places where things could be seriously tightened up.

For example, serialization. At first we used standard java.io. But if we bolt on Kryo, in our case we get a 2.5-fold increase in serialization speed and a 3-fold reduction in the volume of serialized data (which means 3 times less IO - and IO is exactly what eats up the main resources). But the details are a topic for a separate, technical article.
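For reference, a minimal sketch of what swapping Kryo in for java.io serialization looks like (the Product POJO here is a made-up example, not one of our actual classes):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;

// Kryo in place of java.io serialization; the Product POJO is a made-up example.
public class KryoExample {
    static class Product {
        long id;
        String name;
    }

    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        kryo.register(Product.class); // registering classes keeps the output compact

        Product product = new Product();
        product.id = 12345;
        product.name = "running shoes";

        // serialize to a byte[] suitable for a Key-Value store
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Output output = new Output(bytes);
        kryo.writeObject(output, product);
        output.close();

        // deserialize back into a POJO
        Input input = new Input(bytes.toByteArray());
        Product restored = kryo.readObject(input, Product.class);
        input.close();

        System.out.println(restored.name);
    }
}
```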

But the key point - or "where the elephant was hiding" - I will try to describe in one paragraph.

Special moment N4. A technique for finding a solution. "Problem = Solution"


When we do a get/set by key, the calculations counted it as 1 operation, with IO equal in volume to the key plus the object value (in serialized form, of course).
But what if the object on which we call get/set is itself a Map, which we also fetch from disk via get/set? How much IO happens in that case?

In our calculations this was not taken into account. That is, it was still counted as 1 IO of key + object value. But in reality?

For example, in the Key-Value store, under key-1 there is an object obj-1 of type Map, inside which a certain object obj-2 must be stored under key-2. Here we assumed the operation would cost one IO of key-2 + obj-2. But in reality you have to read obj-1, manipulate it and send it back through IO: key-1 + obj-1. And if it is a Map holding 1000 objects, the IO cost is about 1000 times higher. And if 10,000 objects, then... That is how the "ballast" appeared.
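In code, this trap looks deceptively innocent. A schematic illustration (the KeyValueStore and Product names are invented; only the access pattern matters):

```java
import java.util.Map;

// Schematic illustration of the "ballast"; KeyValueStore and Product are invented names.
// The store persists whole serialized objects by key, so touching one entry of a
// nested Map drags the entire Map through IO.
class NestedMapBallast {
    interface KeyValueStore {
        <T> T get(String key);
        void put(String key, Object value);
    }
    record Product(long id, String name) {}

    static void updateNested(KeyValueStore kvStore, Product updatedProduct) {
        // What the cost model assumed: one IO of size key-2 + obj-2.
        // What actually happens:
        Map<String, Product> obj1 = kvStore.get("key-1"); // reads key-1 + the ENTIRE Map
        obj1.put("key-2", updatedProduct);                // the change we actually wanted
        kvStore.put("key-1", obj1);                       // writes key-1 + the ENTIRE Map again
        // With 1000 entries in the Map, the IO is roughly 1000 times larger than accounted for.
    }
}
```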

When a problem is identified, the solution is usually obvious.

In our case it became a special structure for manipulating nested Maps: a Key-Value store whose get/set takes two keys at once, to be applied in sequence - key-1, key-2 - that is, one for the first level and one for the nested one. How to implement such a structure I will gladly describe in detail, but again, in a separate, technical article.
From this episode I want to highlight and promote one principle: a problem described in extreme detail is already a good solution.
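Still, to make the idea tangible: one possible way to sketch such a two-level store is to fuse the two keys into a single composite key of the underlying storage, so a nested value travels through IO on its own, without its outer Map. This is a reconstruction under that assumption, not the project's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// A sketch of a two-level Key-Value store: get/set takes two keys that are applied
// "in sequence". Here they are simply fused into one composite key of the underlying
// RocksDB instance, so a nested value is read and written alone, without its outer Map.
// This is a reconstruction under assumptions, not the project's actual implementation.
class TwoLevelStore {
    private static final byte SEPARATOR = 0x00; // assumes keys never contain the 0x00 byte

    private final RocksDB db;

    TwoLevelStore(RocksDB db) { this.db = db; }

    private static byte[] compositeKey(String key1, String key2) {
        byte[] k1 = key1.getBytes(StandardCharsets.UTF_8);
        byte[] k2 = key2.getBytes(StandardCharsets.UTF_8);
        byte[] composite = new byte[k1.length + 1 + k2.length];
        System.arraycopy(k1, 0, composite, 0, k1.length);
        composite[k1.length] = SEPARATOR;
        System.arraycopy(k2, 0, composite, k1.length + 1, k2.length);
        return composite;
    }

    byte[] get(String key1, String key2) throws RocksDBException {
        return db.get(compositeKey(key1, key2));   // IO: just this one nested value
    }

    void put(String key1, String key2, byte[] value) throws RocksDBException {
        db.put(compositeKey(key1, key2), value);   // IO: just this one nested value
    }
}
```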

Conclusion


In this article I tried to show the organizational points and traps that can arise. Such traps are clearly visible "from the outside" or in hindsight, but it is very easy to fall into them when you first find yourself next to one. I hope someone remembers this description, and at the right moment the reminder goes off: "I've heard something like this somewhere before."

And, most importantly: now that everything has been told about the process, about the psychological and organizational moments; now that we have an idea of what tasks the system was built for and under what conditions - now we can and should talk about the system from the technical side: what the mathematical model is, what tricks in the code we resorted to, and what innovative solutions we came up with.

About this in the next article.

In the meantime, Happy New Code!
