Tarantool: An Analyst's View

Hello everyone! My name is Andrey Kapustin. I work as a systems analyst at Mail.ru Group. Our products form a single ecosystem for the user, in which many independent infrastructures generate data: taxi and food ordering services, mail services, social networks. Today, the faster and more accurately we can predict a customer's need, the faster and more accurately we can offer them our products.

Many system analysts and engineers are now asking questions:

  1. How to design a trigger platform architecture for real-time marketing?
  2. How to organize a data structure that meets the requirements of a marketing strategy for interacting with customers?
  3. How to ensure stable operation of such a system under very high loads?

Such systems are built on high-load processing and analysis of big data, areas in which we have gained considerable experience. Using one real project as an example, I'll describe our approach to analysis and solution design in the field of real-time marketing with Tarantool.

Once a large telecom operator came to us for help.

The task was this:

We have more than 100 million subscribers. We know a lot about them: current balance, traffic volume, connected services, trips, favorite places. We use this information as best we can: we collect data during the day and load huge volumes into a data lake. At night we run batch handlers, and in the morning we create advertising campaigns and send out offers.

And we want to do the same thing in real time!

Why? Because the faster the telecom operator processes the information, the more money it can earn. For example, on impulse purchases: a user walks past a cafe at lunchtime, a discount arrives on their phone, and they choose that particular cafe. That is, you "just" need to offer the right product at the right time and make it easy to respond to the offer immediately.



What is needed to solve this business problem:

  • Determine the need - from the customer profile.
  • Determine the moment - from events in the customer's life.
  • Stimulate feedback - by choosing the optimal communication channel.

This is called Real-Time Marketing. Applied to the telecom sector, it means sending relevant, personalized messages to subscribers at the right time, with the ability to respond to the offer IMMEDIATELY. Offers can be formed both for a target group and for a specific user, and in either case the request must be processed in real time.

From a technical point of view, we must solve the following problems:

  • Keeping the data of more than 100 million subscribers up to date;
  • Processing the event stream in real time under a load of 30,000 RPS;
  • Forming and routing targeted offers to subscribers while meeting non-functional requirements (response time, availability, etc.);
  • Seamlessly connecting new sources of heterogeneous subscriber data.

“Real time” in this case means processing the information within 30 seconds. Anything longer is pointless: the moment is missed and the client is gone. And the saddest part is that we won't even know why - did we offer the wrong thing, or did we simply fail to make the offer in time?

Getting the answer to this question is very important for product development:

  1. Promoting our products: testing hypotheses, increasing revenue.
  2. Attracting potential customers: investing in advertising, capturing the market.
  3. Connecting additional services: expanding the product line.

It's easy to make a mistake at every stage, and the price of an error is high. We must strike quickly and accurately! For that, customer information must be complete and up to date. Such information is literally worth money!

After all, the more we know about our customers, the more we earn. This means that each new parameter added to the client profile increases targeting accuracy. But this is a never-ending process, because:

  1. The customer base is constantly growing.
  2. The range of services is expanding.

Under such conditions, segmenting the customer base is very effective. In this case we decided to use a stratification mechanism: multivariate classification of subscribers.

Simply put, we single out specific groups of subscribers (strata) by ranges of values of an unlimited number of attributes. A subscriber must automatically move to the corresponding stratum as soon as an attribute value falls into the matching range.
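Here is a rough sketch of this mechanism (not code from the real project; the strata, attribute names and thresholds are invented for illustration): strata are defined as value ranges over an arbitrary set of attributes, and a subscriber is re-classified on every attribute update rather than in a nightly batch.

```lua
-- Illustrative sketch: strata as value ranges over arbitrary attributes.
-- All names and thresholds here are assumptions, not the operator's model.
local strata = {
    { name = 'heavy_traveler', attr = 'trips_per_month',   min = 10,         max = math.huge },
    { name = 'big_spender',    attr = 'monthly_spend_rub', min = 5000,       max = math.huge },
    { name = 'low_balance',    attr = 'balance_rub',       min = -math.huge, max = 100 },
}

-- Return the list of strata the subscriber currently falls into.
local function classify(subscriber)
    local result = {}
    for _, s in ipairs(strata) do
        local value = subscriber[s.attr]
        if value ~= nil and value >= s.min and value < s.max then
            table.insert(result, s.name)
        end
    end
    return result
end

-- Called on every attribute update: the stratum changes immediately.
local function on_attribute_update(subscriber, attr, new_value)
    subscriber[attr] = new_value
    subscriber.strata = classify(subscriber)
    return subscriber.strata
end
```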

The figure below shows an example of a three-dimensional stratification model familiar from childhood: each ball is a subscriber.



For each client we can calculate how much we spent on acquiring them, how much we earned on them and how. That is, we know what the information is worth, and how much we lose if we do not keep it up to date.

We did the math and decided: the data must be kept up to date! And immediately problems arise: something is always missing. In every project the customer brings new requirements that contradict the technical specification, the architecture, each other and... common sense. Maintaining data integrity and relevance becomes harder every day. New sources of information appear with new attributes, and it is unclear where to store them and how to process them.

Keep in mind that the more normalized the data, the more constraints, lookup tables and checks it drags along. Anyone who has tried to add a couple of fields to a table "on the go" knows what a pain it is: the change does not fit into the current data model! And how do you explain to the customer that adding one new field means rewriting half of the project's code?! So we "collapse" or "discard" the "extra" analytics at the input, and as a result we cannot form relevant offers.

Western colleagues call this effect “Shit in - Shit out.”

As a result, the data takes up more space and becomes harder to process. As the volume of information grows, this becomes critical, because transaction processing slows down, while our goal is to process every request in no more than a minute under a load of 30,000 requests per second.

Conclusion: for real-time marketing with 100+ million subscribers, normalization is not suitable.

We arrived at a solution in the form of a universal customer profile. It lives in a key-value store, so we do not have to fix the data structure in advance: each attribute is a key plus a value, and the value can be anything.

We got a combination of:

  • Static attributes that are rarely updated (name, passport, address) - a mandatory block with the ID.
  • A dynamic "tail" of arbitrary length - frequently updated data that depends on the source; several independent blocks, one per source.

This approach is called denormalization. What makes it convenient?

  1. The "tail" does not need to be validated.
  2. We store the "raw" data as is, without processing.
  3. We keep all incoming information and lose nothing.
  4. Only the ID block is mandatory: it is the key used to match data from different sources.
  5. New attributes from a source are simply appended to the dynamic "tail" without changing the data model.
  6. As a result, the profile stays complete and up to date.
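To make the structure concrete, here is a hypothetical shape of such a profile (field names and sources are invented for illustration, not the operator's real model): a mandatory static block keyed by the ID, plus independent "tail" blocks, one per source, stored as raw key-value data.

```lua
-- Hypothetical denormalized profile: mandatory ID + static block,
-- followed by an unvalidated, source-dependent key-value "tail".
local profile = {
    id = 100500,                  -- mandatory key used to match data from sources
    static = {                    -- rarely updated attributes
        name = 'Ivan Ivanov',
        city = 'Moscow',
    },
    -- dynamic tail: raw data as received, schema depends on the source
    billing   = { balance_rub = 250.75, tariff = 'unlim_plus' },
    geo       = { last_lat = 55.75, last_lon = 37.62, updated_at = 1700000000 },
    app_usage = { traffic_mb_today = 1843 },
}
```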


Now we need to select a tool for the implementation. This is usually done by the architect based on the requirements the analyst has gathered. It is very important to find out the non-functional requirements: the expected data volume and load level. They determine which methods of data storage and processing we will use.

Big data


The heading of this chapter hints that our service will process a lot of data. But how much is "a lot"? Let's figure it out.

Data can be considered big if the relationships within it cannot be seen with the naked eye.

We process more than 100 million distinct customer profiles that contain unstructured information and are frequently updated and queried - this is genuine big data.

You need to cache current customer profiles. Without storing hot data in RAM, real-time processing cannot be achieved.

High load


Now let's deal with the load intensity, that is, the number of requests. The term "high load" describes situations when the hardware can no longer cope with the load.

We process different types of events that arrive continuously at an intensity of 10,000 to 30,000 requests per second, complex business logic is involved, and reaction speed is critical. Clearly, we are designing a highly loaded service that must scale dynamically with the instantaneous load.

Tarantool as an accelerator


We at Mail.ru Group use Tarantool to solve such problems. A lot has already been written on Habr about how it works under the hood, so I won't repeat it; I'll just recall the main points:

Tarantool is an in-memory DBMS and an application server rolled into one.

When working with a large amount of data, it is advisable to use it in two ways:

  1. As a data mart for caching information in RAM to speed up access.
  2. As an application server for processing data according to specified rules.

That is, the business logic is stored next to the data, which is vital for highly loaded services. In our project we used Tarantool as a "smart" data mart with built-in business logic that processes the incoming stream of events and information on the fly.
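A minimal sketch of what this can look like in Tarantool (assuming Tarantool 2.x; the space name 'profiles', the 'process_event' function and the offer rule are illustrative assumptions, not the project's actual code):

```lua
-- Profiles are cached in memory; the business logic lives next to the data,
-- so each incoming event is handled in a single call to the database.
box.cfg{ listen = 3301 }

box.schema.space.create('profiles', { if_not_exists = true })
box.space.profiles:format({
    { name = 'id',   type = 'unsigned' },
    { name = 'data', type = 'map' },     -- the denormalized key-value "tail"
})
box.space.profiles:create_index('primary', {
    parts = { 'id' },
    if_not_exists = true,
})

-- Called for every incoming event: update the cached profile on the fly
-- and decide whether a targeted offer should be sent.
function process_event(subscriber_id, event)
    local tuple = box.space.profiles:get(subscriber_id)
    local data = tuple and tuple.data or {}
    data[event.source] = event.payload             -- append the raw source block
    box.space.profiles:replace({ subscriber_id, data })

    if event.source == 'geo' and event.payload.near_partner_cafe then
        return { offer = 'lunch_discount', channel = 'push' }
    end
end
```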

Why Tarantool is effective for RTM:

  1. Hot data caching. The client profile is cached in memory, so it is always up to date.
  2. Complex real-time computing. Personal offers to clients are formed in real time for each event.
  3. Fault tolerant and scalable solution:

There are two obvious risks in our project:

  1. Loss of the data held in RAM, for example if an instance fails. Tarantool mitigates this with its persistence and replication mechanisms (write-ahead log, snapshots, replicas), so the cached profiles survive a restart.
  2. A single instance cannot cope with the data volume and request rate. The solution is horizontal scaling, i.e. distributing the 100 million records of the client profile table between several shards in order to parallelize query processing and reduce the load on each instance. The simplest example is to split the customer profile table by ranges of ID values. To solve this problem, Tarantool provides horizontal scaling tools, described, for example, in the article “Tarantool Cartridge: sharding Lua backend in three lines”.
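As a toy illustration of the "split by ID range" idea mentioned above (the real project relied on Tarantool's sharding tooling rather than hand-rolled logic; the shard count and ID space below are assumptions):

```lua
-- Map a subscriber ID to one of several shards by ID range, so each
-- instance stores and serves only its own slice of the 100M profiles.
local SHARD_COUNT = 8
local MAX_ID      = 100 * 1000 * 1000
local RANGE       = math.ceil(MAX_ID / SHARD_COUNT)

local function shard_for(subscriber_id)
    return math.floor((subscriber_id - 1) / RANGE) + 1
end

print(shard_for(1))              --> 1
print(shard_for(37 * 1000000))   --> 3
print(shard_for(MAX_ID))         --> 8
```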

Conclusion


Tarantool does not replace Oracle or other analytical repositories, but it is effective for processing large amounts of data in real time. We successfully solved the customer's task within the agreed deadlines and project budget, so I recommend experimenting with this tool when building highly loaded services.
