Redis Best Practices, Part 2

This is the second part of a series of translations of Redis Best Practices from Redis Labs. It covers interaction patterns and data storage patterns.

The first part is here.

Interaction patterns


Redis can act not only as a traditional DBMS: its data structures and commands can also be used to exchange messages between microservices or processes. The wide availability of Redis clients, the speed and efficiency of the server and protocol, and the built-in classic data structures let you build your own workflows and event mechanisms. In this chapter, we will cover the following topics:

  • event queue;
  • locking with Redlock;
  • Pub / Sub;
  • distributed events.

Event queue


Lists in Redis are ordered lists of strings, very similar to the linked lists you may already be familiar with. Adding a value to a list (push) and removing a value from a list (pop) are very lightweight operations. As you can imagine, this makes a list a very good structure for managing a queue: add elements at the head and read them from the tail (FIFO). Redis also provides additional features that make this pattern more efficient, reliable, and easy to use.

Lists have a subset of commands with "blocking" behavior. The term “blocking” applies only to a single client connection. These commands do not let the client do anything else until a value appears in the list or the timeout expires. This removes the need to poll Redis while waiting for a result. Since the client cannot do anything while it waits for a value, we need two open clients to illustrate this:

Step 1, client 1:
> BRPOP my-q 0
[waiting for a value]

Step 2, client 2:
> LPUSH my-q hello
(integer) 1

Step 2, client 1:
1) "my-q"
2) "hello"
[client unblocked, ready to accept commands]

Step 3, client 1:
> BRPOP my-q 0
[waiting for a value]

In step 1 of this example, the blocking command does not return anything right away, because the list is empty. The final argument is the timeout; 0 means wait indefinitely. In step 2, a value is pushed to my-q, and the first client immediately leaves the blocked state. In step 3, BRPOP is called again (in an application you would do this in a loop), and the client waits for the next value. Pressing Ctrl+C breaks the block and exits the client.

Let's reverse the example and see how BRPOP works with a non-empty list:

Step 1:
> LPUSH my-q hello
(integer) 1

Step 2:
> LPUSH my-q hej
(integer) 2

Step 3:
> LPUSH my-q bonjour
(integer) 3

Step 4:
> BRPOP my-q 0
1) "my-q"
2) "hello"

Step 5:
> BRPOP my-q 0
1) "my-q"
2) "hej"

Step 6:
> BRPOP my-q 0
1) "my-q"
2) "bonjour"

Step 7:
> BRPOP my-q 0
[waiting for a value]

In steps 1-3, we add three values to the list and see the reply grow, since it reports the number of elements in the list. Step 4, despite calling BRPOP, returns a value immediately, because blocking happens only when the queue is empty. We see the same instant response in steps 5-6, once for each item in the queue. In step 7, BRPOP finds nothing in the queue and blocks the client until something is added.
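The ordering in the example above can be reproduced with a small in-memory model. This is only a sketch: it uses a Python deque rather than a Redis client, and the helper names lpush and rpop are made up for the illustration. Unlike BRPOP, the model returns None on an empty queue instead of blocking.

```python
from collections import deque


def lpush(q, value):
    """Model of LPUSH: add a value at the head and return the new length."""
    q.appendleft(value)
    return len(q)


def rpop(q):
    """Model of a non-blocking RPOP: take a value from the tail, or None."""
    return q.pop() if q else None


my_q = deque()

# Steps 1-3: the reply grows with the length of the list.
lengths = [lpush(my_q, v) for v in ("hello", "hej", "bonjour")]

# Steps 4-7: values come back in the order they were pushed (FIFO).
# A real BRPOP would block on the fourth call instead of returning None.
popped = [rpop(my_q) for _ in range(4)]

print(lengths)  # [1, 2, 3]
print(popped)   # ['hello', 'hej', 'bonjour', None]
```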

Queues often represent work that needs to be done in another process (a worker). In this type of workload, it is important that the work does not disappear if the worker crashes for some reason during execution. Redis supports this type of queue. To get it, use the BRPOPLPUSH command instead of BRPOP. It waits for a value in one list and, as soon as one appears, moves it to another list. This is done atomically, so two workers cannot receive the same value. Let's see how it works:

Step 1, client 1:
> LINDEX worker-q 0
(nil)

Step 2, client 1:
[if the result is not nil, process it; otherwise go to step 4]

Step 3, client 1:
> LREM worker-q -1 [value from step 1]
(integer) 1
[return to step 1]

Step 4, client 1:
> BRPOPLPUSH my-q worker-q 0
[waiting for a value]

Step 5, client 2:
> LPUSH my-q hello

Step 5, client 1:
"hello"
[client unblocked, ready to accept commands]

Step 6, client 1:
[process hello]

Step 7, client 1:
> LREM worker-q -1 hello
(integer) 1

Step 8, client 1:
[return to step 1]

In steps 1-2, we do nothing, because worker-q is empty. If something had been returned, we would process it, remove it, and return to step 1 to check whether anything else is in the queue. This way we first clear the worker's queue and finish any existing work. In step 4, we wait until a value appears in my-q, and when it does, it is atomically moved to worker-q. Then we process “hello”, remove it from worker-q, and return to step 1. If the process dies in step 6, the value still remains in worker-q. After the process restarts, we immediately clear out everything that was not removed in step 7.

This pattern greatly reduces the likelihood of losing jobs. A problem remains only if the worker dies between steps 2 and 3 or between steps 5 and 6, which is unlikely, but it is good practice to account for it in the worker's logic.
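The worker loop from steps 1-8 can be sketched with an in-memory model. Everything here is an assumption made for illustration: the lists are Python deques rather than Redis keys, rpoplpush and lrem_last only imitate the corresponding commands, and the model returns instead of blocking when my-q is empty.

```python
from collections import deque


def rpoplpush(src, dst):
    """Model of RPOPLPUSH: move the tail of src to the head of dst."""
    if not src:
        return None
    value = src.pop()
    dst.appendleft(value)
    return value


def lrem_last(q, value):
    """Model of LREM q -1 value: remove the match nearest to the tail."""
    for i in range(len(q) - 1, -1, -1):
        if q[i] == value:
            del q[i]
            return 1
    return 0


def run_worker(my_q, worker_q, handle):
    """Drain leftovers from worker_q first, then take new jobs from my_q."""
    while worker_q:                       # steps 1-3: finish unfinished work
        job = worker_q[0]                 # LINDEX worker-q 0
        handle(job)
        lrem_last(worker_q, job)
    while True:
        job = rpoplpush(my_q, worker_q)   # step 4 (non-blocking in this model)
        if job is None:
            return
        handle(job)                       # step 6
        lrem_last(worker_q, job)          # step 7


def crash(job):
    raise RuntimeError("worker died in step 6")


my_q, worker_q, done = deque(["hello"]), deque(), []

try:
    run_worker(my_q, worker_q, crash)     # first worker dies mid-job
except RuntimeError:
    pass
survived = list(worker_q)                 # the job is still parked in worker_q

run_worker(my_q, worker_q, done.append)   # restarted worker picks it up
```

After the simulated crash the job is still in worker_q, and the restarted worker completes it before looking at my_q again.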

Locking with Redlock


Sometimes a system needs to lock a resource. This may be necessary in order to apply important changes that cannot be handled safely under concurrent access. The goals of a lock are:

  • allow one and only one worker to acquire the resource;
  • be able to release the lock reliably;
  • never lock the resource permanently (it must be unlocked after a certain period of time).

Redis is a good option for implementing locks, as it has a simple key-based data model, and each shard is single-threaded and fairly fast. There is an excellent lock implementation built on Redis called Redlock.
Redlock clients are available for almost every language, but it is important to know how Redlock works in order to use it safely and effectively.

First, you need to understand that Redlock is designed to run on at least 3 machines with independent Redis instances. This removes the single point of failure from your locking mechanism, which could otherwise deadlock all resources. Another point to understand is that the clocks on the machines do not have to be perfectly synchronized, but they must run at the same rate: one second on machine A must last as long as one second on machine B.

Setting a lock with Redlock begins by obtaining a timestamp with millisecond precision. You must also decide the lock time in advance. The lock is then set by issuing SET for a key with a random value (only if the key does not exist yet) and an expiry timeout on the key. This is repeated for each independent instance. If an instance is down, it is skipped immediately. If the lock was set on a majority of instances before the lock time ran out, it is considered acquired. The time left to use or renew the lock is the predefined lock time minus the time it took to acquire the lock. In the event of an error or timeout, unlock all instances and try again.
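The acquisition rule described above comes down to a majority check plus a subtraction. The following is a minimal sketch of just that arithmetic, not a Redlock client: the function name redlock_outcome and its parameters are invented for the illustration, and refinements such as a clock drift allowance are left out.

```python
def redlock_outcome(votes, ttl_ms, elapsed_ms):
    """Return (acquired, validity_ms) for one acquisition attempt.

    votes      -- one boolean per independent instance (True = SET succeeded)
    ttl_ms     -- the lock time chosen in advance, in milliseconds
    elapsed_ms -- time spent talking to all instances, in milliseconds
    """
    quorum = len(votes) // 2 + 1          # strict majority of instances
    validity_ms = ttl_ms - elapsed_ms     # what is left of the lock time
    acquired = sum(votes) >= quorum and validity_ms > 0
    return acquired, (validity_ms if acquired else 0)


# Two of three instances answered within 35 ms: the lock is held.
print(redlock_outcome([True, True, False], 10_000, 35))   # (True, 9965)

# Only one of three instances answered: no majority, so no lock.
print(redlock_outcome([True, False, False], 10_000, 35))  # (False, 0)
```

When the outcome is (False, 0), the client should release whatever keys it did manage to set and retry, as described above.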

To release a lock, it is best to use a Lua script that checks whether the key still holds the expected random value. If it does, the key can be deleted; otherwise it is better to leave the key alone, since it may belong to a newer lock.

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

The Redlock process provides good guarantees and has no single point of failure, so you can be confident that locks are properly distributed and that deadlocks will not occur.

Pub / Sub


In addition to data storage, Redis can also be used as a Pub / Sub (publisher / subscriber) platform. In this pattern, a publisher can send messages to any number of subscribers of a channel. The messages are “fire and forget”: if a message is published and no subscriber exists, the message disappears and cannot be recovered.
By subscribing to a channel, the client enters subscriber mode and can no longer issue regular commands; it effectively becomes read-only. The publisher has no such restriction.

You can subscribe to more than one channel. We start by subscribing to two channels, weather and sports, using the SUBSCRIBE command:

> SUBSCRIBE weather sports
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "weather"
3) (integer) 1
1) "subscribe"
2) "sports"
3) (integer) 2

In a separate client (another terminal window, for example) we can publish messages in any of these channels using the PUBLISH command:

> PUBLISH sports oilers/7:leafs/1
(integer) 1

The first argument is the name of the channel, and the second is the message. The message can be anything; in this case it is an encoded game score. The command returns the number of clients the message was delivered to. In the subscriber client, we immediately see the message:

1) "message"
2) "sports"
3) "oilers/7:leafs/1"

The response contains three elements: an indication that this is a message, the channel it arrived on, and the message itself. Right after receiving it, the client goes back to listening to the channel.

Returning to the publisher, we can publish another message:

> PUBLISH weather snow/-4c
(integer) 1

In the subscriber we see the same format, but with a different channel and message:

1) "message"
2) "weather"
3) "snow/-4c"

Let's post a message to a channel where there are no subscribers:

> PUBLISH currency CADUSD/0.787
(integer) 0

Since no one is listening to the currency channel, the reply is 0. The message has been sent, and clients that subscribe to this channel later will not receive it: it was fired and forgotten.

In addition to subscribing to individual channels, Redis allows subscribing to channels by pattern. A glob-style pattern is passed to the PSUBSCRIBE command:

> PSUBSCRIBE sports:*

The client will receive messages from all channels whose names start with sports:. In another client, run the following commands:

> PUBLISH sports:hockey oilers/7:leafs/1
(integer) 1
> PUBLISH sports:basketball raptors/33:pacers/7
(integer) 1
> PUBLISH weather:edmonton snow/-4c
(integer) 0

Note that the first two commands return 1, while the last returns 0. Although we are not subscribed directly to sports:hockey or sports:basketball, the client receives the messages through the pattern subscription. In the subscriber's window, we see results only for the channels that match the pattern.

1) "pmessage"
2) "sports:*"
3) "sports:hockey"
4) "oilers/7:leafs/1"
1) "pmessage"
2) "sports:*"
3) "sports:basketball"
4) "raptors/33:pacers/7"

This output differs slightly from the output of the SUBSCRIBE command because it contains the pattern itself as well as the actual name of the channel.
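To see which published messages reach a pattern subscriber, the matching can be approximated in plain Python. This is a sketch under one assumption: fnmatchcase from the standard library is used as a stand-in for Redis glob-style matching. It agrees with Redis for a simple pattern like sports:*, but the two syntaxes are not identical in every detail. The deliveries helper is made up for the example.

```python
from fnmatch import fnmatchcase


def deliveries(pattern, published):
    """Build pmessage-style replies for the channels that match pattern."""
    return [
        ["pmessage", pattern, channel, message]
        for channel, message in published
        if fnmatchcase(channel, pattern)
    ]


published = [
    ("sports:hockey", "oilers/7:leafs/1"),
    ("sports:basketball", "raptors/33:pacers/7"),
    ("weather:edmonton", "snow/-4c"),
]

received = deliveries("sports:*", published)
for reply in received:
    print(reply)
```

Only the two sports channels produce a reply; each reply carries both the pattern and the concrete channel name, as in the output above.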

Distributed Events


The Redis Pub / Sub messaging scheme can be extended to build interesting distributed events. Let's say we have a structure stored in a hash, but we want to notify clients only when a particular field exceeds a numerical threshold chosen by the subscriber. We will subscribe to channels by pattern and then fetch the hash stored at status. In this example, we are interested in update_status with values 5-9.

> PSUBSCRIBE update_status:[5-9]
1) "psubscribe"
2) "update_status:[5-9]"
3) (integer) 1
...

To change the value of status/error_level, we need two commands, which can be executed one after another or inside a MULTI / EXEC block. The first command sets the level, and the second publishes a notification with the value encoded in the channel name itself.

> HSET status error_level 5
(integer) 1
> PUBLISH update_status:5 0
(integer) 1

In the first window, we see that the message has been received, and after that you can switch to another client and call the HGETALL command:

...
1) "pmessage"
2) "update_status:[5-9]"
3) "update_status:5"
4) "0"

> HGETALL status
1) "error_level"
2) "5"

We can also use this method to update a local variable in a long-running process. This lets multiple instances of the same process exchange data in real time.

Why is this pattern better than using Pub / Sub alone? When a process restarts, it can simply fetch the whole state and start listening. Changes are then synchronized across any number of processes.
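The restart behavior can be illustrated with a toy in-memory model. Nothing here talks to Redis: the store dictionary stands in for the status hash, the Listener class and set_level function are hypothetical, and fnmatchcase approximates the glob matching that PSUBSCRIBE performs.

```python
from fnmatch import fnmatchcase

PATTERN = "update_status:[5-9]"  # same pattern as in the PSUBSCRIBE example


class Listener:
    """Toy model of a process that mirrors status.error_level locally."""

    def __init__(self, store):
        # On (re)start, read the whole state first (HGETALL status).
        self.error_level = int(store["status"].get("error_level", 0))

    def on_publish(self, channel, store):
        # Only channels matching the pattern reach this subscriber.
        if fnmatchcase(channel, PATTERN):
            self.error_level = int(store["status"]["error_level"])


def set_level(store, listeners, level):
    """Model of HSET status error_level, then PUBLISH update_status:<level>."""
    store["status"]["error_level"] = str(level)
    for listener in listeners:
        listener.on_publish(f"update_status:{level}", store)


store = {"status": {}}
a = Listener(store)

set_level(store, [a], 3)     # channel update_status:3 does not match
level_after_3 = a.error_level

set_level(store, [a], 7)     # channel update_status:7 matches [5-9]
level_after_7 = a.error_level

b = Listener(store)          # a freshly started process catches up at once
print(level_after_3, level_after_7, b.error_level)  # 0 7 7
```

Note that [5-9] matches exactly one character, so a channel such as update_status:10 would not match this pattern.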

Data storage patterns


There are several patterns for storing structured data in Redis. In this chapter we will consider the following:

  • JSON data storage;
  • object storage.

JSON data storage


There are several options for storing JSON data in Redis. The most common one is to serialize the object in advance and save it as a string under a key:

> SET car "{\"colour\":\"blue\",\"make\":\"saab\",\"model\":93,\"features\":[\"powerlocks\",\"moonroof\"]}"
OK
> GET car
"{\"colour\":\"blue\",\"make\":\"saab\",\"model\":93,\"features\":[\"powerlocks\",\"moonroof\"]}"

This looks simple, but it has some very serious drawbacks:

  • serialization takes client computing resources to read and write;
  • JSON format increases data size;
  • Redis has only an indirect way of handling data in JSON.

The first couple of points may be negligible on small amounts of data, but the costs will increase as the data grows. However, the third point is the most critical.

Prior to Redis 4.0, the only way to work with JSON inside Redis was a Lua script using the cjson module. This partially solved the problem, but it remained a bottleneck and added the overhead of learning Lua. In addition, many applications simply fetched the entire JSON string, deserialized it, changed the data, serialized it again, and saved it back. This is an antipattern that carries a high risk of losing data.

Step 1, instance 1:
> GET my-car

Step 2, instance 1:
[deserialize, change the car's colour, and serialize again]

Step 2, instance 2:
> GET my-car

Step 3, instance 1:
> SET my-car [new value from instance 1]

Step 3, instance 2:
[deserialize, change the car's model, and serialize again]

Step 4, instance 2:
> SET my-car [new value from instance 2]

Step 5:
> GET my-car

The result in step 5 will reflect only the changes made by instance 2; the colour change made by instance 1 will be lost.
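The lost update is easy to reproduce without Redis at all. In this sketch a plain dictionary plays the role of the key space, and the two "instances" are just two local variables; the key name my-car and the field values follow the example above.

```python
import json

# A dictionary stands in for Redis; the value is a serialized JSON string.
store = {"my-car": json.dumps({"colour": "blue", "model": 93})}

# Steps 1-2: both instances read and deserialize the same value.
car_1 = json.loads(store["my-car"])
car_2 = json.loads(store["my-car"])

# Instance 1 changes the colour, instance 2 changes the model.
car_1["colour"] = "red"
car_2["model"] = 95

# Steps 3-4: each instance writes back its whole document.
store["my-car"] = json.dumps(car_1)
store["my-car"] = json.dumps(car_2)

# Step 5: only the last writer's change survives.
final = json.loads(store["my-car"])
print(final)  # {'colour': 'blue', 'model': 95}
```

Instance 2 overwrote the whole document with a copy that never saw the colour change, which is exactly the risk described above.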

Redis 4.0 and later supports modules. ReJSON is a module that provides a dedicated data type and commands for working with it directly. ReJSON stores data in a binary format, which reduces the size of the stored data and gives faster access to individual elements without spending time on serialization and deserialization.

To use ReJSON, you need to install it on a Redis server or enable it in Redis Enterprise.

The previous example using ReJSON would look like this:

Step 1:
> JSON.SET car2 . '{"colour": "blue", "make": "saab", "model": 93, "features": ["powerlocks", "moonroof"]}'
OK

Step 2, instance 1:
> JSON.SET car2 colour '"red"'
OK

Step 3, instance 2:
> JSON.SET car2 model '95'
OK
> JSON.GET car2 .
"{\"colour\":\"red\",\"make\":\"saab\",\"model\":95,\"features\":[\"powerlocks\",\"moonroof\"]}"

ReJSON provides a safer, faster, and more intuitive way to work with JSON data in Redis, especially in cases where atomic changes to nested elements are necessary.

Object Storage


At first glance, the standard Redis “hash table” data type may seem very similar to a JSON object or a similar type. It is much simpler, however: each field can only be a string or a number, and nested structures are not allowed. Still, by calculating the “path” to each field in advance, you can “flatten” an object and save it in a Redis hash table.

{
    "colour": "blue",
    "make": "saab",
    "model": {
        "trim": "aero",
        "name": 93
    },
    "features": ["powerlocks", "moonroof"]
}

Using JSONPath (XPath for JSON), we can represent each element at the same level of the hash table:

> HSET car3 colour blue
> HSET car3 make saab
> HSET car3 model.trim aero
> HSET car3 model.name 93
> HSET car3 features[0] powerlocks
> HSET car3 features[1] moonroof

For clarity, the commands are listed separately, but HSET accepts multiple field-value pairs in a single call.
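The flattening shown above can be sketched as a short recursive function. This is one possible way to produce the paths, not a standard API: the flatten function is written for this example, and the path syntax (dots for object keys, square brackets for array indexes) simply mirrors the HSET commands above.

```python
def flatten(value, path=""):
    """Flatten nested dicts and lists into a {path: scalar} mapping."""
    if isinstance(value, dict):
        items = ((f"{path}.{k}" if path else k, v) for k, v in value.items())
    elif isinstance(value, list):
        items = ((f"{path}[{i}]", v) for i, v in enumerate(value))
    else:
        return {path: value}  # a scalar becomes one hash field
    flat = {}
    for child_path, child in items:
        flat.update(flatten(child, child_path))
    return flat


car = {
    "colour": "blue",
    "make": "saab",
    "model": {"trim": "aero", "name": 93},
    "features": ["powerlocks", "moonroof"],
}

fields = flatten(car)
for name, val in fields.items():
    print(name, val)
```

The resulting mapping has the same six fields as the HSET commands. Note that empty lists and objects produce no fields at all in this sketch, so they cannot be restored from the hash.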

Now you can request the entire object or its individual field:

> HGETALL car3
 1) "colour"
 2) "blue"
 3) "make"
 4) "saab"
 5) "model.trim"
 6) "aero"
 7) "model.name"
 8) "93"
 9) "features[0]"
10) "powerlocks"
11) "features[1]"
12) "moonroof"

> HGET car3 model.trim
"aero"

Although this provides a quick and useful way to retrieve a stored object in Redis, it has its drawbacks:

  • JSONPath implementations may differ between languages and libraries, causing incompatibilities. In that case, it is worth serializing and deserializing the data with a single tool;
  • array support:
    • sparse arrays can be problematic;
    • many operations are impossible, such as inserting an element in the middle of an array;
  • extra resources are spent on storing the JSONPath keys.

This pattern overlaps heavily with ReJSON. If ReJSON is available, it is the better choice in most cases. Storing objects as described above has one advantage over ReJSON, though: integration with the Redis SORT command. That command is computationally expensive, however, and is a complex topic of its own, beyond the scope of this pattern.

The next and final part will cover time series patterns, rate limiting patterns, Bloom filter patterns, counters, and the use of Lua in Redis.

P.S. I tried to adapt the “barbaric” English of these articles into Russian as well as I could, but if you think an idea is unclear or wrong somewhere, please correct me in the comments.

Source: https://habr.com/ru/post/undefined/

