👋🏿 🐼 📔 PostgreSQL Antipatterns: Registry Navigation 👶🏿 🤤 📠

Today there will be no complicated cases and sophisticated SQL algorithms. Everything will be very simple, at Captain's level. Evidence - we do a review of the event register with sorting by time.

That is, there is a plate in the base events, and its field tsis exactly the same time by which we want to display these records in an orderly manner:

CREATE TABLE events(
  id
    serial
      PRIMARY KEY
, ts
    timestamp
, data
    json
);

CREATE INDEX ON events(ts DESC);

It is clear that we will have not a dozen entries there, so we will need some kind of page navigation .

# 0 “I’m a pogrommist at my mom”

cur.execute("SELECT * FROM events;")
rows = cur.fetchall();
rows.sort(key=lambda row: row.ts, reverse=True);
limit = 26
print(rows[offset:offset+limit]);

It’s almost no joke - rarely, but found in the wild. Sometimes after working with ORM it can be difficult to switch to a “direct” work with SQL.

But let's move on to more common and less obvious problems.

#1. OFFSET

SELECT
  ...
FROM
  events
ORDER BY
  ts DESC
LIMIT 26 OFFSET $1; -- 26 -   , $1 -

26? . , 25 , 1, , - .

, «» , . PostgreSQL , , — .

And while in the application interface the viewing of the registry is implemented as switching between visual “pages”, nobody for a long time notices anything suspicious. Exactly until the moment when, in the struggle for convenience, UI / UX does not decide to remake the interface to “endless scroll” - that is, all registry entries are drawn in a single list that the user can twist up and down.

And now, during the next test, you are caught duplicating entries in the registry. Why, because the table has a normal index (ts)on which your query is based?

Exactly because you did not consider what is tsnot a unique key in this table. Actually, his meanings are not unique, like any “time” in real conditions - that’s why the same record in two neighboring queries easily “jumps” from page to page due to a different final order as part of sorting the same key value.

In fact, the second problem is also hidden here, which is much more difficult to notice - some entries will not be shown at all! After all, the "duplicated" records took someone's place. A detailed explanation with beautiful pictures can be found here .

Expanding the Index

The cunning developer understands that you need to make the index key unique, and the easiest way is to expand it with a deliberately unique field, which PK is perfect for:

CREATE UNIQUE INDEX ON events(ts DESC, id DESC);

And the request mutates:

SELECT
  ...
ORDER BY
  ts DESC, id DESC
LIMIT 26 OFFSET $1;

# 2 Transition to "cursors"

Some time later, the DBA comes to you and is “happy” that your requests are hellishly loading the server with their horse-drawn OFFSETs , and in general, it is time to switch to navigation from the last value shown . Your request mutates again:

SELECT
  ...
WHERE
  (ts, id) < ($1, $2) --      
ORDER BY
  ts DESC, id DESC
LIMIT 26;

You breathed a sigh of relief before it came ...

# 3 Index Cleaning

Because one day your DBA read an article about finding inefficient indexes and realized that the “last but not the least” timestamp is not good . And he came to you again - now with the thought that this index should nevertheless turn back into (ts DESC).

But what to do with the initial problem of “jumping” records between pages? .. And everything is simple - you need to choose blocks with an unlimited number of records!

In general, who forbids us to read not “exactly 26”, but “not less than 26”? For example, so that in the next block there are records with obviously different valuests - then there will be no problems with “jumping” between the blocks!

Here's how to do it:

SELECT
  ...
WHERE
  ts < $1 AND
  ts >= coalesce((
    SELECT
      ts
    FROM
      events
    WHERE
      ts < $1
    ORDER BY
      ts DESC
    LIMIT 1 OFFSET 25
  ), '-infinity')
ORDER BY
  ts DESC;

What is going on here?

We step down 25 records and get the “boundary” value ts.
If there is already nothing there, then replace the NULL value with -infinity.
Subtract the entire segment of values between the received value tsand the parameter $ 1 passed from the interface (the previous “last” drawn value).
If a block returned with less than 26 entries, it is the last one.

Or the same picture:

Since now our sample does not have any definite “beginning” , nothing prevents us from “reversing” this request in the opposite direction and implementing dynamic loading of data blocks from the “reference point” in both directions - both down and up.

Comment

, , « ». Index Only Scan.
, , ts , . — « 00:00:00.000», . , . , .