Today there will be no complicated cases or sophisticated SQL algorithms. Everything will be very simple, at Captain Obvious level: we are building a view of an event registry sorted by time.

That is, there is a table events in the database, and its ts field is exactly the time by which we want to display these records in order:

CREATE TABLE events(
  id serial PRIMARY KEY
, ts timestamp
, data json
);
CREATE INDEX ON events(ts DESC);
Clearly there will be far more than a dozen records in it, so we will need some kind of page navigation.

#0. "I'm mommy's little programmer"
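# Antipattern: fetch the whole table and paginate in application memory
cur.execute("SELECT * FROM events;")
rows = cur.fetchall()
# row.ts assumes a cursor that returns rows with attribute access (e.g. named tuples)
rows.sort(key=lambda row: row.ts, reverse=True)
limit = 26
# offset is the position passed in from the interface
print(rows[offset:offset + limit])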
This is hardly a joke: it is rare, but it does occur in the wild. After working with an ORM, it can be hard to switch to working with SQL "directly".

But let's move on to more common and less obvious problems.

#1. OFFSET
SELECT
...
FROM
events
ORDER BY
ts DESC
LIMIT 26 OFFSET $1;
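Where does the number 26 come from? It is roughly the number of records needed to fill one screen. More precisely, 25 displayed records plus 1 that signals there is at least something further in the selection, so it makes sense to continue.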
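Of course, this value need not be "hardwired" into the query body and can be passed as a parameter instead. But then the PostgreSQL planner will not be able to rely on knowing that relatively few records are expected, and may easily choose an inefficient plan.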
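The "25 plus 1" idea behind the LIMIT 26 query can be sketched on the application side roughly as follows. This is only an illustration: the names PAGE_SIZE and split_page are made up for this example, and the list of integers stands in for real rows.

```python
PAGE_SIZE = 25  # rows actually shown to the user (hypothetical constant)


def split_page(rows, page_size=PAGE_SIZE):
    """Split the result of a `LIMIT page_size + 1` query into (visible, has_more)."""
    # The extra row is never displayed; its presence only says "there is more".
    has_more = len(rows) > page_size
    return rows[:page_size], has_more


# Stand-in for the 26 rows returned by the LIMIT 26 query
fetched = list(range(26))
visible, has_more = split_page(fetched)
```

If the query returns 26 rows, the interface draws 25 and keeps the "load more" control; with 25 rows or fewer it knows this was the last page.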
And as long as browsing the registry is implemented in the application interface as switching between visual "pages", nobody notices anything suspicious for a long time. Right up to the moment when, in the struggle for convenience, UI/UX decides to rework the interface into an "endless scroll", where all registry records are drawn in a single list that the user can scroll up and down.

And then, during the next round of testing, you get caught with duplicated records in the registry. Why, when the table has a perfectly normal index (ts) that your query relies on?

Precisely because you did not take into account that ts is not a unique key in this table. Its values are indeed not unique, like any "time" in real conditions, so the same record can easily "jump" from page to page between two neighboring queries, because rows with the same key value may end up in a different final order.

In fact, a second problem is hiding here too, and it is much harder to notice: some records will not be shown at all! After all, the "duplicated" records took someone else's place. A detailed explanation with nice pictures can be found here.

Expanding the index
A cunning developer understands that the index key has to be made unique, and the easiest way is to extend it with a field that is unique by definition, for which the PK is perfect:

CREATE UNIQUE INDEX ON events(ts DESC, id DESC);
And the query mutates:

SELECT
...
ORDER BY
ts DESC, id DESC
LIMIT 26 OFFSET $1;
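To see why the extra id in the sort key matters, here is a small simulation in plain Python, without a database. The sample data and the helper names (page, stable) are invented for the example. The two lists order_a and order_b are both legitimate results of sorting by ts alone, which is exactly the freedom the server has when ts values tie.

```python
from datetime import datetime

# Hypothetical sample: events 2, 3 and 4 share the same timestamp
T = datetime(2020, 1, 1, 12, 0, 0)
events = [
    {"id": 1, "ts": datetime(2020, 1, 1, 11, 0, 0)},
    {"id": 2, "ts": T},
    {"id": 3, "ts": T},
    {"id": 4, "ts": T},
    {"id": 5, "ts": datetime(2020, 1, 1, 13, 0, 0)},
]


def page(rows, limit, offset):
    """Emulate LIMIT/OFFSET over an already sorted list, returning ids."""
    return [r["id"] for r in rows[offset:offset + limit]]


# ORDER BY ts DESC only: both orderings are valid, ties may come in any order
order_a = sorted(events, key=lambda r: r["ts"], reverse=True)            # ids 5, 2, 3, 4, 1
order_b = sorted(reversed(events), key=lambda r: r["ts"], reverse=True)  # ids 5, 4, 3, 2, 1

first_page = page(order_a, 2, 0)   # [5, 2]
second_page = page(order_b, 2, 2)  # [3, 2]: id 2 shows up again, id 4 is skipped


def stable(rows):
    """ORDER BY ts DESC, id DESC: a total order, independent of input order."""
    return sorted(rows, key=lambda r: (r["ts"], r["id"]), reverse=True)


stable_pages = [
    page(stable(events), 2, 0),
    page(stable(list(reversed(events))), 2, 2),
    page(stable(events), 2, 4),
]
```

With the composite key every request sees the same order, so the pages neither overlap nor leave gaps.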
#2. Switching to "cursors"

Some time later the DBA comes to you with the "good news" that your queries are hammering the server with their enormous OFFSETs, and that in general it is time to switch to navigating from the last value shown. Your query mutates again:

SELECT
...
WHERE
(ts, id) < ($1, $2)
ORDER BY
ts DESC, id DESC
LIMIT 26;
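A minimal runnable sketch of this "continue from the last shown (ts, id)" loop is given below. It makes several assumptions that are not in the article: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL (SQLite also supports row-value comparison), integer timestamps, a page size of 3 instead of 26, and a hypothetical helper fetch_page that handles the very first page by omitting the WHERE clause.

```python
import sqlite3

PAGE = 3  # small page size for readability; the article uses 26

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts INTEGER, data TEXT)")
conn.execute("CREATE UNIQUE INDEX events_ts_id ON events(ts DESC, id DESC)")
# ids 1..10 with ts = id // 2, so neighbouring rows share timestamps
conn.executemany(
    "INSERT INTO events(id, ts) VALUES (?, ?)",
    [(i, i // 2) for i in range(1, 11)],
)


def fetch_page(conn, last_key=None, limit=PAGE):
    """Return the next page after last_key = (ts, id), newest first."""
    if last_key is None:
        sql = "SELECT ts, id FROM events ORDER BY ts DESC, id DESC LIMIT ?"
        params = (limit,)
    else:
        sql = (
            "SELECT ts, id FROM events WHERE (ts, id) < (?, ?) "
            "ORDER BY ts DESC, id DESC LIMIT ?"
        )
        params = (*last_key, limit)
    return conn.execute(sql, params).fetchall()


seen = []
key = None
while True:
    rows = fetch_page(conn, key)
    if not rows:
        break
    seen.extend(rows)
    key = rows[-1]  # (ts, id) of the last row shown becomes the next starting point
```

Each request starts right after the last row already drawn, so the cost of a page does not grow with how far the user has scrolled.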
You breathed a sigh of relief, until...

#3. Cleaning up indexes
Because one day your DBA read an article about finding inefficient indexes and realized that a timestamp that is "not last" in an index key is not good. And he came to you again, now with the idea that this index should turn back into (ts DESC) after all.

But what about the original problem of records "jumping" between pages? It is simple: you need to select blocks with a non-fixed number of records!

After all, who forbids us to read not "exactly 26" but "at least 26"? For example, so that the next block starts with records whose ts value is definitely different. Then there will be no problems with records "jumping" between blocks!

Here is how to do it:

SELECT
...
WHERE
ts < $1 AND
ts >= coalesce((
SELECT
ts
FROM
events
WHERE
ts < $1
ORDER BY
ts DESC
LIMIT 1 OFFSET 25
), '-infinity')
ORDER BY
ts DESC;
What is going on here?
- We step 25 records "down" and get the "boundary" value of ts.
- If there is nothing there any more, we replace the NULL value with -infinity.
- We read the entire segment of values between the obtained ts value and the parameter $1 passed from the interface (the previous "last" rendered value).
- If a block comes back with fewer than 26 records, it is the last one.
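The steps above can be tried out with a small script. As before, this is a sketch under assumptions: sqlite3 stands in for PostgreSQL, timestamps are plain integers, the sentinel -1 plays the role of -infinity (it only works because all sample ts values are non-negative), and the minimum block size is 4 rather than 26. The name fetch_block is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts INTEGER)")
conn.execute("CREATE INDEX events_ts ON events(ts DESC)")
conn.executemany(
    "INSERT INTO events(ts) VALUES (?)",
    [(t,) for t in (5, 4, 4, 3, 3, 3, 2, 1)],
)

BLOCK_SQL = """
SELECT ts, id FROM events
WHERE ts < :before
  AND ts >= coalesce((
        SELECT ts FROM events
        WHERE ts < :before
        ORDER BY ts DESC
        LIMIT 1 OFFSET :skip
      ), :floor)
ORDER BY ts DESC
"""


def fetch_block(conn, before, min_rows=4):
    """Return at least min_rows rows older than `before`, never splitting a ts group."""
    # skip = min_rows - 1 mirrors "OFFSET 25" for a 26-row minimum
    params = {"before": before, "skip": min_rows - 1, "floor": -1}
    return conn.execute(BLOCK_SQL, params).fetchall()


first = fetch_block(conn, before=6)              # ts values 5, 4, 4, 3, 3, 3
second = fetch_block(conn, before=first[-1][0])  # ts values 2, 1: fewer than 4, last block
```

Note that the first block returns six rows, not four: the boundary lands on ts = 3, so all three rows with that value come along, and the next block safely starts below it.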
Since our selection no longer has any fixed "beginning", nothing prevents us from "reversing" this query and implementing dynamic loading of data blocks from a "reference point" in both directions, both down and up.

Notes
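- Yes, in this case we access the index twice, but everything is done "purely by index". So the nested query costs just one additional Index Only Scan.
- Obviously, this technique can only be used when ts values coincide only by accident and such coincidences are few. If your typical case is "a million records at 00:00:00.000", you should not do this. That is, you should not allow such a case to arise. But if it has already happened, use the variant with the expanded index.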
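As for "reversing" the query to load blocks upward from a reference point, the article does not show that variant, so the following is only one possible reading of it: flip the comparisons and the inner sort direction, and replace -infinity with an upper sentinel. Again sqlite3 and integer timestamps are stand-ins, and fetch_block_up and the ceiling value are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts INTEGER)")
conn.execute("CREATE INDEX events_ts ON events(ts DESC)")
conn.executemany(
    "INSERT INTO events(ts) VALUES (?)",
    [(t,) for t in (1, 2, 2, 3, 4, 4, 4, 5)],
)

UP_SQL = """
SELECT ts, id FROM events
WHERE ts > :after
  AND ts <= coalesce((
        SELECT ts FROM events
        WHERE ts > :after
        ORDER BY ts ASC
        LIMIT 1 OFFSET :skip
      ), :ceiling)
ORDER BY ts DESC
"""


def fetch_block_up(conn, after, min_rows=3):
    """Return at least min_rows rows newer than `after`, still sorted newest first."""
    # 2**62 is an arbitrary stand-in for 'infinity' with integer timestamps
    params = {"after": after, "skip": min_rows - 1, "ceiling": 2**62}
    return conn.execute(UP_SQL, params).fetchall()


newer = fetch_block_up(conn, after=2)               # ts values 4, 4, 4, 3
newest = fetch_block_up(conn, after=newer[0][0])    # ts value 5: short block, top reached
```

The block is still returned newest first, so the interface can prepend it to the list as is; the next upward request starts from the first row of the block rather than the last.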