PostgreSQL Antipatterns: fighting hordes of the “dead”

The features of the internal PostgreSQL mechanisms allow it to be very fast in some situations and not so fast in others. Today we will dwell on the classic example of a conflict between how the DBMS works and what the developer does with it - UPDATE vs MVCC principles .

Briefly plot from an excellent article :
When a line is modified with the UPDATE command, two operations are actually performed: DELETE and INSERT. In the current version of the line , xmax is set equal to the number of the transaction that performed UPDATE. Then a new version of the same line is created; its xmin value matches the xmax value of the previous version.
Some time after the completion of this transaction, the old or new version, depending on which COMMIT/ROOLBACK, will be recognized as “dead” (dead tuples) when passing VACUUMthrough the table and cleaned.



But this will not happen right away, but problems with the “dead” can be acquired very quickly - with multiple or mass updates of records in a large table, and a little later, faced with a situation that VACUUM will not be able to help .

# 1: I Like To Move It


Suppose your method on business logic works for itself, and suddenly realizes that it would be necessary to update the field X in some record:

UPDATE tbl SET X = <newX> WHERE pk = $1;

Then, as it progresses, it finds out that the Y field should be updated too:

UPDATE tbl SET Y = <newY> WHERE pk = $1;

... and then also Z - why trifle something?

UPDATE tbl SET Z = <newZ> WHERE pk = $1;

How many versions of this record now have in the database? Yeah, 4 pieces! Of these, one is relevant, and 3 will have to pick up [auto] VACUUM for you.

Do not do like this! Use the update of all fields in one request - almost always the logic of the method can be changed like this:

UPDATE tbl SET X = <newX>, Y = <newY>, Z = <newZ> WHERE pk = $1;

# 2: Use IS DISTINCT FROM, Luke!


So, you still wanted to update many, many records in the table (during the use of a script or converter, for example). And something like this flies into the script:

UPDATE tbl SET X = <newX> WHERE pk BETWEEN $1 AND $2;

In approximately this form, a query is encountered quite often and almost always not to fill in an empty new field, but to correct some errors in the data. Moreover, the correctness of already existing data is not taken into account at all - but in vain! That is, the record is being rewritten, even if it was exactly what I wanted - why? Correct:

UPDATE tbl SET X = <newX> WHERE pk BETWEEN $1 AND $2 AND X IS DISTINCT FROM <newX>;

Many people are not aware of the existence of such a wonderful operator, so here is a cheat sheet for IS DISTINCT FROMother logical operators to help:

... and a little about operations on complex ROW()expressions:

# 3: I’ll recognize my dear by ... blocking


Run two identical parallel processes , each of which is aimed at the recording mark, that it is "in operation":

UPDATE tbl SET processing = TRUE WHERE pk = $1;

Even if these processes substantively do things independent of each other, but within the framework of one ID, on this request the second client will “lock up” until the first transaction is completed.

Solution # 1 : the task is reduced to the previous one.

Just add again IS DISTINCT FROM:

UPDATE tbl SET processing = TRUE WHERE pk = $1 AND processing IS DISTINCT FROM TRUE;

In this form, the second request will simply not change anything in the database, there it is already “everything is as it should” - therefore, blocking will not occur. Further, the fact of the “non-existence" of the record is already processed in the applied algorithm.

Decision number 2 : advisory locks

A big topic for a separate article in which you can read about the methods of application and the "rake" of recommendation locks .

Solution # 3 : without [d] smart calls

But exactly, exactly, you should have simultaneous work with the same record ? Or did you still mess up with client-side business logic call algorithms, for example? And if you think about it? ..

All Articles