🧀 🛌🏼 👨‍🏫 PostgreSQL Antipatterns: calculating conditions in SQL 👩‍🎨 🥘 🤦🏿

SQL is not C ++, nor JavaScript. Therefore, the calculation of logical expressions is different, and this is not the same thing:

WHERE fncondX() AND fncondY()

= fncondX() && fncondY()

In the process of optimizing the query execution plan, PostgreSQL can arbitrarily “rearrange” the equivalent conditions , not calculate any of them for individual records, assign them to the condition of the applicable index ... In short, it’s easiest to assume that you cannot control in which order Peer conditions will be calculated (and whether at all) .

Therefore, if you still want to control the priority, you need to structurally make these conditions unequal using conditional expressions and operators .

Data and working with them is the basis of our VLSI complex , so it is very important for us that operations on them are performed not only correctly, but also efficiently. Let's look at specific examples where expression computation errors can be made, and where their efficiency is worth improving.

# 0: RTFM

Starting example from the documentation :

When the calculation order is important, it can be fixed using the construction CASE. For example, such a way to avoid dividing by zero in a sentence is WHEREunreliable:
SELECT ... WHERE x > 0 AND y/x > 1.5;
Safe option:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
The construction used in this way CASEprotects the expression from optimization, therefore it should only be used if necessary.

# 1: condition in trigger

BEGIN
  IF cond(NEW.fld) AND EXISTS(SELECT ...) THEN
    ...
  END IF;
  RETURN NEW;
END;

It seems that everything looks good, but ... No one promises that the enclosed SELECTwill not be executed if the first condition is false. Correct using nestedIF :

BEGIN
  IF cond(NEW.fld) THEN
    IF EXISTS(SELECT ...) THEN
      ...
    END IF;
  END IF;
  RETURN NEW;
END;

Now let's look carefully - the whole body of the trigger function turned out to be “wrapped” in IF. And this means that nothing prevents us from removing this condition from the procedure using the WHEN-condition :

BEGIN
  IF EXISTS(SELECT ...) THEN
    ...
  END IF;
  RETURN NEW;
END;
...
CREATE TRIGGER ...
  WHEN cond(NEW.fld);

This approach allows guaranteed saving of server resources under false conditions.

# 2: OR / AND chain

SELECT ... WHERE EXISTS(... A) OR EXISTS(... B)

Otherwise, you can get that both EXISTSwill be "true", but both will be fulfilled .

But if we know for sure that one of them is “true” much more often (or “false” for the ANDchain), is it possible to somehow “increase its priority” so that the second is not performed once again?

It turns out that it is possible - the algorithm approach is close to the topic of the PostgreSQL Antipatterns article : a rare record will reach the middle of the JOIN .

Let's just “pop under CASE” both of these conditions:

SELECT ...
WHERE
  CASE
    WHEN EXISTS(... A) THEN TRUE
    WHEN EXISTS(... B) THEN TRUE
  END

In this case, we did not determine the ELSE-value, that is, if both conditions are false , it CASEwill return NULL, which is interpreted as FALSEin the WHERE-condition.

This example can be combined in another way - to taste and color:

SELECT ...
WHERE
  CASE
    WHEN NOT EXISTS(... A) THEN EXISTS(... B)
    ELSE TRUE
  END

# 3: how to [not] write conditions

We spent two days analyzing the reasons for the “strange” trigger of this trigger — let's see why.

Source:

IF( NEW."_" is null or NEW."_" = (select '""'::regclass::oid) or NEW."_" = (select to_regclass('""')::oid)
     AND (   OLD."" <> NEW.""
          OR OLD."" <> NEW.""
          OR OLD."" <> NEW.""
          OR OLD."" <> NEW.""
          OR OLD."" <> NEW."" ) ) THEN ...

Problem # 1: Inequality Does Not Consider NULL

Imagine that all OLD-fields matter NULL. What will happen?

SELECT NULL <> 1 OR NULL <> 2;
-- NULL

And from the point of view of working out the conditions is NULLequivalent FALSE, as mentioned above.

Solution : use the operator IS DISTINCT FROMfrom the ROW-operator, comparing whole records at once:

SELECT (NULL, NULL) IS DISTINCT FROM (1, 2);
-- TRUE

Problem number 2: different implementation of the same functionality

Compare:

NEW."_" = (select '""'::regclass::oid)
NEW."_" = (select to_regclass('""')::oid)

Why are there extra nesting SELECT? What about the function to_regclass? And in different ways, why? ..

Fix:

NEW."_" = '""'::regclass::oid
NEW."_" = '""'::regclass::oid

Problem # 3: bool operations priority

Format the source:

{... IS NULL} OR
{... } OR
{... } AND
( {... } )

Oops ... In fact, it turned out that in the case of the truth of any of the first two conditions, the whole condition turns into TRUE, without taking into account inequalities. And this is not at all what we wanted.

Fix:

(
  {... IS NULL} OR
  {... } OR
  {... }
) AND
( {... } )

Problem 4 (small): complex OR condition for one field

Actually, we had problems in No. 3 precisely because there were three conditions. But instead of them, you can do one, using the mechanism coalesce ... IN:

coalesce(NEW."_"::text, '') IN ('', '""', '""')

So we NULL“catch”, and complex ORwith brackets do not have to fence.

Total

We fix what we got:

IF (
  coalesce(NEW."_"::text, '') IN ('', '""', '""') AND
  (
    OLD.""
  , OLD.""
  , OLD.""
  , OLD.""
  , OLD.""
  ) IS DISTINCT FROM (
    NEW.""
  , NEW.""
  , NEW.""
  , NEW.""
  , NEW.""
  )
) THEN ...

And if we take into account that this trigger function can be used only in the UPDATE-trigger due to the presence of OLD/NEWa top level in the condition, then this condition can be generally put into the WHEN-condition, as was shown in # 1 ...

PostgreSQL Antipatterns: calculating conditions in SQL