What's new in the Devil's Dozen?

We are talking, of course, about PostgreSQL 13. On April 8 the feature freeze took place: only features accepted before that date will be included in this version.

It is hard to call this version revolutionary: there are no cardinal, conceptual changes in it. Moreover, such important patches as the SQL/JSON-standard Table and Functions, which we would have liked to see back in PG12 alongside the JSONPath patch, did not make it in time, and ready-made pluggable table storage has not appeared either — only the interface is being worked on. Still, the list of improvements is impressive. We have prepared a fairly complete summary of the patches included in the Devil's Dozen.




Changes to SQL Commands


CREATE DATABASE ... LOCALE

The initdb and createdb utilities, as well as the CREATE COLLATION command, have a LOCALE option that sets the values of both LC_CTYPE and LC_COLLATE at once. Now the same capability has appeared in the CREATE DATABASE command:

CREATE DATABASE db_koi8r TEMPLATE template0 
    ENCODING 'KOI8R' LOCALE 'ru_RU.KOI8R';

ALTER VIEW ... RENAME COLUMN

The column name in a view can now be changed with the ALTER VIEW command. Previously this required re-creating the view.

Suppose you forgot to give the column a name:

CREATE VIEW uptime AS SELECT current_timestamp, current_timestamp - pg_postmaster_start_time();
SELECT * FROM uptime;

       current_timestamp       |    ?column?     
-------------------------------+-----------------
 2020-03-23 15:37:00.088824+03 | 04:18:24.897856

It can be fixed:

ALTER VIEW uptime RENAME COLUMN "?column?" TO uptime;
SELECT * FROM uptime;

       current_timestamp       |     uptime      
-------------------------------+-----------------
 2020-03-23 15:37:40.726516+03 | 04:19:05.535548


ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION

A generated column of a table can now be turned into a normal one, that is, the expression used to compute it can be dropped:

CREATE TABLE payments (
    id integer PRIMARY KEY,
    amount numeric(18,2),
    income_tax numeric(18,2) GENERATED ALWAYS AS (amount*0.13) STORED
);
INSERT INTO payments(id, amount) VALUES(1, 42);
\d payments

                                     Table "public.payments"
   Column   |     Type      | Collation | Nullable |                   Default            
        
------------+---------------+-----------+----------+--------------------------------------
 id         | integer       |           | not null |
 amount     | numeric(18,2) |           |          |
 income_tax | numeric(18,2) |           |          | generated always as ((amount * 0.13))
 stored
Indexes:
    "payments_pkey" PRIMARY KEY, btree (id)

Suppose we later decide that income_tax should be set explicitly. Drop the expression:

ALTER TABLE payments ALTER COLUMN income_tax DROP EXPRESSION;
\d payments

                   Table "public.payments"
   Column   |     Type      | Collation | Nullable | Default
------------+---------------+-----------+----------+---------
 id         | integer       |           | not null |
 amount     | numeric(18,2) |           |          |
 income_tax | numeric(18,2) |           |          |
Indexes:
    "payments_pkey" PRIMARY KEY, btree (id)

Of course, the existing data from the column has not gone away:

SELECT * FROM payments;

 id | amount | income_tax
----+--------+------------
  1 |  42.00 |       5.46


DROP DATABASE ... FORCE
If you want to drop a database without waiting for all users to disconnect, you can use the new FORCE option of the DROP DATABASE command.

CREATE DATABASE db;

Connect to the new database:

CREATE EXTENSION IF NOT EXISTS dblink;
SELECT dblink_connect('dbname=db');

 dblink_connect
----------------
 OK

Now drop the database, forcibly terminating open connections (just as pg_terminate_backend would):

DROP DATABASE db WITH (FORCE);

ALTER TYPE ... SET STORAGE
The ALTER TYPE command can now change various properties of base data types, in particular the storage strategy. Previously it could only be set in the CREATE TYPE command.

For the demonstration we will not create a new base type, but use the existing tsquery. But first, let's create a separate database and connect to it:

CREATE DATABASE db;
\c db

The tsquery data type uses the plain storage strategy, so table columns of this type get the same strategy:

SELECT typname,typstorage FROM pg_type WHERE typname = 'tsquery';

 typname | typstorage
---------+------------
 tsquery | p

CREATE TABLE queries (query tsquery);
SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'queries'::regclass AND attname = 'query';

 attname | attstorage
---------+------------
 query   | p

If you need to use a different strategy for new tables, you can change the base type:

ALTER TYPE tsquery SET (storage=external);
SELECT typname,typstorage FROM pg_type WHERE typname = 'tsquery';

 typname | typstorage
---------+------------
 tsquery | e

The type of storage in the new tables will also change:

CREATE TABLE new_queries (query tsquery);
SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'new_queries'::regclass AND attname = 'query';

 attname | attstorage
---------+------------
 query   | e

Keep in mind that a strategy that allows the use of TOAST cannot be changed back to plain:

ALTER TYPE tsquery SET (storage=plain);

ERROR:  cannot change type's storage to PLAIN

That is why the experiment was carried out in a separate database that we will not mind dropping.

ALTER STATISTICS ... SET STATISTICS

The CREATE STATISTICS command can collect lists of the most common values for selected combinations of table columns. The number of most common values collected is determined by the default_statistics_target parameter. Now the value for a specific statistics object can be changed with the command:

ALTER STATISTICS statistics_name SET STATISTICS new_target;
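
For example, a sketch with assumed statistics and table names (the flights table with departure_airport and arrival_airport columns is hypothetical here):

CREATE STATISTICS flights_stat (mcv) ON departure_airport, arrival_airport FROM flights;
ALTER STATISTICS flights_stat SET STATISTICS 1000;  -- collect up to 1000 most common combinations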

FETCH FIRST with WITH TIES option
As you know, in a SELECT command you can use the syntax defined by the SQL standard instead of LIMIT:

SELECT *
FROM (VALUES (1,'1.1'), (2,'2.1'), (2,'2.2'), (3,'3.1')) t(a,b)
ORDER BY a
FETCH FIRST 2 ROWS ONLY;

 a |  b  
---+-----
 1 | 1.1
 2 | 2.1
(2 rows)

Now FETCH supports the WITH TIES clause, which adds to the output all "tied" rows (rows equal to those already selected, considering only the sort keys):

SELECT *
FROM (VALUES (1,'1.1'), (2,'2.1'), (2,'2.2'), (3,'3.1')) t(a,b)
ORDER BY a
FETCH FIRST 2 ROWS WITH TIES;

 a |  b  
---+-----
 1 | 1.1
 2 | 2.1
 2 | 2.2
(3 rows)

Built-in Functions and Data Types


gen_random_uuid
The new gen_random_uuid function returns a version 4 UUID (a random value):

SELECT gen_random_uuid();

           gen_random_uuid            
--------------------------------------
 25e02793-80c0-438c-be07-c94b966c43ab

The function is useful for generating unique UUID values in distributed systems.
Previously you had to install the uuid-ossp or pgcrypto extension.

min_scale and trim_scale for numeric values

The min_scale function determines the minimum number of fractional digits needed to represent a number exactly, and trim_scale trims insignificant trailing zeros:

SELECT n, min_scale(n), trim_scale(n) FROM (VALUES(42.42000)) as t(n);

    n     | min_scale | trim_scale
----------+-----------+------------
 42.42000 |         2 |      42.42

gcd and lcm

An addition to the set of mathematical functions: you can now quickly find the greatest common divisor (gcd) and the least common multiple (lcm):

SELECT gcd(54,24), lcm(54,24);

 gcd | lcm
-----+-----
   6 | 216

Aggregate functions min and max for type pg_lsn

The min and max aggregate functions have been added for the pg_lsn data type, which allows queries like:

SELECT min(restart_lsn) FROM pg_replication_slots;
SELECT min(sent_lsn) FROM pg_stat_replication;

Checking the type modifier of the return value of a function

In previous versions, the type modifier was not checked for the return value of the function.

Suppose there is a type for storing monetary units and a function that returns the amount of income tax:

CREATE TYPE currency AS (
    amount numeric(18,2),
    code   text
);
CREATE FUNCTION income_tax(quantity currency) RETURNS currency
    AS 'SELECT quantity.amount * 0.13, quantity.code' LANGUAGE SQL;

Calling the function, we expect two decimal places but get four. Even an explicit cast after the function call does not help (third column):

SELECT amount, code, amount::numeric(18,2)
FROM income_tax(ROW(42.42, ''))\gx

-[ RECORD 1 ]--
amount | 5.5146
code   | 
amount | 5.5146

In version 13, the result is correct:

SELECT amount, code, amount::numeric(18,2)
FROM income_tax(ROW(42.42, ''))\gx

-[ RECORD 1 ]
amount | 5.51
code   | 
amount | 5.51

Localized names in to_date() and to_timestamp()

The to_date and to_timestamp functions now understand localized names of months and days of the week. Previously only English names could be used:

SELECT to_date('вторник, 24 марта 2020', 'TMDay, DD TMMonth YYYY');

  to_date   
------------
 2020-03-24

normalize and IS NORMALIZED

To comply with the SQL standard, the normalize () function has been added to normalize a Unicode string, and the IS NORMALIZED predicate to check if a string is normalized.

SELECT str, normalize(str, NFC) AS nfc,
       str IS NFC NORMALIZED AS is_nfc_normalized,
       normalize(str, NFKC) AS nfkc,
       str IS NFKC NORMALIZED AS is_nfkc_normalized
FROM (VALUES(U&'\0032\2075')) AS vals(str)\gx

-[ RECORD 1 ]------+---
str                | 2⁵
nfc                | 2⁵
is_nfc_normalized  | t
nfkc               | 25
is_nfkc_normalized | f

Read more about UNICODE normalization forms.

The xid8 type and pg_current_xact_id() function for 64-bit transaction numbers

A new data type, xid8, has been added for 64-bit transaction numbers. No, this does not mean that PostgreSQL has switched to 64-bit transaction IDs: everything works exactly as before. But some functions now return the new type: for example, pg_current_xact_id is recommended instead of the old txid_current function, which returned int8, and so on.
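
For example:

SELECT pg_current_xact_id();   -- returns xid8, unlike txid_current(), which returns bigint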

New polymorphic data types: the anycompatible family

Added the anycompatible, anycompatiblearray, anycompatiblenonarray and anycompatiblerange types. Unlike the anyelement family of types, the new types allow using not exactly the same, but actually compatible types.

In the following example, the maximum function receives integer and numeric arguments declared as anycompatible. The return value is cast to the type common to both:

CREATE FUNCTION maximum(a anycompatible, b anycompatible) RETURNS anycompatible
	AS 'SELECT CASE WHEN a > b THEN a ELSE b END' LANGUAGE SQL;

SELECT f, pg_typeof(f) FROM maximum(42, 42.42) f;

   f   | pg_typeof
-------+-----------
 42.42 | numeric

Moreover, the anycompatible- and any- families are two independent sets of types:

CREATE FUNCTION first_elems(a anyarray, b anycompatiblearray)
    RETURNS TABLE(a anyelement, b anycompatible) AS $$
        SELECT ROW(a[array_lower(a,1)], b[array_lower(b,1)]);
    $$ LANGUAGE sql;
SELECT first_elems(ARRAY[1,2,3], ARRAY['a','b','c']) AS str, 
       first_elems(ARRAY[1.1,2.2,3.3],ARRAY[true,false]) bool;

  str  |  bool   
-------+---------
 (1,a) | (1.1,t)

Procedural Languages


Transform for type bool in PL/Perl
Just recently, a TRANSFORM by Ivan Panchenko (Postgres Professional), bool_plperl, was committed. Postgres passes boolean values to PL/Perl as t or f, but for Perl 'f' is not a boolean false — it is just the letter f, which is true in a boolean context. This problem can be solved in different ways (see the mailing list discussion), but creating a TRANSFORM for bool is, according to Tom Lane, the most practical.

Fast execution of simple expressions in PL / pgSQL

Simple expressions (at least those not referencing tables and not requiring locks) will run faster. Previously, time was unproductively spent calling the planner on every loop iteration.

CREATE OR REPLACE FUNCTION slow_pi() RETURNS double precision AS $$
    DECLARE
        a double precision := 1;
        s double precision := 1;
        r double precision := 0;
    BEGIN
        FOR i IN 1 .. 10000000 LOOP
            r := r + s/a; a := a + 2; s := -s;
        END LOOP;
        RETURN 4*r;
    END;
    $$ LANGUAGE plpgsql;

Call slow_pi () in PG12:

SELECT slow_pi();

          slow_pi      
    --------------------
     3.1415925535898497
    (1 row)
    Time: 13060,650 ms (00:13,061)

Now in PG13:
SELECT slow_pi();

          slow_pi      
    --------------------
     3.1415925535897915
    (1 row)
    Time: 2108,464 ms (00:02,108)

Trusted extensions instead of pg_pltemplate

The system catalog has one table fewer: pg_pltemplate has been removed. It stored properties of procedural languages needed when executing CREATE LANGUAGE. Now these properties have been moved into the extension scripts of the corresponding languages, and the table itself is gone. But to make this possible, the database owner (without superuser rights) must be able to create a trusted language from an extension script: after all, currently, to create plpgsql for example, the database owner does not need to be a superuser.

It works like this. A new boolean parameter, trusted, has appeared in the extension control file. If both the trusted and superuser parameters are set, the extension can be created not only by a superuser, but also by a user with the CREATE privilege on the current database (in particular, its owner). When the script of such an extension is executed, the rights of the superuser who initialized the cluster are used. That superuser will own the objects created by the extension, while the extension itself will be owned by the creating user.
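
A minimal sketch, assuming the plperl package is installed on the server: a database owner without superuser rights can now do this, because the plperl control file marks the language as trusted:

CREATE EXTENSION plperl;   -- works for the database owner in PG13; plperlu still requires a superuser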

Important consequences of these changes:

  • Trusted extensions open the way for third-party developers to create other trusted languages. Now we are limited only to plpgsql, plperl and pltcl.
  • pg_pltemplate hard-coded that plpython means the second version of the language. Getting rid of pg_pltemplate is a (necessary, though not sufficient) step towards moving to Python 3.

Indices


Compression of B-tree indexes
An important and long-awaited patch (work on it began back in 2015), written by Anastasia Lubennikova (Postgres Professional) and Peter Geoghegan, has finally been committed by Peter. Anastasia spoke about it at PGconf India. Postgres has learned to significantly reduce the size of B-tree indexes through deduplication, that is, by saving on duplicate index keys. The indexes were seriously reworked so that compression is possible without losing compatibility with previous index versions. The idea of deduplication was borrowed from the more flexible architecture of GIN (Generalized Inverted Index) indexes.

In those indexes, more often than in B-trees, a single key corresponds to a large number of records. With text processing, for example, the same token usually occurs in several documents, but it is stored in the index only once. Until recently, B-tree indexes did not know how to do this.

B-tree indexes differ from GIN indexes primarily in their leaf pages. Depending on how many records refer to the same key value, there are options: if the list is small, the page stores a posting list — a list of TIDs (identifiers of the indexed rows); if there are many TIDs, then instead of a list of values new "tree branches" are stored — links to other pages of the posting-list kind or to further branches (this is called a posting tree).

Such a tree structure is similar to a B-tree but differs in important details: for example, the lists for moving between pages of the same tree level in GIN are unidirectional rather than bidirectional. That is one reason why good compatibility of the new, deduplicated indexes with old versions is not easy to achieve. The improvements did indeed take more than three years. It was also necessary to work out the cleanup mechanism (microvacuum) and other nuances.

In performance tests, all indexes to which deduplication is applicable shrank by about a factor of three. Deduplication also helps unique indexes by mitigating index bloat when the table changes at a high rate. The new behavior can be enabled and disabled at the index level.

Full GIN index scans avoided where not needed
This patch allows avoiding a full pass over the entire GIN index in some cases. Some operations, although supported by the GIN index, are performed by a full index scan. Take, for example, an index for full-text search over a tsvector column. If the search query has the form "anything but a given word", the whole index has to be read. If the query also contains another condition that does not require a full index scan, the index was nevertheless still scanned in full.

With the new optimization, the more selective condition is used first, so the index pays off, and the results are then rechecked against the other condition. Compare the number of pages read in version 12 (Buffers):

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM mail_messages WHERE tsv @@ to_tsquery('!vacuum') AND tsv @@ to_tsquery('analyze');

                                             QUERY PLAN                                             
------------------------------------------------------------------
Bitmap Heap Scan on mail_messages (actual rows=5864 loops=1)
   Recheck Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
   Heap Blocks: exact=5167
   Buffers: shared hit=24 read=27405
   ->  Bitmap Index Scan on mail_messages_tsv_idx (actual rows=5864 loops=1)
         Index Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
         Buffers: shared hit=24 read=22238
 Planning Time: 0.283 ms
 Execution Time: 3258.234 ms

with the number of buffers in the new version:

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM mail_messages WHERE tsv @@ to_tsquery('!vacuum') AND tsv @@ to_tsquery('analyze');

                                             QUERY PLAN                                             
---------------------------------------------------------------------------
Bitmap Heap Scan on mail_messages (actual rows=5864 loops=1)
   Recheck Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
   Heap Blocks: exact=5156
   Buffers: shared hit=5179
   ->  Bitmap Index Scan on mail_messages_tsv_idx (actual rows=5864 loops=1)
         Index Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
         Buffers: shared hit=23
 Planning Time: 0.250 ms
 Execution Time: 8.779 ms

A similar situation can arise when using trigrams and when checking arrays for containment.

Parameters of operator classes
In PostgreSQL, many index access methods are a "framework" that takes care of the high-level search algorithm, work with pages and locks, and WAL. Binding to specific data types and operators is done by operator classes.

Until now, operator classes could not take parameters. For example, for full-text search you can use a GiST index with the tsvector_ops operator class (more about GiST operator classes here). This operator class uses a signature tree, and the signature length used to be fixed (124 bytes). Now the length can be specified explicitly, which lets you balance index size against efficiency (the number of hash collisions):

CREATE INDEX ON mail_messages USING gist(tsv tsvector_ops(siglen=1024));

For now, similar changes have been made for the other GiST operator classes that use a signature tree, which affects the hstore, intarray, ltree and pg_trgm extensions.
But the main idea behind this change is the ability to pass a JSONPath expression to a GIN index so that not the whole JSON document is indexed, but only the necessary part of it. In many cases this will radically reduce the size of indexes. That work, however, is still to be done.

The idea is Oleg Bartunov's; the implementation is by Nikita Glukhov and Alexander Korotkov (all three from Postgres Professional).

Operator <-> (box, point) added
The missing operation was added for use in kNN searches with GiST and SP-GiST. In PG12, when working with the geometric types point and box, you can use the distance operator <-> (point, box), and it speeds up searches with GiST and SP-GiST indexes. But the symmetric operator <-> (box, point) was not implemented, even though box already knew the distances to more complex types — polygons and circles.

CREATE TABLE points(point point);
CREATE TABLE boxes(box box);
INSERT INTO points VALUES('1,2'), ('2,1');
INSERT INTO boxes VALUES(box(point(3,4),point(5,6))), (box(point(13,14),point(15,16)));

In PG12:
SELECT p.point, b.box, b.box  <-> p.point distance FROM points AS p, boxes AS b ORDER BY distance;

ERROR:  operator does not exist: box <-> point

With the operands the other way around, everything is fine:

SELECT p.point, b.box, p.point  <-> b.box distance FROM points AS p, boxes AS b ORDER BY distance;

 point |       box       |      distance
-------+-----------------+--------------------
 (1,2) | (5,6),(3,4)     | 2.8284271247461903
 (2,1) | (5,6),(3,4)     | 3.1622776601683795
 (1,2) | (15,16),(13,14) | 16.970562748477143
 (2,1) | (15,16),(13,14) | 17.029386365926403

And in PG13:

SELECT p.point, b.box, b.box  <-> p.point distance FROM points AS p, boxes AS b ORDER BY distance;

 point |       box       |      distance
-------+-----------------+--------------------
 (1,2) | (5,6),(3,4)     | 2.8284271247461903
 (2,1) | (5,6),(3,4)     | 3.1622776601683795
 (1,2) | (15,16),(13,14) | 16.970562748477143
 (2,1) | (15,16),(13,14) | 17.029386365926403

GiST and SP-GiST indexes can accelerate this operation.

Note that in PG13, if you ask:

SELECT count(*) FROM pg_operator WHERE oprname = '<->';
 count 
-------
    28

and run the same query in PG12, we get 20 rows: in version 13 the list has grown by as many as 8 <-> operators.

JSON


Support for the .datetime () method for jsonpath

This is one of the patches from the big series of JSONPath patches that did not make it into PG12. It is part of the SQL/JSON standard. The problem was that all functions in the JSONPath patch series are immutable, while date comparison takes the current time zone into account, which can change during a session.

In such cases, the existing immutable functions are allowed to throw an error on non-immutable comparisons. In addition, the patch provides functions with the _tz suffix that work correctly with time zones.
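
A small sketch (the second query mirrors the documented pattern: comparing a timestamptz with a date depends on the current time zone, so only the _tz variant allows it):

SELECT jsonb_path_query('"2020-04-08"', '$.datetime("yyyy-mm-dd")');
SELECT jsonb_path_exists_tz('["2020-04-08 12:00:00-05"]',
                            '$[*] ? (@.datetime() < "2020-04-09".datetime())');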

New function: jsonb_set_lax

In general, lax is the non-strict (as opposed to strict) mode of functions working with jsonb. Here, the function remains usable when the new value being set is NULL. Unlike the strict version, jsonb_set(), it has an extra argument specifying what to do with NULL; the options are use_json_null / raise_exception / return_target / delete_key. The options were suggested by interested users.
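
A small sketch of the difference (the values are made up):

SELECT jsonb_set('{"a": 1}', '{a}', NULL);                             -- strict version: result is NULL
SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'use_json_null');  -- {"a": null}
SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'delete_key');     -- {}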

Optimized some jsonb functions.

A lot has been optimized, mainly through the efforts of Nikita Glukhov (Postgres Professional). But going over every item here is pointless: first, their abundance would bloat an already long article; second, the changes concern internals and are not of interest to every user. So we will only list most of them:

  1. Optimized the JsonbExtractScalar() function;
  2. Optimized the #>> operator and the jsonb_each_text() and jsonb_array_elements_text() functions;
  3. Optimized recognition of the JsonbContainer type in get_jsonb_path_all();
  4. Fetching the first token from the JsonbIterator iterator is replaced with the lightweight JsonbContainerIsXxx() macro;
  5. More convenient key extraction with findJsonbKeyInObject();
  6. Optimized storage of the result of findJsonbValueFromContainer() and getIthJsonbValueFromContainer();
  7. get_jsonb_path_all() and JsonbValueAsText were also reworked.

Recall that the big SQL/JSON patches — JSON_TABLE and the SQL/JSON functions — once again did not make it in. They are now expected in PG14; for the time being we have to make do with JSONPath.


pgbench


The utility for running performance tests has received a series of improvements: statistics on the steps of the initialization phase, more readable output, the ability to show the code of the built-in scripts, and testing on a partitioned accounts table.

In addition, a \aset command has been added. It is similar to \gset but allows setting variables for several queries sent at once. The following line, sent to the server for execution, sets both variables, one and two:
SELECT 1 AS one \; SELECT 2 AS two \aset

pg_dump


pg_dump has learned to dump the data of foreign tables. The --include-foreign-data parameter lets you list foreign servers whose tables' data will be dumped.

Use this option carefully. It is far from certain that the data should be restored onto the foreign server; moreover, the foreign server may be unavailable during the restore, or may allow reading but not writing.

psql


A series of small patches makes psql more comfortable:

  • Improved tab completion for a number of commands.
  • In addition to \echo, which sends a string to STDOUT, the new \warn command sends a string to the standard error stream (STDERR).
  • \d for a TOAST table now shows the main table it belongs to; the \d+ output has also been extended.
  • The \dt+ output gained a «Persistence» column showing whether a table is unlogged or permanent.
  • The behavior of the \e command (and the \g* commands) has been refined.
  • Command-line prompts: the %x escape, showing the transaction status, has been added to the default PROMPT1 and PROMPT2.
  • New commands for extra information about access methods: \dAc, \dAf, \dAo, \dAp.
  • \g can now take, in parentheses, any output options supported by \pset; they apply only to the current command (see the sketch below).
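
A quick sketch of the last item — the option in parentheses here is expanded mode, and it affects only this one query:

SELECT name, setting FROM pg_settings WHERE name = 'work_mem' \g (expanded=on)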


libpq


Minor changes related to connecting to PostgreSQL:
  • The inaccuracy in the description of the host and hostaddr parameters, and the resulting inconsistency in the output of psql's \conninfo command, have been fixed.
  • If the client certificate key is stored encrypted, the password could previously be entered only interactively. The new sslpassword parameter allows the key to be decrypted non-interactively.
  • Two new parameters, ssl_min_protocol_version and ssl_max_protocol_version, let you restrict the SSL/TLS protocol versions allowed for the connection.

reindexdb


The new --jobs parameter of the reindexdb utility sets the number of database connections used to rebuild indexes simultaneously.

pg_rewind


The limitations of the utility are gradually being removed, and the possibilities are increasing.
First, pg_rewind can now write the recovery information (just as pg_basebackup can), and also perform crash recovery and then shut down an instance that was not shut down cleanly through a checkpoint (previously this had to be done manually).

And secondly, pg_rewind learned to work with the WAL archive .
After the utility finds the point of WAL divergence between the two servers, it has to build a list of all pages that must be copied to the target cluster to eliminate the differences. To do this it needs all WAL files starting from that point. Previously, if the necessary WAL files were not available on the target cluster, the utility could not do its job.

With this patch by Alexey Kondratov (Postgres Professional), pg_rewind can read the missing WAL segments from the WAL archive using the restore_command parameter if the new -c or --restore-target-wal switch is specified.

pg_waldump


pg_waldump now decodes prepared-transaction records.

amcheck


The amcheck extension has learned to better recognize damage in B-tree indexes.
By the way, now messages in the server log about damaged pages will differ for indexes and tables .

pageinspect


The new heap_tuple_infomask_flags function of the pageinspect extension decodes the infomask and infomask2 fields returned by heap_page_items. Useful when investigating data corruption.

postgres_fdw


At the user mapping level, a superuser can now allow regular users to use a passwordless connection:

ALTER USER MAPPING FOR user_name SERVER server_name
    OPTIONS (ADD password_required 'false');

This is done, among other things, so that sslkey and sslcert can be used as connection parameters .

adminpack


The adminpack extension has a new function, pg_file_sync. It lets you fsync files that the server has written to disk, for example via pg_file_write or COPY TO.
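
A minimal sketch (the file name is arbitrary; adminpack functions require appropriate privileges):

CREATE EXTENSION IF NOT EXISTS adminpack;
SELECT pg_file_write('test_sync.txt', 'some data', false);  -- write a file in the data directory
SELECT pg_file_sync('test_sync.txt');                       -- flush it to disk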

Monitoring


pg_stat_slru


The server's shared memory holds not only the large buffer cache but also a number of other, simpler caches (for example, for transaction status). They use a simple algorithm that evicts the least recently used pages (simple least-recently-used, or SLRU). Until now such caches "just worked", but there is a need to monitor them — first of all so that PostgreSQL core developers can figure out whether anything in them needs changing. For this purpose a new view, pg_stat_slru, has appeared.
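
For example (a sketch):

SELECT name, blks_hit, blks_read, blks_written FROM pg_stat_slru;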

pg_stat_activity


The pg_stat_activity view has a new column, leader_pid. For processes participating in a parallel query it contains the PID of the leader process, and for the leader process itself leader_pid equals pid.
The following query shows which queries and which processes are currently running in parallel:

SELECT query, leader_pid, 
    array_agg(pid) filter(WHERE leader_pid != pid) AS members
  FROM pg_stat_activity
 WHERE leader_pid IS NOT NULL
 GROUP BY query, leader_pid;

There are changes to the list of wait events. Two new events have been added: BackupWaitWalArchive and RecoveryPause. And two others have been given more accurate names: RecoveryWalStream -> RecoveryRetrieveRetryInterval, RecoveryWalAll -> RecoveryWalStream.

And two more new wait events occur on a replica: RecoveryConflictSnapshot (a conflict with VACUUM that removed row versions still needed) and RecoveryConflictTablespace (a conflict related to dropping a tablespace).

pg_stat_statements


Until now, pg_stat_statements treated queries with and without FOR UPDATE as the same query. Now queries with FOR UPDATE are tracked separately.

The amount of information collected has grown. From now on, not only execution resources are tracked, but also statistics on generated WAL. New view columns: wal_bytes - volume of generated WAL, wal_records - number of generated records, wal_fpi - number of full page images (full page writes).
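
For example, a quick sketch of finding the statements that generate the most WAL:

SELECT query, calls, wal_records, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 5;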

This became possible thanks to the infrastructure prepared for tracking WAL usage. So EXPLAIN with the new WAL option will now show the volume of generated records:

CREATE TABLE t (id int);
EXPLAIN (ANALYZE, WAL, COSTS OFF, TIMING OFF, SUMMARY OFF)
    INSERT INTO t VALUES(1);

              QUERY PLAN              
--------------------------------------
 Insert on t (actual rows=0 loops=1)
   WAL:  records=1  bytes=59
   ->  Result (actual rows=1 loops=1)

The auto_explain extension, VACUUM VERBOSE and autovacuum also use this infrastructure and report the volume of generated WAL.

Coming back to pg_stat_statements: if the new pg_stat_statements.track_planning parameter is enabled, additional planner statistics are recorded for each statement: the number of plannings, total planning time, minimum and maximum planning time, as well as the mean and standard deviation.
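
For example, assuming the extension is installed and track_planning is enabled, the statements most expensive to plan can be found like this (a sketch):

SELECT query, calls, plans, total_plan_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 5;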

Accounting for planner resources is also reflected in another patch, unrelated to pg_stat_statements: EXPLAIN with the BUFFERS option now reports the number of buffers used during planning:

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF) 
    SELECT * FROM pg_class;

                   QUERY PLAN                   
------------------------------------------------
 Seq Scan on pg_class (actual rows=386 loops=1)
   Buffers: shared hit=9 read=4
 Planning Time: 0.782 ms
   Buffers: shared hit=103 read=11
 Execution Time: 0.219 ms

Logging


  • The log_statement_sample_rate parameter now determines the fraction of logged SQL statements among those running longer than the new log_min_duration_sample parameter.
    Note that log_min_duration_statement has higher priority: statements running longer than log_min_duration_statement are always logged, regardless of log_min_duration_sample and log_statement_sample_rate.
    And log_transaction_sample_rate, as before, allows logging all statements of a sampled fraction of transactions regardless of their duration.
  • Statements logged because of an error (per log_min_error_statement) can now be logged together with their bind parameter values. This is controlled by the new log_parameter_max_length_on_error parameter; with the default value of 0 nothing is logged.
    Enabling log_parameter_max_length_on_error adds some overhead to every SQL statement, not only failing ones, which is why values can be truncated to the specified length.
    For ordinary statement logging (log_statement, log_duration) a similar parameter, log_parameter_max_length, has appeared; by default parameter values are logged in full.
  • The process type (pg_stat_activity.backend_type) can now be written to the server log. For this, the %b escape has been added to log_line_prefix. And if the log is written in CSV format (log_destination=csvlog), the backend_type column is already included there (a configuration sketch follows this list).
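
A sketch for the last item, assuming superuser rights (the prefix format itself is just an example):

ALTER SYSTEM SET log_line_prefix = '%m [%p] %b ';
SELECT pg_reload_conf();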


Progress


The new pg_stat_progress_analyze and pg_stat_progress_basebackup views let you track the progress of statistics gathering by the ANALYZE command and of backups made with the pg_basebackup utility, respectively.

Optimization


Calculation of immutable functions in the FROM clause at the planning stage
A patch by Alexander Kuzmenkov and Alexander Parfyonov (both from Postgres Professional) helps when the FROM clause contains a call to a function that is effectively a constant. In that case, instead of performing a join, the constant value is substituted into the query where needed.

Here's how this happens with an example of a query related to full-text search:

EXPLAIN (COSTS OFF)
SELECT subject, ts_rank_cd(tsv, q) AS rank
FROM mail_messages, to_tsquery('english', 'tuple') q
WHERE tsv @@ q
ORDER BY rank DESC;

                              QUERY PLAN                              
------------------------------------------------------------------
 Sort
   Sort Key: (ts_rank_cd(mail_messages.tsv, '''tuple'''::tsquery)) DESC
   ->  Bitmap Heap Scan on mail_messages
         Recheck Cond: (tsv @@ '''tuple'''::tsquery)
         ->  Bitmap Index Scan on mail_messages_tsv_idx
               Index Cond: (tsv @@ '''tuple'''::tsquery)

There is no join: the value 'tuple'::tsquery is substituted into the query at planning time. In version 12 the picture was quite different:

EXPLAIN (COSTS OFF)
SELECT subject, ts_rank_cd(tsv, q) AS rank                            
FROM mail_messages, to_tsquery('english', 'tuple') q
WHERE tsv @@ q                            
ORDER BY rank DESC;

                          QUERY PLAN                         
-----------------------------------------------------
 Sort
   Sort Key: (ts_rank_cd(mail_messages.tsv, q.q)) DESC
   ->  Nested Loop
         ->  Function Scan on q
         ->  Bitmap Heap Scan on mail_messages
               Recheck Cond: (tsv @@ q.q)
               ->  Bitmap Index Scan on mail_messages_tsv_idx
                     Index Cond: (tsv @@ q.q)


Incremental sorting

When sorting by many keys (k1, k2, k3 ...) is needed, the planner can now take advantage of knowing that the data is already sorted by the first few keys (for example, k1 and k2). In that case there is no need to re-sort all the data: it can be split into consecutive groups with equal values of k1 and k2, which are then sorted by k3.

Thus the whole sort splits into several consecutive smaller sorts. This reduces the memory required and also allows the first rows to be returned before the entire sort is finished.

For example, in the demo database the tickets table has an index on the ticket_no column. Data read from that index is already sorted by ticket_no, so the following query uses incremental sort:

EXPLAIN (analyze, costs off, timing off)
SELECT * FROM tickets ORDER BY ticket_no, passenger_id;

                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Incremental Sort (actual rows=2949857 loops=1)
   Sort Key: ticket_no, passenger_id
   Presorted Key: ticket_no
   Full-sort Groups: 92184 Sort Method: quicksort Memory: avg=31kB peak=31kB
   ->  Index Scan using tickets_pkey on tickets (actual rows=2949857 loops=1)
 Planning Time: 2.137 ms
 Execution Time: 2230.019 ms

Incremental sorting can be disabled with the enable_incremental_sort parameter. In that case the sort takes noticeably longer:

SET enable_incremental_sort = off;
EXPLAIN (analyze, costs off, timing off)
SELECT * FROM tickets ORDER BY ticket_no, passenger_id;

                              QUERY PLAN                               
-----------------------------------------------------------------------
 Gather Merge (actual rows=2949857 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort (actual rows=983286 loops=3)
         Sort Key: ticket_no, passenger_id
         Sort Method: external merge  Disk: 107024kB
         Worker 0:  Sort Method: external merge  Disk: 116744kB
         Worker 1:  Sort Method: external merge  Disk: 107200kB
         ->  Parallel Seq Scan on tickets (actual rows=983286 loops=3)
 Planning Time: 3.994 ms
 Execution Time: 12291.844 ms

The idea of incremental sorting was proposed back in 2013 by Alexander Korotkov (Postgres Professional), and now, seven years later, James Coleman has brought the patch to a state accepted by the community.

TRUNCATE acceleration
During TRUNCATE, shared_buffers is scanned to evict the table's buffers from shared memory. Previously the scan was performed three times, once for each fork of the table: MAIN (the main data fork), FSM (free space map) and VM (visibility map). Now the buffers are scanned just once. With large shared_buffers this gives a tangible gain.

Partial decompression of TOAST
When only a slice at (or near) the beginning of a TOASTed value is needed, there is no point in decompressing it entirely. Compressed TOAST data is now read iteratively: read a chunk, and if it does not contain the required data, decompress it and read further. Suggested by Google Summer of Code student Binguo Bao, who gives this example:

CREATE TABLE slicingtest (
     id serial PRIMARY KEY,
     a text
);
INSERT INTO slicingtest (a) SELECT
    repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) AS a FROM
    generate_series(1,100);
\timing
SELECT sum(length(substr(a, 0, 20))) FROM slicingtest;

Time: 28.123 ms

With the patch, an order of magnitude faster:

Time: 2.306 ms

Parallel VACUUM
In his article on the subject, Egor Rogov explains this important step in parallelization in detail. In short: "The patch by Masahiko Sawada allows vacuuming to be performed in parallel. The table itself is still vacuumed by a single (leader) process, but to vacuum indexes it can now start background workers, one per index. In manual mode this speeds up vacuuming of large tables with many indexes; autovacuum does not use this feature yet."

Autovacuum triggered by inserts into a table
For this patch (also known as berserk autovacuum) we should thank Dorofei Proleskovsky, who proposed a solution to the following problem: autovacuum never visits append-only tables, since they have no "dead" row versions. Because of this the visibility map is not updated, which makes index-only scans inefficient, and when vacuum finally does arrive to prevent transaction counter wraparound, it has to do a lot of work at once. Now this situation is fixed: autovacuum is also triggered by row insertions. Two new server parameters (autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor) have appeared, similar to the existing ones for modifications (autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor).
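
For example, a sketch with a hypothetical append-only table and made-up values — the thresholds can also be set per table:

ALTER TABLE events SET (
    autovacuum_vacuum_insert_threshold    = 10000,
    autovacuum_vacuum_insert_scale_factor = 0.05
);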

Hash Aggregate Memory Management
Hash aggregation may require more memory than the planner estimated and than work_mem allows. Previously such a planner misestimate meant that work_mem was simply ignored and memory was allocated as needed — until the operation finished or the OOM killer arrived. Now the algorithm can stay within work_mem and, if necessary, spill to temporary files on disk. New parameters control the planner's behavior: enable_groupingsets_hash_disk and enable_hashagg_disk.

Optimizing UPDATE for tables with generated columns
In version 12, the generated columns were recalculated during any row update, even if this change did not affect them in any way. Now they will be recalculated only when it is really needed (if their base columns have changed).

This optimization can, for example, significantly speed up updates of tables with a generated column of type tsvector, since the to_tsvector() function is quite expensive.

Access from the trigger to the list of changed columns
A small patch that adds a bitmap of modified columns to the TriggerData structure. This information can be used by generic trigger functions such as tsvector_update_trigger() or lo_manage() to avoid doing unnecessary work.

Using several extended statistics in estimates
In PG12 the planner could not use several extended statistics objects for the same table at once. Imagine, for example, two extended statistics built on different sets of columns, and a query that involves columns from both sets. Now the planner has access to all the available information.
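
A hypothetical sketch: two statistics objects on different column sets of the same table; the planner can now combine both when estimating the WHERE clause below:

CREATE STATISTICS s1 (dependencies) ON a, b FROM t;
CREATE STATISTICS s2 (dependencies) ON b, c FROM t;
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;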

Infrastructure for parallelization and COPY (see also this patch)
Parallel query execution in PostgreSQL still works only for read-only queries. There are difficulties with writes, and one of them is locking between processes that carry out a single task (belonging to one parallel group). Locks of such processes are considered non-conflicting — for example, several processes may hold an exclusive lock on the same table. This requires special care from the core developers, since otherwise deadlocks would happen constantly.
But there are two exceptions:

  • relation extension lock, which is captured when new pages are added to the end of the data file, and
  • page lock, which is used when moving GIN index entries from the pending list to the main tree.

(You can read more in this article. )
Such locks must conflict even between processes of the same parallel group — which is what this patch implements. But these locks can never lead to deadlocks, so they are excluded from deadlock detection.

For the user nothing changes for now, but this patch matters because, first, it paves the way for parallel INSERT and COPY, and second, it removes one of PostgreSQL's bottlenecks under heavy load (which you can hear about in the HL++ talk).

Security


EDH primes replaced
The hardcoded EDH (ephemeral Diffie-Hellman) primes, taken from the now defunct SKIP protocol, have been replaced.

initdb: default authentication settings changed
The default access settings for local and network connections created by initdb have changed. Now, in pg_hba.conf, local connections use the peer authentication method (or md5 if peer is not supported) instead of trust, and network connections use md5. Initially more liberal measures were discussed (a warning in the documentation), then tougher ones (scram-sha-256); in the end it was decided to settle on peer and md5.

Using explicit_bzero
An important patch. The OS functions bzero() and explicit_bzero() fill the specified memory areas with '\0' bytes (see, for example, the Linux man pages). These patches are only the beginning: there are many places in memory where passwords and other sensitive information may linger. The work started with libpq, where the whole .pgpass file could remain in memory after being read, and with clearing it after the connection is closed. In be-secure-common.c the entered SSL secret passphrase, which could end up in an error string (path), is now overwritten as well.

Added "password_protocol" parameter to libpq
This patch lets libpq control which password transfer protocol is used during the connection. If the protocol is weaker than the one specified, libpq will refuse to authenticate. By default the parameter is plaintext, that is, any protocol is acceptable.

Mandatory access for TRUNCATE
This patch lets extensions implement mandatory access control (MAC) for the TRUNCATE operation; the sepgsql extension will now check permissions for it. The SELinux Reference Policy and Red Hat-based Linux distributions do not support SELinux checks for db_table {truncate}; in that case sepgsql is used with deny_unknown equal to 1, and TRUNCATE will fail.

Availability of GUC ssl_passphrase_command
A simple but useful patch. Now only superusers can see the value of the ssl_passphrase_command parameter. The parameter specifies an external command that is called when a password is required to decrypt an SSL file, for example a private key.

Localization


Versioning libc collation rules
For ICU collations, version numbers have been stored in the database for a while. Whenever a collation is used (for sorting or comparing characters), the stored version is checked against the current version of the ICU library in the OS, and a warning is issued if they differ. This makes it possible to find indexes built with changed collation rules that may be incorrect and should be rebuilt. After rebuilding the indexes, the ALTER COLLATION ... REFRESH VERSION command updates the collation version stored in the database, and the warnings stop.

But that was only for ICU. Now the version number is also stored for libc collations:

SELECT collname, collversion FROM pg_collation WHERE collname = 'ru_RU';

 collname | collversion
----------+-------------
 ru_RU    | 2.27

This makes it possible to issue warnings when the library in the OS changes. It is especially relevant in light of the move to glibc 2.28, where many collations changed and the corresponding indexes should be rebuilt.

But until we switch to 2.28, all is calm:
ALTER COLLATION "ru_RU" REFRESH VERSION;

NOTICE:  version has not changed
ALTER COLLATION

Full Text Search


Full-text search for the Greek language
No comments.

dict_int learned to handle absolute values.
The dict_int template dictionary (it is also an extension) has gained the ability to strip the sign from a number.


CREATE EXTENSION dict_int;
SELECT ts_lexize('intdict', '-123');

ts_lexize
-----------
 {-123}
(1 row)

ALTER TEXT SEARCH DICTIONARY intdict (ABSVAL = true);
SELECT ts_lexize('intdict', '-123');

ts_lexize
-----------
 {123}
(1 row)

That is, this time the absolute value was recognized.

Partitioning


BEFORE row triggers on partitioned tables
In version 12 you cannot create BEFORE FOR EACH ROW triggers on a partitioned table: on individual partitions, yes, but not on the whole table at once. Now a BEFORE FOR EACH ROW trigger created on a partitioned table is automatically inherited and works on all partitions, with the restriction that an UPDATE trigger may change the partition key only within the current partition.
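
A sketch with made-up names — in PG12 the last command would fail, in PG13 it succeeds and the trigger fires on every partition:

CREATE TABLE measurements (logdate date, value numeric) PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2020 PARTITION OF measurements
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

CREATE FUNCTION round_value() RETURNS trigger AS $$
BEGIN
    NEW.value := round(NEW.value, 2);   -- normalize the value before insert/update
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER round_before BEFORE INSERT OR UPDATE ON measurements
    FOR EACH ROW EXECUTE FUNCTION round_value();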

Support for partitioned tables in logical replication
Previously, including a partitioned table in a publication caused an error:

CREATE PUBLICATION pub FOR TABLE p;

ERROR:  "p" is a partitioned table
DETAIL:  Adding partitioned tables to publications is not supported.
HINT:  You can add the table partitions individually.

Now it works.

Improved partitionwise join
Starting with version 11, the planner can join partitioned tables partition by partition, but only if the partition boundaries match exactly. Now the algorithm has been improved: it also works when a partition of one table fits entirely within a partition of the other, even if their bounds differ (for example, when one table is partitioned by day and the other by month). The new algorithm works for range and list partitioning.

Partitionwise FULL OUTER JOIN
Partitionwise joins now also work for full outer joins written with the USING clause.

tableam


In this attractive and promising but difficult area there are no radical advances compared to PostgreSQL 12. There is still no ready-made pluggable storage such as zheap (other than heap itself), but work on the API continues.

A higher level of abstraction in determining the size of the table
Robert Haas rewrote the code, restructuring it in favor of abstraction layers so that code does not have to be duplicated in the future. This part concerns estimate_rel_size — determining the size of the table's forks.

Table access methods can use the relcache
This patch brings the memory-management capabilities of table access methods in line with those of index access methods.

tableam and TOAST
TOAST is largely designed around heap storage. Therefore, when creating new table access methods there are two possible paths: either help developers of new methods implement insertion, update and deletion of TOAST records themselves, or delegate TOAST handling to code that uses the traditional PostgreSQL heap storage. A series of five patches implements TOAST insert/update/delete operations via tuple slots and can help those going either way.

fsync


Handling fsync errors in pg_receivewal and pg_recvlogical
The fight with fsync() continues. PostgreSQL assumes that a successful fsync() call means all the file's data has been flushed to disk, but this is not always the case (it is OS-dependent) and can lead to data loss. In PG13 it was decided to deal with the pg_receivewal and pg_recvlogical utilities. Until now the default behavior was this: the utilities wrote fsync errors to the log, re-established the connection and carried on as if nothing had happened. As a result, WAL contained information about files that were reported as successfully copied but in fact had not been properly flushed to disk. So it is better to abort the utility, which is what now happens. The fate of pg_dump, pg_basebackup, pg_rewind and pg_checksums was also discussed, but for now the change is limited to these two.

Protection against setting incorrect flags for fsync ()
This patch checks that the flags are set correctly when a file descriptor is obtained for fsync(): directories must be opened read-only, and files for writing or for both reading and writing.

Backup and replication


Pause during recovery when the recovery target is not reached
If WAL runs out during recovery before the specified recovery_target_time is reached, the server used to finish recovery and switch to normal operation. Now it does not: the recovery process pauses, which is reported in the log, and the administrator gets the chance to supply the missing WAL segments and continue recovery.

The ignore_invalid_pages parameter
When the recovery process on a replica encounters a reference to an invalid page in a WAL record, it PANICs. Enabling the ignore_invalid_pages parameter lets it get past this: recovery continues, with possible loss of integrity, loss of data and other very serious consequences. The parameter is intended for server developers and should be used only when you still need to try to finish recovery and start the replica.

Changing primary_conninfo without restarting
A patch by Sergey Kornilov that allows the primary_conninfo, primary_slot_name and wal_receiver_create_temp_slot settings to be changed without restarting the server. It was for the sake of this, in fact, that the recovery.conf file was abandoned in release 12.

Backup manifests
pg_basebackup now creates a "manifest": a JSON file with information about the backup made (file names and sizes, the required WAL files, plus checksums for all of it).
The new pg_verifybackup utility checks a backup against its manifest, and also uses pg_waldump to verify the presence and correctness of the WAL files required for recovery (only those inside the backup itself, not in the archive).
This makes it possible to detect situations where backup files have been damaged or have disappeared, or when recovery is impossible because the required WAL files are missing.

Limiting the WAL kept for replication slots
A replication slot is a convenient but dangerous mechanism: if the client does not read data from the slot in time, the unread WAL records can take up all the space on the server. Now the max_slot_wal_keep_size parameter sets a limit on the maximum disk space that can be occupied by unread data. If at the next checkpoint the size turns out to be exceeded, the slot is invalidated and the space is freed.
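
For example, a sketch with an arbitrary limit:

ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();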

Windows


Support for Unix sockets on Windows
Unix-domain sockets are supported on Windows 10, although they are disabled there by default.

Documentation


There are two new appendices in the documentation.
After a long discussion, Appendix M. Glossary has appeared. The glossary currently contains 101 terms.

The ability to colorize the diagnostic messages of console utilities using the PG_COLOR variable existed before. Now it is documented in Appendix N. Color Support. Peter Eisentraut's original intent in this patch was to turn colored output on by default, with those who did not want it setting the NO_COLOR variable explicitly. But among those discussing the patch, opponents of color differentiation of messages were in the majority, so it was decided simply to document the existing capabilities. And we got a new top-level section in the documentation.



That is all about PG13 for now. The feature freeze, however, is not yet the release, and work on PG14 is already beginning; we will keep following the news and report on it.
