PostgreSQL: Server-side programming in the human language (PL / Perl, PL / Python, PL / v8)

Postgres is known for its extensibility, which also applies to support for procedural languages ​​(PL). No one can boast a language with a list of languages ​​of this length, and potentially this list is not at all limited: in order to connect the language to the server, no extra effort is required. You can even come up with your own language and make it a server procedural language. Alterations in the DBMS will not require this. Like so much more, this extensibility has been built into the Postgres architecture from the very beginning.

It is possible and sometimes necessary to write PL languages ​​for tasks. Better yet, if someone writes such a framework for writing languages ​​so that you can write not in C, but to choose a language that is more comfortable for a language developer. As with FDW, which can be written in Python .

This article was written on the basis of a number of reports and master classes on this topic made by the author at the conferences PgConf.Russia 2019 , PgConf.Russia 2018 and DevConf 2017 .

This is not about exoticism, but about the most common procedural languages ​​PL / Perl, PL / Python and PL / V8 (i.e. JavaScript) and comparing their capabilities with PL / pgSQL.

When are these languages ​​worth using? When is SQL and PL / pgSQL missing?

  • Then, when you need to work with complex structures, with algorithms: traversing trees, for example, or when HTML or XML parsing is required, especially when extracting them from archives;
  • When you need to dynamically generate complex SQL (reports, ORM). On PL / pgSQL, it’s not only inconvenient, but will also work slower in some cases;
  • Perl Python, C/C++, Perl Python . . , Oracle. , Postgres . Perl Python .
  • — . , , untrusted- ( — . ), Perlu Python(3)u, PL/V8. Postgres , , FDW, , . . !
  • And one more thing: if you are going to write something in C, then you can make a prototype in these languages ​​that are more adapted to rapid development.

How to embed a language in Postgres


To implement the language you need: write in C from one to three functions:

  • HANDLER - a call handler that will execute a function in the language (this is a required part);
  • INLINE - anonymous block handler (if you want the language to support anonymous blocks);
  • VALIDATOR - code verification function when creating a function (if you want this verification to be done).

This is described in detail in the documentation here and here .

“Languages ​​out of the box” and other languages


There are only four languages ​​that are supported “out of the box”: PL / pgSQL , PL / Perl , PL / Python and PL / Tcl , but the tickle is rather a tribute to history: few people use it now, we won’t talk about it anymore.
PL / Perl, PL / Python and, of course, PL / pgSQL are supported by the Postgres community. Support for other non-box languages ​​falls on their maintainers - companies, communities, or specific developers interested in making the language work inside the DBMS. PL / V8 promotes Google. But from time to time there are reasonsdoubt the cloudless future of PL / V8. Google’s current PL / V8 project maintainer, Jerry Sievert, is considering postgres server-based JS support based on a different engine (such as QuickJS), as PL / V8 is difficult to build and requires 3-5 GB all sorts of stuff on Linux when building, and this often leads to problems on different OSs. But the PL / V8 is widely used and thoroughly tested. It is possible that PL / JS will appear as an alternative to another JS engine, or for now just as a name, which we will get used to during the transition period.

PL / Java is rarely used. I personally did not have a need to write in PL / Java because in PL / Perl and in PL / V8 there is enough functionality for almost all tasks. Even Python does not particularly add features. PL / RUseful for those who like statistics and love this language. We will not talk about him here either.

Popular languages ​​are not necessarily popular with writing storages: there is PL / PHP, but now it is practically not supported by anyone - there are few who want to write server procedures on it. For the PL / Ruby language, the picture is somehow the same, although the language seems to be more modern.

A Go-based procedural language is being developed, see PL / Go , and also, it seems, PL / Lua . It will be necessary to study them. For stubborn fans of the shell, there is even PL / Sh , it's hard to even imagine what it might be for.

There is at least one domain-specific procedural language (DSL) that is narrowly specialized for its task - PL / Proxy, which used to be very popular for proxying and balancing server load.

In this article, we will cover the main, most commonly used languages. This, of course, is PL / PgSQL, PL / Perl, PL / Python and PL / V8, we will call them PL / * below .

Languages ​​“out of the box” are really almost literally installed out of the box - usually the installation is painless. But to install PL / V8, if you did not find a package with the necessary version in the repository of your OS, this is almost a feat, because for this you will have to actually build the entire V8, or, in other words, Chromium. At the same time, the entire development infrastructure will be downloaded from google.com along with V8 itself - count on a couple of gigabytes of traffic. For Postgres 11 under Ubuntu, the PL / V8 package has not yet appeared, only V8 for PG 10 is available in the repository so far. If you want, assemble it by hand. It is also important that the version that you will find in the repository is most likely quite old. At the time of publication of the article, the latest version is 2.3.14.

After the language itself is installed, you must also “create” the language - register it in the system directory. This should be done by the team.

CREATE EXTENSION plperl;

(instead of plperl, you can substitute the name of another language, there are certain nuances, see below).
We look at what happened:

test_langs=# \x
test_langs=# \dL+
List of languages
-[ RECORD 1 ]-----+---------------------------------
Name              | plperl
Owner             | postgres
Trusted           | t
Internal language | f
Call handler      | plperl_call_handler()
Validator         | plperl_validator(oid)
Inline handler    | plperl_inline_handler(internal)
Access privileges |
Description       | PL/Perl procedural language
-[ RECORD 2 ]-----+---------------------------------
Name              | plpgsql
Owner             | postgres
Trusted           | t
Internal language | f
Call handler      | plpgsql_call_handler()
Validator         | plpgsql_validator(oid)
Inline handler    | plpgsql_inline_handler(internal)
Access privileges |
Description       | PL/pgSQL procedural language
[ RECORD 3 ]-----+---------------------------------
Name              | plv8
Owner             | postgres
Trusted           | t
Internal language | f
Call handler      | plv8_call_handler()
Validator         | plv8_call_validator(oid)
Inline handler    | plv8_inline_handler(internal)
Access privileges |
Description       |

PL / pgSQL does not need to be specially created; it is always already in the database.

Attention! PL / pgSQL should not be confused with SQL. This is a different language. However, Postgres can also write functions in plain SQL.

Standards


In the world of DBMS, they often talk about compliance with SQL standards. Procedural languages ​​also have standards, although they are not talked about so often. The SQL / PSM standard is highly compliant with the DB2 procedural language. Its implementation is far from PL / pgSQL, although conceptually they are close.

SQL / JRT is the standard for Java procedures, and PL / Java is a good match.

Trusted and Untrusted Languages


Postgres procedural languages ​​are trusted (TRUSTED) and untrusted (UNTRUSTED).
In TRUSTED languages, there is no possibility of direct work with I / O, including the network, and indeed with system resources. Therefore, such functions can be created by any database user, spoil something and he will not be able to learn too much. Functions in UNTRUSTED languages ​​can only be created by a supervisor.

If the language interpreter supports such restrictions, then it can be used to create both TRUSTED and UNTRUSTED languages. So with Perl, so there are different languages plperland plperlu. Letter uat the end it gives out the untrusted character of the language. Python exists only in an untrusted version. PL / v8 - on the contrary, only in trusted. As a result, PL / v8 cannot load any modules or libraries from disk, only from the database.

A function in the UNTRUSTED language can do anything: send an email, ping a site, log into a foreign database, and execute an HTTP request. TRUSTED languages ​​are limited to processing data from the database.

By TRUSTED include: plpgsql, plperl, plv8, pljava.

By UNTRUSTED include: plperlu, pljavau, plpython2u, plpython3u.

Please note: there is no PL / Python like TRUSTED (since you cannot set restrictions on access to resources there), and PLpgSQL and PL / V8 are the other way around: they are not UNTRUSTED.

But Perl and Java are available in both versions.

PL / pgSQL vs PL / *


PL / pgSQL code natively works with all the data types that Postgres has. Other languages ​​do not have many Postgres types, and the language interpreter takes care of converting the data into an internal representation of the language, replacing obscure types with text. However, he can be helped with the help of TRANSFORM, which I will talk about closer to the end of the article.

Function calls in PL / pgSQL are often more expensive. Functions in other languages ​​can access their libraries without looking at the system catalog. PL / pgSQL cannot work like that. Some queries in PL / pgSQL work for a long time because of the fact that a lot of types are supported: to add two integers, the interpreter needs to realize that he is dealing with integers and not some other exotic types, then decide how to fold them, and only after that actually fold them.

Since PL / pgSQL is TRUSTED, you cannot work with network and disks from it.

When it comes to working with nested data structures, PL / pgSQL has only Postgres tools for working with JSON, which are very cumbersome and unproductive, in other languages, working with nested structures is much simpler and more economical.

PL / * has its own memory management, and you need to monitor the memory, or maybe limit it.

You should carefully monitor the error handling, which is also different for everyone.

But in PL / * there is a global interpreter context, and it can be used, for example, for caching data, including query plans. If the language is UNTRUSTED, then the network and drive (s) are available. All these languages ​​work with the database, as a rule, through the SPI, but more on that later.

Let's take a closer look at the features of the PL / * languages.

PL / Perl


The Perl interpreter is a hefty piece of code in memory, but fortunately, it is not created when the connection is opened, but only when the first stored procedure / function PL / Perl is launched. When it is initialized, the code specified in the Postgres configuration parameters is executed. Usually, modules are loaded and precomputations are made. If you added to the configuration file while the database is running, make Postgres re-read the configuration. In this article, the examples use a module to visualize data structures. There are parameters for separate initialization of TRUSTED and UNTRUSTED Perl and, of course, a parameter . Those who program in Perl know that without it is not a language, but one misunderstanding.

plperl.on_init= 'use Data::Dumper;'
plperl.on_plperl_init= ' ... '
plperl.on_plperlu_init= ' ... '
plperl.use_strict= on


Data::Dumper

use_strict=onstrict

PL / Python


In it, the interpreter is created in the same way the first time it is accessed. And here it is important to immediately decide which python you want: second or third. As you know, Python exists in two popular versions (Python 2 and Python 3), but the problem is that their so-shki do not get along together in one process: there is a conflict by name. If you worked with v2 in one session and then called v3, then Postgres will crash, and for the server process (backend) this will be a fatal error. To access a different version, you need to open another session.

Unlike Perl, python cannot be told what to do during initialization. Another inconvenience: single-liners are inconvenient to do.

In all Python functions, two dictionaries are defined - static SDand global GD. Global allowsexchange data with all functions within one backend - which is attractive and dangerous at the same time. Each function has a static dictionary.

In PL / Python, you can make subtransactions, which we will discuss below.

PL / V8


It is only TRUSTED.

Conveniently, JSON data is automatically converted to a JS structure. In PL / V8, as in PL / Python, you can do subtransactions. There is an interface for simplified function calls. This is the only procedural language in question in which window functions can be defined . They suggest that they can be defined on PL / R , but this language is outside the scope of this article.

And only in PL / V8 is there a execution timeout. True, it is not turned on by default, and if you build PL / V8 by hand, you need to say that it was turned on during assembly, and then you can set timeouts for function calls with the configuration parameter.

Initialization in PL / V8 looks interesting: since it is trusted, it cannot read the library from disk, it cannot load anything from anywhere. He can take everything he needs only from the base. Therefore, a stored initializer function is defined that is called when the language interpreter starts. The function name is specified in a special configuration parameter:

plv8.start_proc=my_init # ( PL/V8-)

During initialization, global variables and functions can be created by assigning their values ​​to the attributes of the this variable. For example, like this:

CREATE OR REPLACE FUNCTION my_init()
RETURNS void LANGUAGE plv8 AS $$
     this.get_57 = function() { return 57; }; //   
     this.pi_square = 9.8696044;  //   
$$;
SET plv8.start_proc = 'my_init';
DO LANGUAGE plv8 $$
     plv8.elog(NOTICE, pi_square, get_57() );
$$;

Comparison of PL / Perl vs PL / Python vs PL / V8 in practice


Hello World!


Let's perform a simple exercise with the output of this phrase in all three languages, first in PL / Perl . And let him do something else useful, for example, tells his version:

DO $$
     elog(NOTICE,"Hello World! $]");
$$ LANGUAGE plperl;

NOTICE:  Hello World!
DO

You can also use the usual Perl functions warnand die.

Now in PL / Python . More precisely on PL / Python3u (untrusted) - for definiteness.

DO $$
     import sys
     plpy.notice('Hello World! ' , hint=" ", detail=sys.version_info)
$$ LANGUAGE plpython3u;


NOTICE:  Hello World! 
DETAIL:  sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)
HINT:   
DO

Can use throw 'Errmsg'. There are many things you can extract from Postgres messages: they contain Hint, Details, line number, and many other parameters. In PL / Python, they can be passed, but not in the other languages ​​under consideration: their means can only be cursed with a plain text line.

In PL / Python, each postgres logging level has its own function: NOTICE, WARNING, DEBUG, LOG, INFO, FATAL. If it is ERROR, then the transaction has fallen, if FATAL, the entire backend has fallen. Fortunately, the matter did not reach PANIC. You can read here .

PL / V8

In this language, Hello world is very similar to pearl. You can quit exceptionusing throw, and this will also be error handling, although the tools are not as advanced as in Python. If you writeplv8.elog(ERROR), the effect will be, by the way, the same.

DO $$
     plv8.elog(NOTICE, 'Hello World!', plv8.version);
$$ LANGUAGE plv8;

NOTICE:  Hello World! 2.3.14
DO

Work with the base


Now let's see how to work with a database from stored procedures. Postgres has an SPI (Server Programming Interface). This is a set of C functions that is available to all extension authors. Almost all PL languages ​​provide wrappers for SPI, but each language does it a bit differently.

A function written in C but using SPI is unlikely to give a significant gain compared to PL / PgSQL and other procedural languages. But a C function that bypasses SPI and works with data without intermediaries (for example table_beginscan/heap_getnext) will work an order of magnitude faster.

PL / Java also uses SPI. But working with the database still happens in the style of JDBC and the JDBC standard. For the creator of code in PL / Java, everything happens as if you were working from a client application, but JNI (Java Native Interface) translates calls to the database into the same SPI functions. It is convenient, and there are no fundamental obstacles to translate this principle into PL / Perl and PL / Python, but for some reason this has not been done, and it is not yet visible in the plans.

Of course, if you wish, you can go to foreign bases in the usual way - through DBI or Psycopg . It is possible to local database, but why.

If you don’t get into the holistic topic “process in the base vs process on the client”, and immediately proceed from the maximum processing closer to the data (at least in order not to drive giant samples over the network), then the solution to use the functions stored on the server looks naturally.

Performance : keep in mind that SPI has some overhead, and SQL queries in functions may be slower than without functions. The 13th postgres included a patch by Konstantin Knizhnik , which reduces these costs. But, of course, the processing of query results in a stored function does not require the transfer of the result to the client, and therefore can be beneficial in terms of performance.

Safety: a set of debugged and tested functions isolates the database structure from the user, protects against SQL injections and other mischief. Otherwise, it will remain a headache for every application developer.

Code reuse: if a large number of intricate applications work with the database, it is convenient to store useful functions on the server, rather than writing them again in each application.

How and in what form do we get data from the database


In Perl , everything is simple and clear. The call spi_exec_queryreturns the number of rows processed, the status and the array of rows selected by the SQL query:

DO $$ 
     warn Data::Dumper::Dumper(
          spi_exec_query('SELECT 57 AS x')
     )
$$ LANGUAGE plperl;

WARNING:  $VAR1 = {
          'rows' => [
                    {
                      'x' => '57'
                    }
                  ],
          'processed' => 1,
          'status' => 'SPI_OK_SELECT'
        };

In Python, the query and the result look something like this, but here the function does not return a data structure, but a special object that you can work with in different ways. Usually it pretends to be an array and, accordingly, you can extract strings from it.

DO $$ 
     plpy.notice(
          plpy.execute('SELECT 57 AS x')
     )
$$ LANGUAGE plpython3u;

NOTICE:  <PLyResult status=5 nrows=1 rows=[{'x': 57}]>
DO

And now we take the 1st line, get out of there X and get the value - the number.

DO $$ 
     plpy.notice(
          plpy.execute('SELECT 57 AS x')[0]['x']
      )
$$ LANGUAGE plpython3u;

NOTICE:  57
DO

In PL / V8 :

DO $$ 
     plv8.elog(NOTICE, JSON.stringify(
          plv8.execute('SELECT 57 as x'))
     );
$$ LANGUAGE plv8;

NOTICE:  [{"x":57}]
DO

In order to see the structure, we used the library function JSON.stringify, which does not need to be loaded specifically, it is already ready for use as part of PL / v8 by default.

Shielding


To avoid malicious SQL injections, some characters in queries must be escaped. To do this, firstly, there are SPI functions and corresponding functions (written in C) in languages ​​that work like SPI wrappers. For example, in PL / Perl:

quote_literal- takes in apostrophes and doubles' and \. Designed for screening text data.
quote_nullable- same, but undefconverted to NULL.
quote_ident- quotes the name of the table or field, if necessary. Useful in the case when you are constructing an SQL query and substituting the names of the database objects in it.

PL / Perl

DO $$
     warn "macy's";
     warn quote_literal("macy's");
$$ LANGUAGE plperl;

WARNING:  macy's at line 2.
WARNING:  'macy''s' at line 3.
DO

Keep in mind: the table name must not be escaped like a text line. That is why there is a function quote_ident.

But in PL / Perl there are other functions for shielding data of individual post-gres types: A function should accept any type and turn atypical doubtful characters into something obviously safe. It works with a huge number of types, but, nevertheless, not with all. She, for example, will not understand range types and perceive them simply as text strings.

encode_bytea
decode_bytea
encode_array_literal
encode_typed_literal
encode_array_constructor


quote_typed_literal

DO $$
     warn encode_typed_literal(
          ["", " "], "text[]"
     );
$$ LANGUAGE plperl;

WARNING:  {," "} at line 2.
DO

There are three similar functions in PL / Python , and they work in much the same way:

plpy.quote_literal
plpy.quote_nullable
plpy.quote_ident


DO $$ plpy.notice(
     plpy.quote_literal("Macy's"));
$$ LANGUAGE plpython3u;
NOTICE:  'Macy''s'
DO

Are the functions in PL / V8 the same ?

Of course! Everything is the same up to syntactic features.

plv8.quote_literal
plv8.quote_nullable
plv8.quote_ident


DO $$
    plv8.elog(NOTICE, plv8.quote_nullable("Macy's"));
$$ LANGUAGE plv8;

NOTICE:  'Macy''s'

Performance


Which language is the fastest? Usually they answer: C. But the correct answer is C or SQL. Why SQL? The fact is that a function in this language is not always performed explicitly. It can become embedded in the request (the scheduler will embed the function in the body of the main request), optimize well with the request, and the result will be faster. But under what conditions can code be embedded in a request? There are a few simple conditions that you can read about, say, here . For example, a function should not be executed with the rights of the owner (to be SECURITY DEFINER). Most simple functions will fit these conditions.

In this article we will measure "on the knee", not seriously. We need a rough comparison. First turn on the timing:

\timing

Let's try SQL (The execution times of the commands below are the rounded average values ​​that the author received on an unloaded six-year-old PC. They can be compared with each other, but they do not claim to be scientific):

SELECT count(*) FROM pg_class;
0.5 ms

It works very fast. In other languages, time is wasted calling functions from the language. Of course, the first time the request will run slower due to the initialization of the interpreter. Then it stabilizes.

Let's try PL / pgSQL :

DO $$
     DECLARE a int;
     BEGIN
          SELECT count(*) INTO a FROM pg_class;
     END;
$$ LANGUAGE plpgsql;
0.7 ms

PL / Perl :

DO $$
     my $x = spi_exec_query('SELECT count(*) FROM pg_class');
$$ LANGUAGE plperl;
0.7 ms

PL / Python:

DO $$
     x = plpy.execute('SELECT count(*) FROM pg_class');
$$ LANGUAGE plpythonu;
0.8 ms

It was Python 2. Now Python 3 (recall: Python2 and Python3 do not live peacefully within the same session, a conflict of names is possible):

DO $$
     x = plpy.execute('SELECT count(*) FROM pg_class');
$$ LANGUAGE plpython3u;
0.9ms

And finally, PL / V8 :

DO $$
     var x = plv8.execute('SELECT count(*) FROM pg_class');
$$ LANGUAGE plv8 ;
0.9 ms

But it is somehow very fast. Let's try to execute the query 1000 times or 1 million times, suddenly the difference will be more noticeable:

PL / pgSQL :

DO $$
     DECLARE a int; i int;
     BEGIN FOR i IN 0..999999 LOOP
          SELECT count(*) INTO a FROM pg_class;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
53s

PL / Perl :

DO $$
     for (0..999999) {
          spi_exec_query('SELECT count(*) FROM pg_class');
     }
$$ LANGUAGE plperl;
102s

PL / Python 3 :

DO $$
     for i in range (0,1000000) :
          plpy.execute('SELECT count(*) FROM pg_class')
$$ LANGUAGE plpython3u;
98s

PL / V8 :

DO $$
     for(var i=0;i<1000;i++)
          plv8.execute('SELECT count(*) FROM pg_class');
$$ LANGUAGE plv8;
100ms

Note that with PL / V8, the experiment was done with a thousand, not a million iterations. With moderate resources, the PL / V8 in a cycle of 1 million operations will eat all the memory and completely hang the car. Already at a thousand iterations, the postgres process selects 3.5GB of memory and 100% write to disk. In fact, postgres launches the V8 environment, and of course it eats memory. After executing the request, this turbo monster is not going to give back memory. To free memory, you have to end the session.

We see that PL / pgSQL is already 2 times faster than PL / Perl and PL / Python. PL / V8 is still slightly behind them, but towards the end of the article he is partially rehabilitated.

In general, Perl with Python in these experiments show approximately the same results. Perl used to be slightly inferior to Python; in modern versions, it is slightly faster. The third python is slightly slower than the second. The whole difference is within 15%.

Performance with PREPARE


People who know will understand: something is wrong. PL / pgSQL can automatically cache query plans , and in PL / *, each time the query was scheduled again. In a good way, you need to prepare requests, build a request plan, and then according to this plan they should be executed as many times as needed. In PL / *, you can explicitly work with query plans, which we will try starting with PL / Perl :

DO $$
     my $h = spi_prepare('SELECT count(*) FROM pg_class');
     for (0..999999) {
          spi_exec_prepared($h);
     }
     spi_freeplan($h);
$$ LANGUAGE plperl;
60s

PL / Python 3 :

DO $$
     h = plpy.prepare('SELECT count(*) FROM pg_class')
     for i in range (0,1000000): plpy.execute(h)
$$ LANGUAGE plpython3u;
62s

PL / V8 :

DO $$
     var h=plv8.prepare('SELECT count(*) FROM pg_class');
     for(var i=0;i<1000;i++) h.execute();
$$ LANGUAGE plv8;
53ms

With prepareour two languages, we almost caught up with PL / pgSQL, while the third one also wanted to, but did not reach the finish line due to the growing demand for memory.

But if you do not take into account memory, then it is clear that all languages ​​go almost head to head - and not by chance. Their bottleneck is now common - working with the database through SPI.

Computing performance


We see that the language performance has rested in working with the database. To compare languages ​​with each other, let's try to calculate something without resorting to the database, for example, the sum of the squares.

PL / pgSQL :

DO $$
     DECLARE i bigint; a bigint;
     BEGIN a=0;
     FOR i IN 0..1000000 LOOP
          a=a+i*i::bigint;
     END LOOP;
END;
$$ LANGUAGE plpgsql;
280ms

PL / Perl :

DO $$
     my $a=0;
     for my $i (0..1000000) { $a+=$i*$i; };
     warn $a;
$$ LANGUAGE plperl;
63ms

PL / Python 3 :

DO $$
a=0
for i in range(1,1000001): a=a+i*i
$$ LANGUAGE plpython3u;
73ms

PL / V8 :

DO $$
     var a=0;
     for(var i=0;i<=1000000;i++) a+=i*i;
     plv8.elog(NOTICE, a);
$$ language plv8;
7.5ms

We see that PL / Perl and PL / Python overtook and overtook PL / pgSQL, they are 4 times faster. And the eight is tearing everyone! But is it really for nothing? Or will we get it for the head? Yes, we will.

The number in JavaScript is a float, and the result is fast, but not accurate: 333333833333127550 instead of 333333833333500000.

Here is the formula by which the exact result is calculated :

∑ = n*(n+1)*(2n+1)/6

As an exercise, you can prove it using mathematical induction.

In the order of laughter

DO LANGUAGE plv8 $$
plv8.elog(NOTICE, parseInt(33333383333312755033)) $$;

NOTICE:
33333383333312754000

In Javascript, parseIntit still does a float, not an Int.

Nevertheless, BigInt appeared in V8 in 2018 , and it can now be counted for sure, but with a detriment to speed, since it is not a 64-bit integer, but an integer of arbitrary bit depth. However, in PL / V8 this innovation has not yet hit. In other procedural languages, arbitrary bit numbers (analogues of SQL numeric) are supported through special libraries.

In Perl, there is a Math :: BigFloat module for arithmetic with arbitrary precision, and in Python, the Bigfloat package is a Cython wrapper around the GNU MPFR library .

Performance functions for sorting


Here is a practical example, which shows the difference in the performance of sorting by function, if this function is written in different languages. Task: to sort the text fields containing the numbers of the issues of the journal, which may be as follows:

1
2
3
4-5
6
6A
6
11
12

Those. it's actually a string, but it starts with a number, and you need to sort by these numbers. Therefore, in order to correctly sort as strings, we supplement the numerical part with zeros on the left to get:

0000000001
0000000002
0000000003
0000000004-5
0000000006
0000000006A
0000000006
0000000011
0000000012

Yes, I know that this is not the only solution to the problem (and not even quite right). But for example, it will do.

To request a type, SELECT ... ORDER BY nsort(n)we write functions in PL / Perl, SQL, PL / Python, and PL / V8 that convert the journal numbers to this form:

CREATE OR REPLACE FUNCTION nsort(text) RETURNS text 
   LANGUAGE PLPERL IMMUTABLE AS $$
    my $x = shift;
    return ($x =~ /^\s*(\d+)(.*)$/)
        ? sprintf("%010d", $1).$2
        : $x;
$$;

CREATE OR REPLACE FUNCTION _nsort(x text) RETURNS text
     LANGUAGE SQL  IMMUTABLE  AS $$
 WITH y AS (
    SELECT regexp_match(x,'^\s*(\d*)(.*)$') as z
 )
 SELECT CASE WHEN z[1] = '' THEN x ELSE lpad(z[1],10,'0') || z[2] END FROM y;
$$;

CREATE OR REPLACE FUNCTION py_nsort(x text) RETURNS text 
   LANGUAGE plpython2u IMMUTABLE AS $$
import re
r = re.match('^\s*(\d+)(.*)$', x)
return x if r == None else ('%010d' % int(r.group(1))) + r.group(2)
$$;

CREATE OR REPLACE FUNCTION js_nsort(x text) RETURNS text 
   LANGUAGE plv8 IMMUTABLE AS $$
var m = x.match(/^\s*(\d+)(.*)$/);
if(m) { return m[1].padStart(10-m[1].length,'0') + m[2]; }
else { return x; } 
$$;

In my library of 15.5 thousand journal articles, a query using a function in PL / Perl takes about 64ms against 120ms in PL / Python and 200ms in PL / PgSQL. But the fastest - PL / v8: 54ms.

Note: when experimenting with sorting, provide the necessary amount of working memory so that sorting goes in memory (EXPLAIN will then show Sort Method: quicksort). The amount of memory is set by the parameter work_mem:

set work_mem = '20MB';

Memory


Perl does not like looped structures; he does not know how to clean them. If you ahave a pointer to b, and a bpointer to a, then the reference counter will never be reset and memory will not be freed.

Garbage collection languages ​​have other problems. It is not known, for example, when memory will be freed, or whether it will be freed at all. Or - if you don’t take care of this on purpose - collectors will go to collect garbage at the most inopportune moment.

But there are also memory management features directly related to Postgres. There are structures that SPI allocates, and Perl does not always realize that they need to be freed.

PL / Perl

This is NOT how it goes:

CREATE OR REPLACE function cr()
RETURNS int LANGUAGE plperl AS
$$
     return spi_exec_query(
           'SELECT count(*) FROM pg_class'
     )->{rows}->[0]->{count};
$$;

And so it goes:

CREATE OR REPLACE function cr()
RETURNS int LANGUAGE plperl AS
$$
     my $h = spi_prepare(
          'SELECT count(*) FROM pg_class'
     );
     return spi_exec_prepared($h)->{rows}->[0]->{count};
$$;

After execution, the handler $hwill remain alive, despite the fact that not a single living link to him remains.

It's okay, you just need to remember the need to explicitly release resources with spi_freeplan($h):

CREATE OR REPLACE function cr()
RETURNS int LANGUAGE plperl AS
$$
     my $h = spi_prepare(
          'select count(*) from pg_class'
     );
     my $res = spi_exec_prepared($h)->{rows}->[0]->{count};
     spi_freeplan($h);
     return $res;
$$;

PL / Python:

Python never flows , the plan is automatically released:

CREATE OR REPLACE function cr3() RETURNS int
LANGUAGE plpythonu as
$$
     return plpy.execute(
           'select count(*) from pg_class'
     )[0]['count']
$$;

PL / V8

Same story as Perl. It does not flow like this:

CREATE OR REPLACE FUNCTION crq() RETURNS int
LANGUAGE plv8 AS
$$
     return plv8.execute(
          'select count(*) from pg_class‘
     )[0].count;
$$;

And so it goes:

CREATE OR REPLACE FUNCTION crq() RETURNS int
LANGUAGE plv8 AS
$$
     var h = plv8.prepare(
          'select count(*) from pg_class'
     );
     return h.execute()[0].count;
$$;

Again: don't forget about freeing up resources. Here it does. h.free();

It doesn’t flow:

CREATE OR REPLACE FUNCTION crq() RETURNS int
LANGUAGE plv8 AS
$$
     var h = plv8.prepare(
          'select count(*) from pg_class'
     );
     var r = h.execute()[0].count;
     h.free();
     return r;
$$;

Parameters


It's time to understand how arguments are passed to functions. In the examples, we will pass 4 parameters with types to the function:

  • whole;
  • an array;
  • bytea and
  • jsonb

How do they get into PL / Perl ?

CREATE OR REPLACE FUNCTION crq(a int, b
bytea, c int[], d jsonb ) RETURNS void
LANGUAGE plperl AS
$$
    warn Dumper(@_);
$$;

SELECT crq(1,'abcd', ARRAY[1,2,3],'{"a":2,"b":3}');


WARNING:  $VAR1 = '1';
$VAR2 = '\\x61626364';
$VAR3 = bless( {
                 'array' => [
                              '1',
                              '2',
                              '3'
                            ],
                 'typeoid' => 1007
               }, 'PostgreSQL::InServer::ARRAY' );
$VAR4 = '{"a": 2, "b": 3}';
 crq 
-----
(1 row)

Will it be JSON or JSONB - in this case, it makes no difference: they still get in the form of a string. This is a fee for versatility: Postgres has a lot of types, of varying degrees of “embeddedness”. To demand from the developer that with the new type he immediately provides and conversion functions for all PL / * would be too much. By default, many types are passed as strings. But this is not always convenient, you have to parse these terms. Of course, I would like Postgres data to immediately turn into the appropriate Perl structures. By default, this does not happen, but starting from 9.6, the TRANSFORM mechanism appeared - the ability to define type conversion functions: CREATE TRANSFORM .

To create TRANSFORM, you need to write two functions in C: one will convert data of a certain type to one side, the other back. Please note, TRANSFORM works in four places:

  • When passing parameters to a function;
  • When returning a function value;
  • When passing parameters to an SPI call inside a function;
  • Upon receipt of the result of the SPI call inside the function.

TRANSFORM JSONB for Perl and Python, developed by Anton Bykov, appeared in the 11th version of Postgres. Now you do not need to parse JSONB, it gets into Perl right away as the corresponding structure. You must create the jsonb_plperl extension, and then you can use TRANSFORM:

CREATE EXTENSION jsonb_plperl;
CREATE OR REPLACE FUNCTION crq2(d jsonb)
RETURNS void LANGUAGE plperl
TRANSFORM FOR TYPE jsonb AS $$
     warn Dumper(@_);
$$;

You can call this function to verify that JSONB has turned into a pearl hash:

SELECT crq2( '{"a":2,"b":3}');


WARNING:  $VAR1 = {
          'a' => '2',
          'b' => '3'
        };
 crq2 
------
(1 row)

A completely different matter!

The author of this article also had a hand in developing TRANSFORMs. It turned out that such a simple data type, as booleanpassed to PL / Perl in an inconvenient form, as text strings 't'or 'f'. But in Perl's understanding, the string 'f' is true. To eliminate the inconvenience, a patch was invented that defined the conversion for the Boolean type . This patch hit PostgreSQL 13 and will be available soon. Due to its simplicity, bool_plperl can serve as a minimal starting model for writing any other conversion.

I hope someone will develop TRANSFORM for other data types (bytea, arrays, dates, numeric).

Now let's see how the parameters are passed in Python .

CREATE EXTENSION jsonb_plpython3u;
CREATE OR REPLACE FUNCTION pdump(a int, b bytea, c int[], d jsonb ) RETURNS void
LANGUAGE plpython3u
TRANSFORM FOR TYPE jsonb AS $$
      plpy.warning(a,b,c,d)
$$;

SELECT pdump(1,'abcd', ARRAY[1,2,3],'{"a":2,"b":3}');


WARNING:  (1, b'abcd', [1, 2, 3], {'a': Decimal('2'), 'b': Decimal('3')})
 pdump 
-------
(1 row)

An array is converted to an array - this is good (since version PG10 multidimensional arrays are also correctly transferred to python). In Perl, an array was converted to an object of a special class. Well, jsonbtransformed. Without TRANSFORM, jsonb will be passed as a string.

Now let's see in what form the parameters get into JS .

CREATE OR REPLACE FUNCTION jsdump(a int, b bytea, c int[], d jsonb) RETURNS void
LANGUAGE plv8 AS $$
     plv8.elog(WARNING,a,b,c,d)
$$;

SELECT jsdump(1,'abcd', ARRAY[1,2,3],'{"a":2,"b":3}');


WARNING:  1 97,98,99,100 1,2,3 [object Object]
jsdump 
-------
(1 row)

JSONB converted to a JavaScript object without any TRANSFORM! Temporary Postgres types are also converted to the Date JS type. Same thing with boolean. All transformations are already built into PL / V8.

Work with infinity


The INFINITY constant is not used very often, but sloppy work with it is dangerous. In PostgreSQL, Infinity and -Infinity exist as special values ​​for some temporary and floating-point types. But the transfer of Infinity to procedural languages ​​and vice versa needs to be discussed in detail, since working with them can depend not only on the language, but also on libraries, on the OS and even on the hardware.

Python has a Numpy module that defines numerical infinity:

import numpy as nm
a = nm.inf
b = -nm.inf
print(a, b)

inf -inf

Perl also has infinity, it uses a string "infinity"that can be shortened to "inf". For example, you could say:

perl -e 'print 1 * "inf"'

Inf

or

perl -e 'print 1/"inf"'

0

In PL / Perl, PL / Python, PL / v8, numerical infinity from Postgres is passed correctly, but an infinite date is not quite right. Rather, in PL / Perl and PL / Python there is no built-in data type for time, a string comes there. In PL / V8, there is a built-in type Date, and the usual date from a postgres turns into it. But the V8 does not know the endless date, and when transferred, it turns into Invalid Date.

Passing parameters to prepared requests


Back to prepare, consider how the parameters are passed there. Different languages ​​have a lot in common, since they are all based on SPI.

When you prepare a query in PL / Perl , you need to determine the type of parameters that are passed, and when you run the query, you only specify the values ​​of these parameters (the parameters are passed to PL / pgSQL in the same way).

DO LANGUAGE plperl $$
     my $h= spi_prepare('SELECT * FROM pg_class WHERE
          relname ~ $1', 'text' );                     #   
     warn Dumper(spi_exec_prepared($h, 'pg_language')); #   
     spi_freeplan($h);
$$;

In PL / Python, the essence is the same, but the syntax is slightly different:

DO LANGUAGE plpython3u $$
     h= plpy.prepare('SELECT relname FROM pg_class WHERE relname ~ $1', ['text'] )
     plpy.notice(.execute (['pg_language']))
$$;

In PL / V8, the differences are minimal:

DO LANGUAGE plv8 $$
    var h= plv8.prepare('SELECT relname FROM pg_class WHERE relname ~ $1', ['text'] );
    plv8.elog(NOTICE, h.execute (['pg_language']));
    h.free();
$$;

In PL / Java, everything is different. There, SPI is clearly not used, but a pseudo-JDBC connection to the database is formed. For a PL / Java programmer, everything happens as if he were creating a client application. This is convenient, and one could also approach the design of PL / Perl and PL / Python, but for some reason this was not done (however, no one forbids creating a couple more implementations of PL / Perl and PL / Python).

Work with cursor


All the SPI functions that we used when we went to the database — spi_exec_query()and others — have a parameter that limits the number of rows returned. If you need a lot of returned rows, then you can’t do without a cursor to pull them up a little.

Cursors work in all of these languages. In PL / Perl,
spi_exec_query returns a cursor from which you can extract strings one at a time. It is not necessary to close the cursor; it will close itself. But if you want to rediscover it again, you can explicitly close it with a command close().

DO LANGUAGE plperl $$
    my $cursor = spi_query('SELECT * FROM pg_class');
    my $row;
    while(defined($row = spi_fetchrow($cursor))) {
         warn $row->{relname};
    }
$$;

WARNING:  pg_statistic at line 5.
WARNING:  pg_toast_2604 at line 5.
WARNING:  pg_toast_2604_index at line 5.
WARNING:  pg_toast_2606 at line 5.
WARNING:  pg_toast_2606_index at line 5.
WARNING:  pg_toast_2609 at line 5.
WARNING:  pg_toast_2609_index at line 5.
...

In PL / Python, everything is very similar, but the cursor is presented as an object that you can cycle through:

h = plpy.prepare('SELECT ...');
cursor = plpy.cursor(h);
for row in cursor:
...
cursor.close() //  

In PL / v8, everything is also very similar, but do not forget to free the prepared query plan:

var h = plv.prepare('SELECT ...');
var cursor = h.cursor();
var row;
while(row = cursor.fetch()) {
...
}
cursor.close();
h.free();

PL / V8: Quick access to features


In PL / V8, you can call a function not from a regular SELECT, but find it by name and immediately launch it with plv8.find_function(name);. But keep in mind that in JS a function cannot be polymorphic, as in PostgreSQL, in which functions with the same name but with different parameters can coexist. In PL / v8, of course, we can create polymorphic functions, but there find_functionwill be an error when trying to use it.

ERROR:  Error: more than one function named "jsdump"

If a function by name is unambiguous, then it can be called without SPI and type conversions, i.e. much faster. For example, like this:

DO LANGUAGE plv8 $$
plv8.find_function('jsdump')(1, 'abc');
$$;

Transactions


Postgres 11 has a lot of fun: real procedures have appeared . Postgres used to have only features. The joy is not only due to compatibility and compliance with the SQL standard, but why: within the procedures you can commit and roll back transactions.

PL / Perl and PL / Python already have SPI functions for managing transactions, while PL / V8 does not yet. In PL / Perl, these functions are called spi_commit()and spi_rollback(), and a usage example is in the documentation . In PL / Python, this is plpy.commit()and plpy.rollback().

Subtransaction


Subtransactions are convenient for correct error handling in complex multi-level logic.

In PL / pgSQL inside a transaction, each block with the EXCEPTION keyword is a subtransaction. You can read about some performance and reliability issues that may arise in this case, for example, here .

There are no explicit subtransactions in PL / Perl , but they can be simulated through savaepoints. Apparently, if you wish, it is easy to write a pearl module that implements subtransactions in an explicit form.

In PL / Python, sub-transactions appeared long ago: from 9.5 explicit , before that there were implicit ones . You can define a transaction, wrap it intry-and execute. If the subtransaction falls off, then we fall into the block except, if not fall off, then into the block elseand move on.

try:
     with plpy.subtransaction():
          plpy.execute("...")
          plpy.execute("...")
except plpy.SPIError, e:
. . .
else:
. . .

A similar design exists in PL / V8 , only in JS syntax.

try {
plv8.subtransaction(function() {
plv8.execute('UPDATE...');
plv8.execute('UPDATE...');
});
}
catch(e) {
...
}

Conclusion


Try, but do not abuse :) Knowledge of PL / * can bring some benefits. Like any tool, they love to be used for their intended purpose.

PL / v8 is very promising, but sometimes it behaves unexpectedly and has a number of problems. Therefore, it is better to take languages ​​out of the box if they are suitable for your task.

I want to thank Igor Levshin (Igor_Le), who helped me a lot with preparing the material for the article, and threw some useful ideas, as well as Evgeny Sergeev and Alexey Fadeev for the corrections they proposed.

All Articles