PostgreSQL: Development of extensions (functions) in C language

This article was written a couple of years ago, and did not know where it could be put, and then forgot.

The meaning of using the C language when developing extensions for PostgreSQL compared to interpreted (scripted) languages ​​can be reduced to two points: performance and functionality. But simply, the code written in C will work much faster, for example, if the function is called a million times in the request for each record. More specifically, some PostgreSQL features cannot be done at all except in C, for example, in other languages, types (especially if you return a value from a function) ANYELEMENT , ANYARRAY and especially important VARIADIC are not supported .

Simple C function


For example, we write a function that takes two arguments and adds them. This example is described in the documentation for PostgreSQL, but we will slightly improve it and collect it. Then load into PostgreSQL and write a call from the SQL function.

#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(add_ab);
Datum add_ab(PG_FUNCTION_ARGS)
{
    int32 arg_a = PG_GETARG_INT32(0);
    int32 arg_b = PG_GETARG_INT32(1);
    PG_RETURN_INT32(arg_a + arg_b);
}

The add_func.c file can be used as a template for developing more complex functionality. Also, go through the code

  • #include "postgresql.h": this header file you will always have to include, contains various basic types and functions.
  • #include "fmgr.h": the header file contains various PG_ * macros.
  • PG_MODULE_MAGIC: a macro that determines that we are developing a module for PostgreSQL above version 8.2.
  • PG_FUNC_INFO_V1: A macro defining function calling conventions within PostgreSQL. If you declare a function without it, then there will be an agreement on calling Version 0, otherwise Version 1.
  • Datum: . , PostgreSQL. - VARIANT Microsoft. «» - . , .
  • add_ab(PG_FUNCTION_ARGS): . - . , .
  • int32 arg_a = PG_GETARG_INT32(0): ( ).
  • PG_RETURN_INT32(arg_a + arg_b): .

Now we need to compile and compile this correctly. The output will be a dynamically loadable shared library (* .so). For this, it will be more convenient to do it through the Makefile . The documentation describes the keys and paths that need to be registered, but we will collect using PGXS . This is an environment for extension developers, which means that all necessary -dev and -devel packages for PostgreSQL must be installed on your system .

MODULES = add_func
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

We create SQL function


Now we’ll write an SQL function that will be called from our previously created extension library.
CREATE FUNCTION add(int, int)
  RETURNS int
AS '/usr/lib/postgresql/9.4/lib/add_func', 'add_ab'
LANGUAGE C STRICT;

That's all! Now we can use this function so

SELECT add(1, 2);

Automate the installation


Now we automate the installation a bit. This will be very useful when you do not know in advance which version of PostgreSQL is used, and in which way it is installed. To do this, create the following file,

CREATE FUNCTION add(int, int)
  RETURNS int
AS 'MODULE_PATHNAME', 'add_ab'
LANGUAGE C STRICT;

and add a line to the Makefile ,

MODULES = add_func
DATA_built = add_funcs.sql
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Improve a little


We’ll make some changes to our example and make some improvements.

Checking Arguments


Remember in the SQL definition of the add (int, int) function we used the STRICT keyword ? It means that if at least one of the arguments is NULL , the function will not work, and will simply return NULL . This is similar to the behavior of SQL statements, for example, if the "+" statement has at least one NULL argument , then the result will be NULL .

We add an argument check to our function, similar to the SQL aggregation function sum () , which ignores NULL values and continues to work. For this we need to do

  • Ensure that the function works even if one of the arguments is NULL
  • If both arguments are NULL , then return NULL

PG_FUNCTION_INFO_V1(add_ab_null);
Datum add_ab_null(PG_FUNCTION_ARGS)
{
    int32 not_null = 0;
    int32 sum = 0;
    if (!PG_ARGISNULL(0))
    {
        sum += PG_GETARG_INT32(0);
        not_null = 1;
    }
    if (!PG_ARGISNULL(1))
    {
        sum += PG_GETARG_INT32(1);
        not_null = 1;
    }
    if (not_null)
    {
        PG_RETURN_INT32(sum);
    }
    PG_RETURN_NULL();
}

And now check it out,

CREATE FUNCTION add(int, int)
 RETURNS int
AS '$libdir/add_func', 'add_ab_null'
LANGUAGE C;

SELECT add(NULL, NULL) AS must_be_null, add(NULL, 1) AS must_be_one;
-[ RECORD 1 ]+--
must_be_null |
must_be_one  | 1

And here is how the same can be achieved with standard PostgreSQL tools,
SELECT (CASE WHEN (a IS null) AND (b IS null)
(THEN null
ELSE coalesce(a, 0) + coalesce(b,0)
END)
FROM (SELECT 1::int AS a, null::int AS b) s;
-[ RECORD 1 ]
 case | 1

Any number of arguments in a function


As you already noticed, we used macros to get the value of the arguments. Therefore, we can pass any number of arguments, and then just read their values ​​in a loop,

if (!PG_ARGISNULL(i))
{
    sum += PG_GETARG_INT32(i);
    not_null = 1;
}

You may immediately have a question, but you can use an array to pass the value of the arguments. And in fact, this is what needs to be done, but unfortunately due to the presence of its own memory manager in PostgreSQL, this is not such a trivial task. But try to solve it. There is an example in the PostgreSQL documentation where the text [] character array is passed, but this is not exactly what we need. Let's try to adapt for our purposes,

#include "utils/array.h"     //       
#include "catalog/pg_type.h" // INT4OID
PG_MODULE_MAGIC;
Datum add_int32_array(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(add_int32_array);
Datum add_int32_array(PG_FUNCTION_ARGS)
{
    //     .
    //          ,       int.
    ArrayType *input_array;
    int32 sum = 0;
    bool not_null = false;

    Datum *datums;
    bool *nulls;
    int count;
    int i;
    input_array = PG_GETARG_ARRAYTYPE_P(0); //     .  *_P  
                                            //        ,   INT32

    //          INT32 (INT4)
    Assert(ARR_ELEMTYPE(input_array) == INT4OID);

    //      
    if (ARR_NDIM(input_array) > 1)
        ereport(ERROR, (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), errmsg("1-dimensional array needed")));

    deconstruct_array(input_array, //  
                      INT4OID,     //  
                      4,           //    
                      true,        // int4   
                      'i',         //  'i'
                      &datums, &nulls, &count); //   

    for(i = 0; i < count; i++)
    {
        //     
        if (nulls[i])
            continue;

        // ,       
        sum += DatumGetInt32(datums[i]);
        not_null = true;
    }
    if (not_null)
        PG_RETURN_INT32(sum);

    PG_RETURN_NULL();
}

As always, let's go through the code,

  • There is no special type for integer arrays, so we use the generic type ArrayType , which works for any type of array
  • To initialize the array with the first argument, we used the special macro PG_GETARG_ARRAYTYPE_P
  • There is also a check to see if the array is really one-dimensional ARR_NDIM
  • The OID type for int4 (= 23) is defined as INT4OID. To view definitions for other types, you can use SQL,

    select oid, typlen, typbyval, typalign from pg_type
    where typname = 'int4';
    -[ RECORD 1 ]
    oid | 23
    typlen | 4
    typbyval | t
    typalign | i
    


Now we just have to teach PostgreSQL to use this by declaring a function that takes an int [] argument,

CREATE OR REPLACE FUNCTION add_arr(int[]) RETURNS int
AS '$libdir/add_func', 'add_int32_array'
LANGUAGE C STRICT;

And check

SELECT add_arr('{1,2,3,4,5,6,7,8,9}');
-[ RECORD 1 ]
add_arr | 45
SELECT add_arr(ARRAY[1,2,NULL]);
-[ RECORD 1 ]
add_arr | 3
SELECT add_arr(ARRAY[NULL::int]);
-[ RECORD 1 ]
add_arr |

Recommendations


Now let's summarize some of the main recommendations for creating extensions.

Work with memory


The authors and creators of the PostgreSQL DBMS pay particular attention to working with memory and preventing leaks. This is one of the basic rules for developing a DBMS and its extensions. This is based on their own decision - the context of memory.

Using palloc () and pfree ()


All work with memory in PostgreSQL consists in using non-standard functions palloc () and pfree ().

Initialization of Structures


Always initialize new declared structures. For example, call memset () after palloc ().

Connected files


Each of your extensions must include at least two files: postgres.h and fmgr.h.

useful links


PostgreSQL: Server Programming Interface .

PostgreSQL: User Defined Functions in C

PostgreSQL Extension Network

All Articles