Security and DBMS: what you need to remember when choosing protection tools


My name is Denis Rozhkov, I am the head of software development at Gazinformservice, on the Jatoba product team. Legislation and corporate regulations impose certain requirements on the security of data storage. Nobody wants third parties to gain access to confidential information, so the following issues matter for any project: identification and authentication, data access management, ensuring the integrity of information in the system, and recording security events. That is why I want to talk about some interesting points concerning DBMS security.

This article is based on my talk at @Databases Meetup, hosted by Mail.ru Cloud Solutions. If you prefer not to read, you can watch the video:


The article has three parts:

  • How to protect connections.
  • What an audit of actions is and how to record what happens inside the database and when connecting to it.
  • How to protect data in the database itself and what technologies exist for this.


The three components of DBMS security: connection protection, activity audit, and data protection

Connection Protection


You can connect to the database either directly or indirectly, through web applications. As a rule, the business user, that is, the person who works with the data, does not interact with the DBMS directly.

Before talking about securing connections, you need to answer several important questions that determine how the security measures will be built:

  • whether one business user is equivalent to one DBMS user;
  • whether access to DBMS data is provided only through an API that you control, or whether tables are accessed directly;
  • whether the DBMS is allocated to a separate protected segment, and who interacts with it and how;
  • whether pooling/proxy layers and middleware are used, since they can change the information about how the connection is built and who uses the database.

Now let's see what tools can be used to protect connections:

  1. Use database firewall class solutions. At a minimum, this additional layer of protection will increase the transparency of what is happening in the DBMS; at a maximum, it provides additional data protection.
  2. . , . — -, , . , , .

    , MS SQL Vulnerability Assessmen
  3. . , , , , , . .
  4. Configure SSL if you do not have network separation of the DBMS from end users, for example if it is not in a separate VLAN. In such cases it is necessary to protect the channel between the consumer and the DBMS itself; protection tools also exist in open source (a minimal configuration sketch follows this list).
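
As a reference point, here is roughly what enabling server-side SSL looks like in PostgreSQL; the file names and the subnet are illustrative (they echo the test setup below), and the clientcert syntax shown is for PostgreSQL 12 and later:

# postgresql.conf — enable TLS for client connections
ssl = on
ssl_cert_file = 'server.crt'       # server certificate
ssl_key_file  = 'server.key'       # server private key
ssl_ca_file   = 'rootCA.crt'       # CA used to verify client certificates

# pg_hba.conf — accept only SSL connections from the application subnet
# and require a client certificate signed by that CA
hostssl  taskdb  all  192.168.220.0/24  md5  clientcert=verify-ca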

How will this affect DBMS performance?


Let's look at a PostgreSQL example: how SSL affects CPU load, whether timings grow and TPS drops, and whether enabling it eats up too many resources.

We load PostgreSQL using pgbench - it is a simple program for running performance tests. It repeatedly executes one sequence of commands, possibly in parallel database sessions, and then calculates the average transaction speed.

Test 1 without SSL and using SSL - a connection is established with each transaction:

pgbench.exe --connect -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres sslmode=require 
sslrootcert=rootCA.crt sslcert=client.crt sslkey=client.key"

vs

pgbench.exe --connect -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres"

Test 2 without SSL and using SSL - all transactions are performed in one connection:

pgbench.exe -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres sslmode=require
sslrootcert=rootCA.crt sslcert=client.crt sslkey=client.key"

vs

pgbench.exe -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres"

Other settings:

scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 5000
number of transactions actually processed: 50000/50000

Test results:

                                            No SSL        SSL
Connection is established at each transaction
  latency average                           171.915 ms    187.695 ms
  tps including connections establishing    58.168112     53.278062
  tps excluding connections establishing    64.084546     58.725846
  CPU                                       24%           28%
All transactions are performed in one connection
  latency average                           6.722 ms      6.342 ms
  tps including connections establishing    1587.657278   1576.792883
  tps excluding connections establishing    1588.380574   1577.694766
  CPU                                       17%           21%

At light loads, the effect of SSL is comparable to the measurement error. If the amount of transferred data is very large, the situation may differ. If a connection is established for every transaction (this is rare: usually a connection is shared between users), so that there are many connects and disconnects, the effect may be somewhat bigger. That is, there is some risk of performance degradation, but the difference is not large enough to justify leaving connections unprotected.

Note the strong difference between the operating modes: working within the same session versus opening a new one each time. This is understandable: resources are spent on creating every connection.

We had a case when we connected Zabbix in trust mode, that is, without md5 checks, so no authentication was performed. Then the customer asked to enable md5 authentication. This put a large load on the CPU and performance dipped. We began looking for ways to optimize. One possible solution is to introduce a network restriction: put the DBMS into separate VLANs, add settings so that it is clear who connects from where, and remove authentication (a simplified sketch of this compromise is shown below). You can also tune the authentication settings to reduce the cost of enabling it, but in general the choice of authentication method affects performance and has to be taken into account when sizing the hardware for a DBMS.
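
A pg_hba.conf fragment for that compromise might look roughly like this; the user name and subnets are made up for the example:

# the monitoring host sits in a dedicated, access-restricted VLAN:
# no password check, so frequent polling does not burn CPU on authentication
host  all  zabbix_monitor  10.10.50.10/32    trust

# everyone else still authenticates with a password hash
host  all  all             192.168.220.0/24  md5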

Conclusion: in a number of solutions, even small authentication nuances can greatly affect the project, and it is bad when this becomes clear only in production.

Audit of actions


An audit does not have to cover only the DBMS. An audit is the collection of information about what is happening in different segments: it can also involve a database firewall and the operating system on which the DBMS runs.

In commercial enterprise-level DBMSs everything is fine with auditing; in open source it is not always so. Here is what PostgreSQL has:

  • default log - built-in logging;
  • extensions: pgaudit - if you do not have enough default logging, you can use separate settings that solve some of the problems.

Addition to the report in the video:

“Basic statement logging can be provided by the standard logging facility with log_statement = all.

This is acceptable for monitoring and other uses, but does not provide the level of detail typically needed for an audit.

It is not enough to have a list of all operations performed with the database.

It should also be possible to find specific statements that are of interest to the auditor.

The standard logging tool shows what the user requested, while pgAudit focuses on the details of what happened while the database was satisfying the request.

For example, the auditor may want to ensure that a specific table has been created in a documented maintenance window.

This may seem like a simple task for basic auditing and grep, but what if you come across something like this (intentionally confusing) example:

DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END $$;

Standard logging will give you this:

LOG: statement: DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END $$;

It seems that searching for a table of interest may require some code knowledge in cases where tables are created dynamically.

This is not ideal, as it would be preferable to just search by table name.

This is where pgAudit will be useful.

For the same input, it will produce this output in the log:

AUDIT: SESSION,33,1,FUNCTION,DO,,,"DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END $$;"
AUDIT: SESSION,33,2,DDL,CREATE TABLE,TABLE,public.important_table,CREATE TABLE important_table (id INT)

Not only is the DO block logged, but also the full text of CREATE TABLE with the statement type, object type, and full object name, which makes it easier to search.

When logging SELECT and DML statements, pgAudit can also be configured to log a separate entry for each relation referenced in a statement, so no parsing is needed to find all statements that touch a particular table (*).”
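
For completeness, enabling pgAudit itself takes only a few lines; the set of logged statement classes below is just an example:

postgresql.conf (requires a restart):
shared_preload_libraries = 'pgaudit'
pgaudit.log = 'ddl, write'
pgaudit.log_relation = on

And in the target database:

CREATE EXTENSION pgaudit;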

How will this affect DBMS performance?


Let's run the tests with full audit enabled and see what happens with the performance of PostgreSQL. We enable the maximum database logging in all respects.

We change almost nothing else in the configuration file; the important part is enabling debug5 mode to get the maximum amount of information.

postgresql.conf
log_destination = 'stderr'
logging_collector = on
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
log_min_messages = debug5
log_min_error_statement = debug5
log_min_duration_statement = 0
debug_print_parse = on
debug_print_rewritten = on
debug_print_plan = on
debug_pretty_print = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = on
log_hostname = on
log_lock_waits = on
log_replication_commands = on
log_temp_files = 0
log_timezone =

On a PostgreSQL server with 1 CPU at 2.8 GHz, 2 GB of RAM and a 40 GB HDD, we run three load tests using the following commands:

$ pgbench -p 3389 -U postgres -i -s 150 benchmark
$ pgbench -p 3389 -U postgres -c 50 -j 2 -P 60 -T 600 benchmark
$ pgbench -p 3389 -U postgres -c 150 -j 2 -P 60 -T 600 benchmark

Test results:

                                    Without logging   With logging
Total database filling time         43.74 s           53.23 s
RAM                                 24%               40%
CPU                                 72%               91%
Test 1 (50 connections)
  Transactions in 10 minutes        74169             32445
  Transactions per second           123               54
  Average latency                   405 ms            925 ms
Test 2 (150 connections with 100 possible)
  Transactions in 10 minutes        81727             31429
  Transactions per second           136               52
  Average latency                   550 ms            1432 ms
Sizes
  Database size                     2251 MB           2262 MB
  Database log size                 0 MB              4587 MB

Bottom line: a full audit is not much fun. The audit data ends up being as large as the data in the database itself, or even larger. The volume of logging generated while working with the DBMS is a common problem in production.

Let's look at the other parameters:

  • Filling time does not change much: 43.74 seconds without logging versus 53.23 seconds with logging.
  • RAM and CPU consumption grows, since the audit file has to be written. This is also noticeable in production.

With an increase in the number of connections, of course, the performance will slightly deteriorate.

In the corporate world, with auditing it is even more difficult:

  • there is a lot of data;
  • audit records are needed not only via syslog in a SIEM, but also in files: in case something happens to syslog, the file with the data must be kept close to the database;
  • the audit needs a separate disk shelf so as not to sink the database on disk I/O, since it takes up a lot of space;
  • it happens that information security staff need GOSTs everywhere and require GOST-based identification.

Data access restriction


Let's look at the technologies that are used to protect data and access to it in commercial DBMS and open source.

What can be used in general:

  1. Encryption and obfuscation of procedures and functions (wrapping): separate tools and utilities that turn readable code into unreadable code. True, afterwards it can neither be changed nor refactored back. This approach is sometimes required at least on the DBMS side, when licensing restrictions or authorization logic is encrypted exactly at the level of procedures and functions.
  2. Row-level security (RLS): different users see the same table but a different set of rows in it, so certain rows stay hidden from certain users (a sketch follows after this list).
  3. Masking of displayed data: in the same column some users see the real data and others see, say, asterisks; who sees what is decided by the access level.
  4. Separation of the Security DBA / Application DBA / DBA roles: this is more about restricting access to the DBMS itself, that is, separating information security staff from database and application administrators. In open source there are few such technologies; commercial DBMSs have more. They are in demand when many people have access to the servers themselves.
  5. Restriction of access to files at the file system level: rights can be granted on directories and files so that particular administrators get access only to the data they need.
  6. Mandatory access control and memory cleaning: these technologies are used rarely.
  7. End-to-end encryption: client-side encryption of the data, so the server never works with it in the clear.
  8. Data encryption: for example, column encryption, when a DBMS mechanism encrypts an individual column of the database.
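
To make the RLS item concrete, here is a minimal row-level security sketch for PostgreSQL; the table, the policy, and the rule that a user sees only the rows of their own department are invented for the example:

CREATE TABLE orders (id integer, department text, amount numeric);

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- a row is visible only if its department matches the current database user
CREATE POLICY department_only ON orders
    USING (department = current_user);

-- apply the policy to the table owner as well
ALTER TABLE orders FORCE ROW LEVEL SECURITY;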

How will this affect DBMS performance?


Let's look at an example of column encryption in PostgreSQL. There is the pgcrypto module, which allows selected fields to be stored in encrypted form. This is useful when only some of the data is valuable. To read the encrypted fields, the client passes the decryption key, the server decrypts the data and returns it to the client. Without the key, nothing can be done with your data.

Let's test it with pgcrypto. We create a table with encrypted data and a table with regular data. Below are the commands for creating the tables; the useful command in the very first line creates the extension itself, registering it in the DBMS:

CREATE EXTENSION pgcrypto;
CREATE TABLE t1 (id integer, text1 text, text2 text);
CREATE TABLE t2 (id integer, text1 bytea, text2 bytea);
INSERT INTO t1 (id, text1, text2)
VALUES (generate_series(1,10000000), generate_series(1,10000000)::text, generate_series(1,10000000)::text);
INSERT INTO t2 (id, text1, text2) VALUES (
generate_series(1,10000000),
encrypt(cast(generate_series(1,10000000) AS text)::bytea, 'key'::bytea, 'bf'),
encrypt(cast(generate_series(1,10000000) AS text)::bytea, 'key'::bytea, 'bf'));

Next, let's select data from each table and look at the execution timings.

Selection from the table without the encryption function:

psql -c "\timing" -c "select * from t1 limit 1000;" "host=192.168.220.129 dbname=taskdb
user=postgres sslmode=disable" > 1.txt

Timing is on.

  id | text1 | text2
-----+-------+-------
   1 | 1     | 1
   2 | 2     | 2
   3 | 3     | 3
...
 997 | 997   | 997
 998 | 998   | 998
 999 | 999   | 999
1000 | 1000  | 1000
(1000 rows)

Time: 1.386 ms

Sampling from a table with encryption function:

psql -c "\timing" -c "select id, decrypt(text1, 'key'::bytea, 'bf'),
decrypt(text2, 'key'::bytea, 'bf') from t2 limit 1000;"
"host=192.168.220.129 dbname=taskdb user=postgres sslmode=disable" > 2.txt

Timing is on.

  id |  decrypt   |  decrypt
-----+------------+------------
   1 | \x31       | \x31
   2 | \x32       | \x32
   3 | \x33       | \x33
...
 999 | \x393939   | \x393939
1000 | \x31303030 | \x31303030
(1000 rows)

Time: 50.203 ms

Test results:

                       No encryption   pgcrypto (decrypt)
Fetch of 1000 rows     1.386 ms        50.203 ms
CPU                    15%             35%
RAM                    -               +5%

Encryption has a big impact on performance. The timing has clearly increased, since decrypting encrypted data (and the decryption is usually also wrapped in your own logic) requires significant resources. That is, the idea of encrypting every column that contains any data at all is fraught with reduced performance.

However, encryption is not a silver bullet that solves all issues. During decryption and data transfer, the decrypted data and the decryption key are located on the server. Therefore, the keys can be intercepted by anyone with full access to the database server, for example a system administrator.

When an entire column uses one key for all users (or even for a limited set of customers), this is not always good and right. That is why end-to-end encryption emerged: DBMSs began to offer options for encrypting data on the client and server sides, and key-vault storages appeared as separate products that provide key management for the DBMS.


An example of such encryption in MongoDB

Security features in commercial and open source DBMS


DBMS         Type         Password policy   Audit        Procedure/function code protection   RLS   Encryption
Oracle       commercial   +                 +            +                                    +     +
MS SQL       commercial   +                 +            +                                    +     +
Jatoba       commercial   +                 +            +                                    +     extensions
PostgreSQL   free         extensions        extensions   -                                    +     extensions
MongoDB      free         -                 +            -                                    -     MongoDB Enterprise only

The table is far from complete, but the situation is this: in commercial products security problems were solved long ago, while in open source, as a rule, some add-ons are used for security, many functions are missing, and sometimes you have to add something yourself. For example, password policies: there are many different extensions for PostgreSQL ( 1 , 2 , 3 , 4 , 5 ) that implement password policies, but in my opinion none of them covers the needs of the domestic corporate segment.
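
For comparison, the simplest thing PostgreSQL itself ships is the contrib module passwordcheck; it only rejects obviously weak passwords and is far from a full password policy, but it shows the starting point:

postgresql.conf (requires a restart):
shared_preload_libraries = 'passwordcheck'

After that, weak passwords are rejected when a role is created or altered:

CREATE ROLE app_user LOGIN PASSWORD 'app_user1';   -- rejected: the password contains the role name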

What to do if what you need is not available anywhere? For example, you want to use a specific DBMS that lacks the functions the customer requires.

Then you can use third-party solutions that work with different DBMSs, for example, “Crypto DB” or “Garda DB”. If we are talking about solutions from the domestic segment, they know GOSTs better than open source does.

The second option is to write what you need yourself: implement data access and encryption in the application, at the procedure level. True, with GOST it will be harder. But in general you can hide the data as needed, put it into the DBMS, then fetch it and decrypt it properly, right at the application level. Just think in advance about how you will protect these algorithms on the application side. In our opinion, it is better to do this at the DBMS level, because it will work faster (a rough sketch of that variant follows).
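
A rough sketch of the DBMS-level variant using pgcrypto; the table and function names are invented, the key is passed in by the application on every call and never stored, and pgp_sym_encrypt here only stands in for GOST algorithms, which pgcrypto does not provide:

CREATE TABLE clients (id integer, phone bytea);

-- the application supplies the key with every call
CREATE FUNCTION save_phone(p_id integer, p_phone text, p_key text)
RETURNS void LANGUAGE sql AS $$
    INSERT INTO clients (id, phone)
    VALUES (p_id, pgp_sym_encrypt(p_phone, p_key));
$$;

CREATE FUNCTION read_phone(p_id integer, p_key text)
RETURNS text LANGUAGE sql AS $$
    SELECT pgp_sym_decrypt(phone, p_key) FROM clients WHERE id = p_id;
$$;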

This talk was first given at @Databases Meetup by Mail.ru Cloud Solutions. Watch the video of this and other talks, and subscribe to Telegram announcements of events in Around Kubernetes at Mail.ru Group.

What else to read on the topic:

  1. More than Ceph: MCS Block Cloud Storage.
  2. How to choose a database for your project so that you do not have to choose again.
