How PostgreSQL works with disk. Ilya Kosmodemyansky

Decoding of the report of 2014 by Ilya Kosmodemyansky "How PostgreSQL works with the disk."


Part of the post, of course, is outdated, but here are the fundamental aspects of PostgreSQL when working with the disk, which are still relevant.


Disks, memory, price, processor - in this order, admins who buy a machine for a database look at server characteristics. How are these characteristics interconnected? Why exactly them?


The report will explain why a database disk is needed in general, how PostgreSQL interacts with it, and what are the features of PostgreSQL compared to other databases.


Hardware, settings of the operating system, file system and PostgreSQL: how and why to choose a good setup, what to do if the hardware configuration is not optimal, and what errors can make the most expensive RAID controller useless. A fascinating journey into the world of batteries, dirty and clean pages, good and bad SSDs, reddened monitoring schedules and nightmares for system administrators.



My name is Ilya Kosmodemyansky. I work for PostgreSQL-Consulting. I’m doing a variety of things related to Postgres, its performance and the like.


, Postgres, . - Write Ahead log, IO PostgreSQL, . , , -, PostgreSQL.



:


  • ?


  • PostgreSQL?


  • , .


  • . ? , ?


  • PostgreSQL.


  • . PostgreSQL , .




  • -, . , , , . , .
  • , Write Ahead Log, . Write Ahead Log, . Write Ahead Log .
  • – checkpoint, WAL , .


PostgreSQL :


  • PostgreSQL autovacuum. , . PostgreSQL demon autovacuum. – . , .


  • PostgreSQL pg_clog. . PostgreSQL pg_clog. OLTP, . , , RAM-. RAM-. RAM- , , PostgreSQL . . , .


  • tmp, , – . , , , . ., . - work_mem ( , PostgreSQL) , . , , explain analyze – , . .




, – checkpoints, , . «pik» — . .


? Oracle, Log Writer DBwriter, , , PostgreSQL fsync. .


Fsync – UNIX. fsync , - . . , checkpoints, .



? PostgreSQL shared_buffers, , . . . , , . : update, insert tuple , .



  • checkpoint ? , , WAL, .


  • , . WAL COMMIT , .


  • WAL , , checkpoint. . Checkpoint , WAL fsync .


  • shared_buffers , , shared_buffers, . - .




, . , , , . . iostat.


? . , , , shared buffers. . - , .


checkpoint, pdflush . IO . , 100 % . , fsync , . , , . , select , .


pg_stat_bgwriter, , checkpoint , checkpoints . . .



pg_stat_bgwriter. , , . , . - - . , , . , , - .


, . , - .


, , . , , , PostgreSQL. . - , , . , - , , . . . .



, ?


hardware. - , - .


  • -, RAID-. RAID-? RAID- – , , . , , 100 / , , , . CPU.


    CPU, , software RAID , , .


  • RAID- . . . . . , .


    ? . , fsync , . fsync , . , . , , , , , fsync checkpoints . , RAID, .


  • RAID , . . . - . - RAID , , . megaraid perc, - , . , . . .


    , HP RAID-, . , , , , . , . . , , . .


  • RAID . . . write back cache. io mode direct. – , PostgreSQL fsyng .


  • Disk Write Cache Mode. , . SSD , , , , , , .




  • , , SAS. . seek , . . , .


  • SSD. SSD , . , SSD . SSD, SATA , , . , .


  • SSD only , PostgreSQL . , SSD , Write Ahead Log, temp, , SSD, WAL – SSD. SSD. . , . , . .


  • RAID, , , , 10, - 5- 6-. , -, RAID . , -, , . Stripe , . stripe , .


  • , . , , , - , . RAID – . SATA - , off, , , .


    , , , , , – . , , . . , .




. ?


-, c noatime , , . , . barrier, xfs ext4 .


? , , , inodes, inodes . , , . Linux scall, . , inodes, .


128 GB shared buffers, RAID-. . . , Debian , , .


, , zfs, , partition , partition . , , . . . . production xfs ext4 Linux, .



, . . . , . .


- , : vm.dirty_ratio=20 vm.dirty_background_ratio=10. .


? Pdflush, , , 10 20 % . . , 128 GB , 20 % . RAID – 1-2 GB, . , 512 MB. , . , , pdflush , .


( Linux PostgreSQL : vm.dirty_background_ratio 5% ~ 25% .)


vm.dirty_bytes. . - , . , . , .


RAID, , , , - . .



postgresql.conf, checkpoints , .


  • Wal_buffers – , PostgreSQL. - , , checkpoint. , , , - -.


  • , flush , checkpoint segment - - . 256. - 1000 checkpoints, , . , checkpoint, , 48 MB , 4 GB. RAID-.


  • checkpoints -, - . checkpoint_timeout . , -, . checkpoints . pg_stat_bgwriter 0, , checkpoints.



. . Checkpoints -, - . , - , , . checkpoint , , -.


  • , checkpoint_completion_target, 0,7-0,9. checkpoint, . . . , checkpoints, checkpoint, - .


    0,1, , 10 % , checkpoints, . - , .




, ?


PostgreSQL , pg_test_fsync. , , hardware iops, checkpoints PostgreSQL, - .


( ), , . , - . , . , -, . . .


, , , .


. , . . . hardware , , .



hack, . , .


checkpoint PostgreSQL bgwriter, . . . .


? checkpoint . . Bgwriter , , . . , .


checkpoint . bgwriter . checkpoint.


, . , , . . 10 000, 1 000 10. . , checkpoint bgwriter. checkpoints update, insert, , . , .



? PostgreSQL – autovacuum. autovacuum . , autovacuum , , . , . .


, . : 40 MB select 30 - . PostgreSQL, .


. , autovacuum, .


Autovacuum . ? : autovacuum_vacuum_scale_factor autovacuum_analyze_scale_factor. 20 %. 0,02? , . 20 % . , PostgreSQL updates, insert tuple delete. Delete – delete, . tuple , autovacuum .


20 % . , , . . overhead , . , , , , ddl, autovacuum. .


- . , 0,001-0,01. . . 0,001 , autovacuum . (: autovacuum_vacuum_scale_factor autovacuum_analyze_scale_factor 5% — https://habr.com/ru/post/501516/) autovacuum , . . . , autovacuum , . overhead - . ., autovacuums .


? autovacuum. , . . 98-100 % , , , , . , 0,1 % autovacuum. , , . . , , . , autovacuum, , 20-30-50 %. autovacuum, , . , – 10-20, , work_mem , autovacuums .


autovacuum_analyze_scale_factor. , analyze. , , 32- . , 50 % , . , 0,02-0,03-0,04. , demon , , , .



, autovacuum ?


, , delete, tuple . xmax, . . transaction id, tuple .


* , autovacuum ?


, . , – . , , pg_catalog, . , pg_catalog’ . . .. . , , . , , .


! - PostgreSQL ? , KVM VirtIO ?


PostgreSQL , - - latency . , , , checkpoint . IO .


, , , , . , , - . , . - LibvitIO. - . . , .


, . . . , fsync. fsync datasync?


. pg_test_fsync , fsync , . postgresql.conf , . .


. . , SSD, HDD write cache. - ? write cache writes .


. , , , . - , , , .


, , , , RAID . , . - storages. . , - , .


. . - pgtune , . . RAIDs . ., - ? . . , RAID - , .


. . , , . , shared buffers - , work_mem , . . , . PostgreSQL , , , , , , linux- . Linux PostgreSQL . , PostgreSQL wizard – , . -: , . , , .


PostgreSQL BSD? ?


. FreeBSD , , Linux . – huge pages. Huge pages BSD . PostgreSQL, 9.4 , Linux. . . , , overhead . , workload. sync . , 3.3 , . PostgreSQL MySQL. , .


, BSD, , , . , - Oracle IBM . . . FreeBSD . , BSD .


, , ? read only , ?


inserts , . , , , . , , insert, , . , , , - time base inventory, timestamp’. , , , , -, .


, , - , - , - .


read only , , SSD, SAS’, SSD tablespace. , , tablespace SSD, , SSD. random page cost . . selects. .


?


The story is more detailed. For example, you have a partitioned table. And you keep your head hot for 3-4 weeks, that is, how much you really need to go for this analytics. All that further, for example, you collapse into one large partition and try not to go to it once again, so as not to interfere with I / O. Or you dump this last table and take it to another machine that does not have online OLTP loads for this aggregation. And if you need to get this data, you go there and drive these requests there, so as not to spoil the picture at the main base.


All Articles