Lots of free RAM, an NVMe Intel P4500, and everything slows to a crawl - the story of an unsuccessful swap partition addition

In this article, I will talk about a situation that recently occurred with one of the servers in our VPS cloud and baffled me for several hours. I have been configuring and troubleshooting Linux servers for about 15 years, but this case did not fit my experience at all - I made several false assumptions and grew slightly desperate before I managed to pinpoint the cause of the problem and solve it.


Preamble


We operate a medium-sized cloud, which we build on typical servers of the following configuration - 32 cores, 256 GB RAM and a 4TB NVMe PCI-E Intel P4500 drive. We really like this configuration, because it frees us from worrying about IO shortages, as long as we enforce the correct limits at the level of VM instance types. Since the NVMe Intel P4500 has impressive performance, we can simultaneously give the VMs their full IOPS allowance and stream backup storage to a backup server with zero IOWAIT.


We belong to those old believers who do not use hyperconverged software-defined storage and other stylish, fashionable, youthful things for storing VM volumes, believing that the simpler the system, the easier it is to fix it under the conditions of "the main guru has gone to the mountains". As a result, we store VM volumes as QCOW2 files on XFS or EXT4, which is deployed on top of LVM2.


We are also forced to use QCOW2 by the product we use for orchestration - Apache CloudStack.

To perform a backup, we take a full image of the volume as an LVM2 snapshot (yes, we know that LVM2 snapshots are slow, but the Intel P4500 helps us out here too). We do lvcreate -s ... and use dd to send the backup to a remote server with ZFS storage. Here we are still slightly progressive - after all, ZFS can store data in compressed form, and we can quickly restore it with dd or pull out individual VM volumes using mount -o loop ....
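For illustration, a minimal sketch of what the restore side can look like. The dataset, paths and image names here are assumptions for the example, not our actual production names:

```shell
# Hypothetical ZFS dataset holding the backups: enable transparent compression once
zfs set compression=lz4 backup/volumes

# Full restore: stream a raw image back onto the LVM2 volume
dd if=/backup/volumes/hv01-15.raw of=/dev/images/volume bs=1M

# Or mount the raw image read-only to pull out individual QCOW2 volumes
# without restoring the whole thing
mount -o loop,ro /backup/volumes/hv01-15.raw /mnt/restore
```

Since the raw image contains the same XFS/EXT4 filesystem that lived on the LVM2 volume, the loop mount gives direct access to the QCOW2 files inside it.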


We could also mount the LVM2 snapshot read-only and copy out the QCOW2 volumes themselves, but we ran into XFS losing its mind over this - not immediately, but in an unpredictable way. We really do not like it when a hypervisor suddenly "freezes" on a weekend, at night, or on a holiday because of errors with unclear timing. Therefore, with XFS we do not mount snapshots in RO mode to extract individual volumes - we simply copy the entire LVM2 volume.

In our case, the speed of copying to the backup server is determined by the performance of the backup server itself, which is around 600-800 MB/s for incompressible data; a further limiter is the 10Gbit/s channel connecting the backup server to the cluster.


At the same time, backups from 8 hypervisor nodes are uploaded to one backup server. Its disk and network subsystems, being slower, therefore do not let the disk subsystems of the hypervisor nodes become overloaded - they simply cannot absorb the, say, 8 GB/s that the hypervisors could easily produce.


The copying process described above is very important for the rest of the story, including its details: the use of a fast Intel P4500 drive, the use of NFS, and, probably, the use of ZFS.



On each hypervisor node we have a small SWAP partition of 8 GB, and we "roll out" the hypervisor node itself with DD from a reference image. For the system volume, the servers use 2xSATA SSD RAID1 or 2xSAS HDD RAID1 on an LSI or HP hardware controller. In general, we do not care at all what is inside, since the system volume operates in "almost readonly" mode, except for SWAP. And since we have plenty of RAM on the server, with 30-40% of it free, we did not give SWAP any thought.


Now the backup process itself. The backup task looks roughly like this:


#!/bin/bash

mkdir -p /mnt/backups/volumes

VG=images
LV=volume
DATE=$(date "+%d")
HOSTNAME=$(hostname)

# Snapshot the volume, giving the snapshot all remaining free space in the VG
lvcreate -s -n "$LV-snap" -l100%FREE "$VG/$LV"
# Stream the snapshot to the NFS-mounted backup storage with idle IO priority
ionice -c3 dd iflag=direct if="/dev/$VG/$LV-snap" bs=1M of="/mnt/backups/volumes/$HOSTNAME-$DATE.raw"
# Drop the snapshot
lvremove -f "$VG/$LV-snap"

Note the ionice -c3: for NVMe devices it is actually useless, since their IO scheduler is set to none:


cat /sys/block/nvme0n1/queue/scheduler
[none] 

However, we still have a number of legacy nodes with conventional SSD RAID arrays, for which IO prioritization is relevant, so the script travels from node to node AS IS. In this configuration, then, ionice is either useful or harmless.
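To make the scheduler dependency concrete, here is a quick way to check which scheduler is active per device; the device names are examples, and the bracketed entry in the output is the active one:

```shell
# NVMe devices typically run without a scheduler - ionice has nothing to act on
cat /sys/block/nvme0n1/queue/scheduler   # [none]

# A SATA/SAS device usually has a real scheduler
cat /sys/block/sda/queue/scheduler       # e.g. mq-deadline kyber [bfq] none

# ionice IO classes are honored only by schedulers that implement IO
# priorities (BFQ, and CFQ on older kernels); switching at runtime if needed:
echo bfq > /sys/block/sda/queue/scheduler
```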


Also note the iflag=direct flag for DD. We use direct IO, bypassing the page cache, to avoid needless buffer churn when reading. However, we do not use oflag=direct, because we ran into ZFS performance problems when using it.


We had been using this scheme successfully for several years without any problems.


And then... One day we discovered that one of the nodes was no longer being backed up, and the previous run had finished with a monstrous IOWAIT of 50%. While trying to understand why the copy was not happening, we ran into this phenomenon:


Volume group "images" not found

We started thinking "the Intel P4500 has reached the end of its life", however, before taking the server offline to replace the drive, we still had to perform a backup. We fixed LVM2 by restoring the metadata from an LVM2 backup:


vgcfgrestore images
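As a side note, this recovery works because LVM2 keeps automatic metadata backups under /etc/lvm; a sketch of the full sequence (VG name taken from the story above):

```shell
# List the available metadata backups and archives for the volume group
vgcfgrestore --list images

# Restore the most recent metadata backup
vgcfgrestore images

# Reactivate the logical volumes in the restored group
vgchange -ay images
```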

We launched the backup and saw the following picture:


At this point we grew quite sad - it was clear we could not live like this, since all the VPS would suffer, which means we would suffer too. What had happened was completely unclear - iostat showed miserable IOPS and sky-high IOWAIT. There were no ideas other than "let's replace the NVMe", but the insight came just in time.



Step-by-step analysis of the situation. The history log. A couple of days earlier, a large VPS with 128 GB RAM had to be created on this server. There seemed to be enough memory, but to be on the safe side another 32 GB were allocated to the swap partition. The VPS was created, successfully did its job, the incident was forgotten, and the SWAP partition remained.
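In our case the extra swap was a partition, but for illustration the same step can be sketched with a swap file; paths and sizes here are illustrative, not taken from the actual server:

```shell
# Allocate and enable 32 GB of extra swap (file variant shown for brevity)
fallocate -l 32G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Verify the new swap area is active
swapon --show
```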


The configuration detail. On all cloud servers, vm.swappiness was set to the default 60, and SWAP was created on the SAS HDD RAID1.
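Checking both of these facts takes a couple of commands; this is a read-only sketch that needs no privileges:

```shell
# Current swappiness (the kernel default is 60)
cat /proc/sys/vm/swappiness

# Which swap areas are active, on which devices, and how full they are
swapon --show
free -h
```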


What happened (the editors' version). When the backup ran, DD produced a lot of write data, which was buffered in RAM before being written to NFS. The system kernel, guided by the swappiness policy, moved many pages of VPS memory to the swap area, which lived on the slow HDD RAID1 volume. This caused IOWAIT to grow enormously - not because of IO on the NVMe, but because of IO on the HDD RAID1.
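With hindsight, this mismatch is easy to see in per-device statistics rather than the aggregate IOWAIT figure; a sketch, with example device names:

```shell
# Extended per-device stats: %util and await point at the HDD RAID1,
# not the NVMe, even though aggregate IOWAIT "looks like a disk problem"
iostat -x 1 nvme0n1 sda

# The si/so columns show pages swapped in/out per second
vmstat 1
```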


How the problem was resolved. The 32GB swap partition was disabled. That took 16 hours; how and why SWAP turns off so slowly deserves a separate story. swappiness was changed to 5 across the whole cloud.
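The fix itself boils down to two standard steps; the device path below is an example, and swapoff is the part that took those 16 hours, since every in-use page has to be read back from the HDD into RAM:

```shell
# Disable the stray swap area
swapoff /dev/sdb2   # or: swapoff -a

# Make the kernel much less eager to swap, and persist the setting
sysctl -w vm.swappiness=5
echo 'vm.swappiness = 5' > /etc/sysctl.d/99-swappiness.conf
```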


How could this not have happened? Firstly, if SWAP had been on an SSD RAID or NVMe device; secondly, if there had been no NVMe device at all but something slower that would not have produced such a volume of data. Ironically, the problem happened because the NVMe is too fast.


After that, everything began to work as before - with zero IOWAIT.

