🌳 👩‍❤️‍💋‍👨 😀 Another look at the question "Do I need defragmentation for SSD" 👨🏾‍🤝‍👨🏼 😑 🤾🏿

Undoubtedly, the question posed in the title of the article is not new, it has been raised more than once and a consensus has been reached on it “not really needed, and can even be harmful”.
However, the recent discussion in the comments made me think again.

Over time, any SSD will still fragment strongly (internally, in FTL) ... A freshly recorded SSD with linear reading will give a high speed, but if it has already worked, it will be much lower, because it will be linear only for you.

Yes, usually this should not happen: either we write “little by little” to small files / small blocks of FS meta-information (the linear reading speed of which we don’t really care about), or we write “a lot” to large files and everything will be fine. It also happens that you add small blocks to large files - logs, for example, but they are relatively short-lived and I don’t see any particular problems here.
But it was easy to imagine a very real scenario, in which all the same internal fragmentation of the SSD can occur: a database file into which there is a fairly active random recording. Over time, it (remaining unfragmented at the operating system level) will be physically very fragmented, which can significantly reduce the speed of seq scan, backup, etc.

For verification, I wrote a script and ran tests.

Spoiler: the problem is present (significantly affects performance) on only one of the models that came to hand (and it is positioned by the manufacturer not as a datacenter, but as a desktop / laptop).

What is it all about? What else fragmentation inside the SSD?

, SSD . NAND flash ( ) . SSD 512- ( 4096-) , .
- , , FTL (flash translation layer): flash- , ( ) , , - log- .
, , , , — .

, FTL, , , , . , .
.

The idea of the script: we create a file of several gigabytes, filled with random data, measure the speed of sequential reads.
Next, using random access, we rewrite part of the test file and again measure the speed of linear reading. If our suspicions are true, then now reading from the file will go slower.
After each write, we do three read operations with a delay between them in case some drive in the background performs defragmentation and then the read speed improves.

, SSD

, , - , , . - , , , , , .
TRIM: SSD «», , FTL. NAND flash, . , , . SSD , .

, , , .

, SSD ( linux /dev/urandom).

linux c dash, coreutils fio debian buster, , freebsd «».

echo preparing...
dd if=/dev/urandom of=testfile bs=1M count=4096 status=none
sync
for A in 1 2 3; do
    sleep 10
    dd if=testfile of=/dev/null bs=1M iflag=direct
done

for A in 50 200 800 4000; do
    echo fio: write ${A}M...
    fio --name=test1 --filename=testfile --bs=4k --iodepth=1 --numjobs=1  --rw=randwrite  --io_size=${A}M --randrepeat=0 --direct=1 --size=4096M > /dev/null
    sync

    for B in 1 2 3; do
        echo sleep ${B}0
        sleep ${B}0
        dd if=testfile of=/dev/null bs=1M iflag=direct
    done
done

echo sleep 3600
sleep 3600
dd if=testfile of=/dev/null bs=1M iflag=direct

, NVMe- intel windows; , stackexchange -

fio; exe- .

$testfile = "c:\temp\testfile"
$fio = "c:\temp\fio-3.18-x64\fio"

echo "preparing..."

$filestream = New-Object System.IO.FileStream($testfile, "Create")
$binarywriter = New-Object System.IO.BinaryWriter($filestream)
$out = new-object byte[] 1048576

For ($i=1; $i -le 4096; $i++) {
    (new-object Random).NextBytes($out);
    $binarywriter.write($out)
}
$binarywriter.Close()

For ($i=1; $i -le 3; $i++) {
    sleep 10
    $time = Measure-Command {
        Invoke-Expression "$fio --name=test1 --filename=$testfile --bs=1M --iodepth=1 --numjobs=1  --rw=read --direct=1 --size=4096M" *>$null
    }

    $seconds = $time.Minutes*60+$time.Seconds+$time.Milliseconds/1000
    echo "read in $seconds"
}

foreach ($A in 50,200,800,4000) {
    echo "fio: write ${A}M..."
    Invoke-Expression "$fio --name=test1 --filename=$testfile --bs=4k --iodepth=1 --numjobs=1  --rw=randwrite  --io_size=${A}M --randrepeat=0 --direct=1 --size=4096M" *>$null
    For ($i=10; $i -le 30; $i+=10) {
        echo "sleep $i"
        sleep $i
        $time = Measure-Command {
            Invoke-Expression "$fio --name=test1 --filename=$testfile --bs=1M --iodepth=1 --numjobs=1  --rw=read --direct=1 --size=4096M" *>$null
        }

        $seconds = $time.Minutes*60+$time.Seconds+$time.Milliseconds/1000
        echo "read in $seconds"
    }
}

rm $testfile

: , «» ( ) , ;
windows - (, , , );
( ) .

( ) 4 :

		50	+200	+800	+4000
intel S3510 SSDSC2BB480G6	10.7	10.7	10.8	10.8	10.8
toshiba XG5 KXG50ZNV512G	1.9	2.9	3.7	4.8	6.8
samsung PM963 MZQLW960HMJP	2.8	3.2	3.5	3.7	4.2
samsung PM983 MZQLB960HAJR	3.3	3.6	3.4	3.4	3.4
samsung PM981 MZVLB1T0HALR	1.8	1.8	2.1	2.5	3.5
samsung PM1725b MZPLL1T6HAJQ	1.8	1.9	2.0	2.3	2.9
micron 5200 eco	9.3	9.8	10.4	12.2	10.7
samsung PM883 MZ7LH1T9HMLT	7.9	7.9	8.1	8.1	8.0
intel P3520 (win)	5.8	5.9	6.0	6.1	5.8
intel P4500 (win)	4.2	4.2	4.3	4.4	4.3

DC ( — /); SATA, NVMe, , .

, PM981 ( , , ), — 3.5 , SATA .
, , .

: SSD , , , ( intel, , ; samsung , ).

(, - NAND flash).
XG5: , SMART >>150, — 300-400 , flash , SSD.

: mysql 100. , , «» mysql ( 600/), (>2/).

SSD

~~, : , , . , downtime ( - ). , , .~~
( ) SSD:

sync
fsfreeze -f /mountpoint
dd  if=/dev/nvme0n1p2 of=/dev/nvme0n1p2 bs=512M iflag=direct oflag=direct status=progress
fsfreeze -u /mountpoint

«» . - , , , -. «»: 100%, SSD « , TRIM» ( , TRIM, , ).
, « » .

Summary: Defragmentation may be useful for some SSDs, but not quite the same (not at all?) As for the HDD. It is important for us not only that the file is located in a continuous chain of sectors, but also that the recording in these sectors was sequential.

PS I would be grateful if readers would run the script at their place and give numbers for their SSDs, since my selection of drives is rather one-sided.

Another look at the question "Do I need defragmentation for SSD"

More articles: