Hi!

Thanks for sharing your interesting insights! I have no idea how the trimming
works internally, but I can imagine that trimming a portion smaller than the
managed block size (assuming it does not fail outright) may force the rest of
the data in that block to be migrated to a new block before the old one is
trimmed. That's only a guess, as I said.
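A quick way to see what each layer of the stack actually advertises for
discards is the queue limits in sysfs; a minimal sketch, with placeholder
device names (sda for the SSD, md0 for the array, dm-0 for the LV):

  # Show the discard granularity/maximum advertised per layer of the stack.
  lsblk --discard                        # DISC-GRAN and DISC-MAX columns
  for dev in sda md0 dm-0; do            # placeholder names, substitute your own
      echo "== $dev =="
      cat "/sys/block/$dev/queue/discard_granularity"   # smallest trimmable unit (bytes)
      cat "/sys/block/$dev/queue/discard_max_bytes"     # largest single discard request (bytes)
  done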
Another thing is stacking block layers: I once caused I/O errors myself by
reducing the maximum I/O size of a lower layer. I was expecting that the upper
layers would send small-enough chunks to the lower layers, but instead the
whole request failed at the higher layer. So obviously the I/O size at the
lowest layer is important. I did not play with trim, but I did lots of tests
with parallel I/O and different block sizes to find out the optimal value for
some storage device.

Regards,
Ulrich

>>> Eric Robinson <eric.robin...@psmnv.com> wrote on 03.08.2017 at 19:52 in message
<dm5pr03mb27295f164251b8ac4c6a05adfa...@dm5pr03mb2729.namprd03.prod.outlook.com>:
> For anyone else who has this problem, we have reduced the time required to
> trim a 1.3TB volume from 3 days to 1.5 minutes.
>
> Initially, we had used mdraid to build a raid0 array with a 32K chunk size.
> We initialized it as a drbd disk, synced it, built an LVM logical volume on
> it, and created an ext4 filesystem on the volume. Creating the filesystem
> and trimming it took 3 days (each time, every time, across multiple tests).
>
> When running lsblk -D, we noticed that the DISC-MAX value for the array was
> only 32K, compared to 4GB for the SSD drive itself. We also noticed that
> the number matched the chunk size. We deleted the array and built a new one
> with a 4MB chunk size. The DISC-MAX value changed to 4MB, which is the max
> selectable chunk size (but still way below the other DISC-MAX values shown
> in lsblk -D). We realized that, when using mdadm, the DISC-MAX value ends
> up matching the array chunk size. We theorized that the small DISC-MAX
> value was responsible for the slow trim rate across the DRBD link.
>
> Instead of using mdadm to build the array, we used LVM to create a striped
> logical volume and made that the backing device for drbd. Then lsblk -D
> showed a DISC-MAX size of 128MB. Creating an ext4 filesystem on it and
> trimming it took only 1.5 minutes (across multiple tests).
>
> Somebody knowledgeable may be able to explain how DISC-MAX affects the trim
> speed, and why the DISC-MAX value is different when creating the array with
> mdadm versus LVM.
>
> --
> Eric Robinson
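For reference, a minimal sketch of the two setups Eric compares. The device
names (/dev/sdb, /dev/sdc), sizes, and stripe parameters are made up, and the
commands destroy any data on those disks:

  # Variant 1: mdraid raid0. DISC-MAX ends up equal to the chunk size.
  mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=4M /dev/sdb /dev/sdc
  lsblk -D /dev/md0                      # expect DISC-MAX = 4M

  # Variant 2: LVM striped LV on the same disks.
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sdb /dev/sdc
  pvcreate /dev/sdb /dev/sdc
  vgcreate vg0 /dev/sdb /dev/sdc
  lvcreate --type striped -i 2 -I 128 -L 500G -n lv0 vg0   # 2 stripes, 128K stripe size
  lsblk -D /dev/vg0/lv0                  # DISC-MAX is no longer tied to the stripe size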
>> -----Original Message-----
>> From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
>> Sent: Wednesday, August 02, 2017 11:36 PM
>> To: users@clusterlabs.org
>> Subject: [ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow!
>>
>> >>> Eric Robinson <eric.robin...@psmnv.com> wrote on 02.08.2017 at 23:20 in message
>> <DM5PR03MB2729C66CEC1E3B8B9E297185FAB00@DM5PR03MB2729.namprd03.prod.outlook.com>:
>>
>> > 1) iotop did not show any significant io, just maybe 30k/second of
>> > drbd traffic.
>> >
>> > 2) okay. I've never done that before. I'll give it a shot.
>> >
>> > 3) I'm not sure what I'm looking at there.
>>
>> See /usr/src/linux/Documentation/block/stat.txt ;-) I wrote an NRPE plugin
>> to monitor those with performance data and verbose text output, e.g.:
>> CFS_VMs-xen: [delta 120s], 1.15086 IO/s read, 60.7789 IO/s write, 0 req/s
>> read merges, 0 req/s write merges, 4.53674 sec/s read, 486.231 sec/s write,
>> 2.36844 ms/s read wait, 2702.19 ms/s write wait, 0 req in_flight, 115.987 ms/s
>> active, 2704.53 ms/s wait
>>
>> Regards,
>> Ulrich
>>
>> > --
>> > Eric Robinson
>> >
>> >> -----Original Message-----
>> >> From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
>> >> Sent: Tuesday, August 01, 2017 11:28 PM
>> >> To: users@clusterlabs.org
>> >> Subject: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!
>> >>
>> >> Hi!
>> >>
>> >> I know little about trim operations, but you could try one of these:
>> >>
>> >> 1) iotop to see whether some I/O is done during trimming (assuming
>> >> trimming itself is not considered to be I/O)
>> >>
>> >> 2) Try blktrace on the affected devices to see what's going on. It's
>> >> hard to set up and to extract the info you are looking for, but it
>> >> provides deep insights.
>> >>
>> >> 3) Watch /sys/block/$BDEV/stat for performance statistics. I don't
>> >> know how well DRBD supports these, however (e.g. MDRAID shows no wait
>> >> times and no busy operations, while a multipath map has it all).
>> >>
>> >> Regards,
>> >> Ulrich
>> >>
>> >> >>> Eric Robinson <eric.robin...@psmnv.com> wrote on 02.08.2017 at 07:09 in message
>> >> <DM5PR03MB27297014DF96DC01FE849A63FAB00@DM5PR03MB2729.namprd03.prod.outlook.com>:
>> >> > Does anyone know why trimming a filesystem mounted on a DRBD volume
>> >> > takes so long? I mean like three days to trim a 1.2TB filesystem.
>> >> >
>> >> > Here are some pertinent details:
>> >> >
>> >> > OS: SLES 12 SP2
>> >> > Kernel: 4.4.74-92.29
>> >> > Drives: 6 x Samsung SSD 840 Pro 512GB
>> >> > RAID: 0 (mdraid)
>> >> > DRBD: 9.0.8
>> >> > Protocol: C
>> >> > Network: Gigabit
>> >> > Utilization: 10%
>> >> > Latency: < 1ms
>> >> > Loss: 0%
>> >> > Iperf test: 900 mbits/sec
>> >> >
>> >> > When I write to a non-DRBD partition, I get 400MB/sec (bypassing caches).
>> >> > When I trim a non-DRBD partition, it completes fast.
>> >> > When I write to a DRBD volume, I get 80MB/sec.
>> >> >
>> >> > When I trim a DRBD volume, it takes bloody ages!
>> >> >
>> >> > --
>> >> > Eric Robinson
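For anyone who wants to watch /sys/block/$BDEV/stat by hand, a minimal sketch
in the spirit of the NRPE plugin output quoted above (field layout per
Documentation/block/stat.txt; the device name is a placeholder):

  #!/bin/sh
  # Sample the stat file twice and print per-second deltas.
  BDEV=${1:-sda}                         # placeholder device name
  INTERVAL=${2:-120}
  read -r r_ios r_mrg r_sec r_ticks w_ios w_mrg w_sec w_ticks \
       inflight io_ticks queue_ticks rest < "/sys/block/$BDEV/stat"
  sleep "$INTERVAL"
  read -r R_IOS R_MRG R_SEC R_TICKS W_IOS W_MRG W_SEC W_TICKS \
       INFLIGHT IO_TICKS QUEUE_TICKS rest < "/sys/block/$BDEV/stat"
  dr=$(( (R_IOS - r_ios) / INTERVAL ))          # read IO/s
  dw=$(( (W_IOS - w_ios) / INTERVAL ))          # write IO/s
  dact=$(( (IO_TICKS - io_ticks) / INTERVAL ))  # ms/s the device was busy
  echo "$BDEV: [delta ${INTERVAL}s] $dr IO/s read, $dw IO/s write, $INFLIGHT req in_flight, $dact ms/s active"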
>> >> > >> >> > -- >> >> > Eric Robinson >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> Users mailing list: Users@clusterlabs.org >> >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> >> >> Project Home: http://www.clusterlabs.org Getting started: >> >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> >> Bugs: http://bugs.clusterlabs.org >> > >> > _______________________________________________ >> > Users mailing list: Users@clusterlabs.org >> > http://lists.clusterlabs.org/mailman/listinfo/users >> > >> > Project Home: http://www.clusterlabs.org Getting started: >> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> > Bugs: http://bugs.clusterlabs.org >> >> >> >> >> >> _______________________________________________ >> Users mailing list: Users@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > _______________________________________________ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org _______________________________________________ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org