Further work on the nvme driver and more performance tests. I got a couple more cards today, but it turns out only one was actually an NVME card (the Intel 750). The card itself is pretty horrible compared to the Samsungs... just a really bad firmware implementation. It slows down drastically when large blocks (64K and 128K) are used, when a lot of queues are allocated, or when commands are queued to multiple queues simultaneously. It takes a lot of work to actually get it to perform well.
So I definitely recommend the Samsungs over the Intel 750, at least for now. The nvme(4) manual page has been updated with some information on BIOS configuration and brands. Really there are only two readily available at a reasonable price... Samsung and Intel, and at the moment Samsung has far better firmware. The rebrands I bought turned out not to be NVME cards at all (they were M.2 cards with an integrated AHCI controller). The Plextor did horribly, the Kingston was a bit better, but I would not recommend either.

Note that most BIOSes cannot boot from NVME cards, and if they can it's probably via UEFI, which is a pain to set up for DragonFly.

In any case, I put up some new stats in sys02.txt below:

http://apollo.backplane.com/DFlyMisc/nvme_sys01.txt
http://apollo.backplane.com/DFlyMisc/nvme_sys02.txt

The sys02.txt tests run all three cards simultaneously. Generally speaking I maxed out at around 535,000 IOPS in both the 512-byte and 4096-byte random seek tests, and I maxed out at around 4.5 GBytes/sec reading on the bandwidth test using 32KB buffers (out of deference to the idiotic Intel firmware).

Also, just for the fun of it, at the end I threw 4 SSDs into the hot-swap bays and ran tests with them plus the nvme cards all together. Aggregate bandwidth did not improve and aggregate IOPS actually dropped slightly :-).

The tests were performed on a 3.4 GHz Haswell Xeon, 4-core/8-thread, with 16GB of RAM. The data sets were written using /dev/urandom as a source (i.e. meant to be uncompressible).

These tests bring up some interesting problems that I'll have to solve for HAMMER and HAMMER2. These filesystems CRC the meta-data and data blocks. The generic CRC code can only do around 200 MBytes/sec per cpu core, and the multi-table iscsi_crc code can only do around 500 MBytes/sec per cpu core. That's a problem when the underlying storage has 1.5 GBytes/sec of bandwidth.
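To put numbers on that: keeping a single 1.5 GByte/sec device checksummed would eat roughly 7-8 cores with the generic code (1.5 / 0.2) or about 3 cores with iscsi_crc (1.5 / 0.5). For anyone curious what "multi-table" means, here is a minimal slicing-by-4 CRC32C sketch in C. It shows the general technique (fold four bytes per iteration through four lookup tables instead of one byte through one table); it is not DragonFly's actual iscsi_crc implementation, and the table layout and function names below are mine.

#include <stdint.h>
#include <stddef.h>

#define CRC32C_POLY	0x82F63B78U	/* reflected Castagnoli polynomial */

static uint32_t crc_tab[4][256];

/*
 * Build the byte-at-a-time table (crc_tab[0]) and the three derived
 * tables used by the slicing-by-4 loop.
 */
static void
crc32c_init(void)
{
	uint32_t i, c;
	int k;

	for (i = 0; i < 256; ++i) {
		c = i;
		for (k = 0; k < 8; ++k)
			c = (c & 1) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
		crc_tab[0][i] = c;
	}
	for (i = 0; i < 256; ++i) {
		crc_tab[1][i] = (crc_tab[0][i] >> 8) ^
				crc_tab[0][crc_tab[0][i] & 0xFF];
		crc_tab[2][i] = (crc_tab[1][i] >> 8) ^
				crc_tab[0][crc_tab[1][i] & 0xFF];
		crc_tab[3][i] = (crc_tab[2][i] >> 8) ^
				crc_tab[0][crc_tab[2][i] & 0xFF];
	}
}

/*
 * Fold 4 bytes per iteration through the four tables, then finish the
 * tail one byte at a time.
 */
static uint32_t
crc32c(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;
	while (len >= 4) {
		crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		crc = crc_tab[3][crc & 0xFF] ^
		      crc_tab[2][(crc >> 8) & 0xFF] ^
		      crc_tab[1][(crc >> 16) & 0xFF] ^
		      crc_tab[0][crc >> 24];
		p += 4;
		len -= 4;
	}
	while (len--)
		crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xFF];
	return ~crc;
}

Call crc32c_init() once, then crc32c(0, buf, len); if the tables are right, crc32c(0, "123456789", 9) comes out to 0xE3069283. Even the wider loop is still a handful of dependent table lookups per word, which is why the per-core numbers above top out where they do... going much faster means using the SSE 4.2 crc32 instruction or spreading the checksumming across cores.

-Matt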
