On Fri, Sep 23, 2016 at 09:38:08AM -0400, Greg Troxel wrote:
>
> Johnny Billquist <b...@softjar.se> writes:
>
> > With rotating rust, the order of operations can make a huge difference
> > in speed. With SSDs you don't have those seek times to begin with, so
> > I would expect the gains to be marginal.
>
> For reordering, I agree with you, but the SSD speeds are so high that
> pipelining is probably necessary to keep the SSD from stalling due to not
> having enough data to write. So this could help move from 300 MB/s
> (that I am seeing) to 550 MB/s.

The iSCSI case is illustrative, too. Now you can have a "SCSI bus" with
a huge bandwidth delay product. It doesn't matter how quickly the target
says it finished one command (which is all enabling the write-cache can
get you) if you are working in lockstep such that the initiator cannot
send more commands until it receives the target's ack.

This is why on iSCSI you really do see hundreds of tags in flight at
once. You can pump up the request size, but that causes fairness
problems. Keeping many commands active at the same time helps much more.

Now think about that SSD again. The SSD's write latency is so low that
_relative to the delay time it takes the host to issue a new command_
you have the same problem. It's clear that enabling the write cache
can't really help, or at least can't help much: you need to have many
commands pending at the same time.

Our storage stack's inability to use tags with SATA targets is a huge
gating factor for performance with real workloads (the residual use of
the kernel lock at and below the bufq layer is another). Starting de
novo with NVMe, where it's perverse and structurally difficult to not
support multiple commands in flight simultaneously, will help some, but
SATA SSDs are going to be around for a long time still and it'd be great
if this limitation went away.

That said, I am not going to fix it myself so all I can do is sit here
and pontificate -- which is worth about what you paid for it, and no
more.

Thor
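
To put the lockstep argument in rough numbers, here is a back-of-the-envelope
sketch using Little's law: sustained throughput is at most the bytes in flight
divided by the per-command round-trip time, capped by what the device can
absorb. The function name, the 64 KiB request size, and the 200 microsecond
round trip are all made-up assumptions for illustration, not measurements; only
the 550 MB/s ceiling is taken from the figure quoted above.

```python
def throughput_mb_s(queue_depth, request_kib, round_trip_us, device_limit_mb_s):
    """Estimate sustained throughput in MB/s (hypothetical model).

    Little's law: bytes in flight / round-trip time per command,
    capped at the device's own limit.
    """
    in_flight_bytes = queue_depth * request_kib * 1024
    bytes_per_second = in_flight_bytes / (round_trip_us * 1e-6)
    return min(bytes_per_second / 1e6, device_limit_mb_s)


if __name__ == "__main__":
    # Assumed values: 64 KiB requests, 200 us from issue to ack plus
    # the host's turnaround before the next command, 550 MB/s device cap.
    for depth in (1, 2, 4, 32):
        rate = throughput_mb_s(depth, 64, 200.0, 550.0)
        print(f"queue depth {depth:2d}: {rate:6.1f} MB/s")
```

Under these assumed numbers a single outstanding command cannot reach the
device limit no matter how fast the device acks, while a second command in
flight already can. Raising request_kib has the same arithmetic effect, which
is the "pump up the request size" option with its fairness cost.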