t...@panix.com (Thor Lancelot Simon) writes:

>> We have tons of parallelism for writing and a small amount for reading.
>Unless you've done even more than I noticed, allocation in the filesystems
>is going to be a bottleneck -- concurrent access not having been foremost
>in anyone's mind when FFS was designed.

When you write, the filesystem will queue an effectively unlimited amount of
data to the device, and the device will issue as many concurrent commands as
it (and the target) has openings (i.e. a maximum of 256 for scsipi and thus
iscsi). The parallelism is then limited only by memory and the pagermap.

Reading sequentially is done by UVM with a read-ahead of (by default) up to
8 * MAXPHYS (I have bumped this locally to 16 * MAXPHYS to make iscsi
saturate a GigE link). Reading randomly is limited by lock contention in the
kernel when you try to read with many threads. Reading is obviously also
limited by the pagermap: the default size on amd64 is 16 MByte, and that is
the amount of I/O you can have in flight.

Whether filesystem allocation (and possibly synchronous writes) is a
limitation depends. WAPBL seems to hide that quite well.

That said, on really fast storage (NVMe on PCIe) everything seems to be CPU
limited, and the largest overhead comes from UVM. I believe changing device
I/O to use unmapped pages will have the largest impact. At the same time it
would also avoid the pagermap limit.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."