On 06.12.2010 22:18, John Baldwin wrote:
> On Monday, December 06, 2010 2:53:27 pm Pawel Jakub Dawidek wrote:
>> On Mon, Dec 06, 2010 at 08:35:36PM +0100, Ivan Voras wrote:
>>> Please persuade me on technical grounds why ashift, a property
>>> intended for address alignment, should not be set in this way. If your
>>> answer is "I don't know but you are still wrong because I say so" I
>>> will respect it and back it out, but only until I/we discuss the
>>> question with upstream ZFS developers.

>> No. You persuade me why changing ashift in ZFS, which, as the comment
>> clearly states, is the "device's minimum transfer size", is better and
>> less hackish than presenting the disk with a properly configured sector
>> size. Not only can this affect disks that still use 512-byte sectors,
>> but it also doesn't fix the problem at all. It just works around the
>> problem in ZFS when it is configured on top of raw disks.

Both the ATA and SCSI standards support different logical and physical sector sizes. It is not a hack; it seems to be the way manufacturers decided to go, at least according to their own statements. IMHO, the hack in this situation would be to report to GEOM some fake sector size, different from the one reported by the device. Either way, the reported sector size is the disk's main visible characteristic, independent of what its firmware does inside.
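
To make the distinction concrete, here is a small user-space sketch (an illustration only, not kernel code; the 512/4096 values model a typical "512e" drive) of what the two sizes mean for I/O:

/*
 * A "512e" drive is addressed in 512-byte logical sectors but stores
 * data in 4096-byte physical sectors. A write that is not aligned to
 * the physical sector forces the firmware into an internal
 * read-modify-write cycle.
 */
#include <stdbool.h>
#include <stdio.h>

#define LOGICAL_SECTOR  512ULL  /* unit of addressing (what LBAs count) */
#define PHYSICAL_SECTOR 4096ULL /* unit the medium is actually written in */

static bool
needs_rmw(unsigned long long offset, unsigned long long length)
{
        return (offset % PHYSICAL_SECTOR != 0 ||
            length % PHYSICAL_SECTOR != 0);
}

int
main(void)
{
        /* Aligned 4K write: maps 1:1 onto one physical sector. */
        printf("4096 bytes at 0: rmw=%d\n", needs_rmw(0, 4096));
        /* Legal 512-byte write: firmware must rewrite a whole 4K sector. */
        printf("512 bytes at 512: rmw=%d\n", needs_rmw(512, 512));
        return (0);
}

Reporting the logical sector size honestly, with the physical size passed up as an alignment hint, keeps addressing correct while still letting upper layers avoid the slow path.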

>> What about other file systems? What about other GEOM classes? GELI is
>> a great example here, as people use ZFS on top of GELI a lot. GELI's
>> integrity verification works in such a way that not reporting the disk
>> sector size properly will have a huge negative performance impact.
>> ZFS' ashift won't change that.

> I am mostly on your side here, but I wonder if GELI shouldn't prefer the
> stripesize anyway?  For example, if you ran GELI on top of RAID-5 I imagine
> it would be far more performant for it to use stripe-size logical blocks
> instead of individual sectors for the underlying media.

> The RAID-5 argument also suggests that other filesystems should probably
> prefer stripe sizes to physical sector sizes when picking block sizes, etc.

Looking further, I can see a use even for several "stripesize" values reported this way, unrelated to the logical sector size.

Let's take an example: 5 disks with 4K physical sectors in a RAID5 with a 64K strip. We'll have three sizes to align to: 4K, 64K and 256K (with one disk's worth of parity, the full stripe is 4 data disks x 64K = 256K). Aligning to 4K avoids read-modify-write at the disk level; to 64K, avoids request splitting and so increases (up to doubles) parallel random read performance; to 256K, significantly increases write speed by avoiding read-modify-write at the RAID5 level.
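
As a minimal sketch of that arithmetic (the constants are just the example's numbers: 4 data disks out of 5, 64K strip, 4K physical sectors), this classifies an I/O offset by the best boundary it hits:

/*
 * Alignment levels for the example: 5-disk RAID5 (4 data + 1 parity),
 * 4K physical sectors, 64K strip, hence a 256K full stripe.
 */
#include <stdio.h>

#define SECTOR  4096ULL         /* avoids disk-level read-modify-write */
#define STRIP   (64ULL * 1024)  /* avoids splitting a request across disks */
#define NDATA   4ULL            /* 5 disks in RAID5 -> 4 carry data */
#define STRIPE  (STRIP * NDATA) /* avoids RAID5-level read-modify-write */

static const char *
alignment_level(unsigned long long offset)
{
        if (offset % STRIPE == 0)
                return ("256K: no RAID5 read-modify-write");
        if (offset % STRIP == 0)
                return ("64K: no request splitting");
        if (offset % SECTOR == 0)
                return ("4K: no disk-level read-modify-write");
        return ("512: addressable, but none of the above");
}

int
main(void)
{
        unsigned long long offsets[] = { 512, 4096, 65536, 262144 };

        for (int i = 0; i < 4; i++)
                printf("offset %6llu -> %s\n", offsets[i],
                    alignment_level(offsets[i]));
        return (0);
}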

How can this be used? We can easily align the partition to the biggest of them, 256K, to give any file system the maximum chance to align properly. UFS allocates space and writes data at the granularity of blocks; depending on the specific situation we may wish to increase the block size to 64K, but that is quite a big value, so it depends. We can safely increase the fragment size to 4K. We could also make the UFS read-ahead and write-back code align I/Os at run time to the reported boundaries; depending on the situation, both 64K and 256K could be reasonable candidates for that. Sure, the solution is an engineering compromise (not an absolute) in each case, but IMHO a reasonable one.
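
As a hedged illustration of those choices (the hints struct and its multiple levels are hypothetical, for illustration only; today a GEOM provider reports a single sectorsize/stripesize pair, not the whole set), a newfs-like consumer could derive its parameters roughly like this:

/*
 * Hypothetical sketch: derive partition alignment and UFS-like block
 * and fragment sizes from reported hints. The multi-level hints struct
 * is an assumption, not an existing GEOM interface.
 */
#include <stdio.h>

#define UFS_MAXBSIZE    65536ULL        /* UFS block size upper limit */

struct hints {
        unsigned long long sectorsize;  /* logical sector: 512 */
        unsigned long long physsize;    /* physical sector: 4K */
        unsigned long long stripsize;   /* per-disk strip: 64K */
        unsigned long long stripesize;  /* full RAID5 stripe: 256K */
};

int
main(void)
{
        struct hints h = { 512, 4096, 65536, 262144 };

        /* Align the partition to the largest boundary: 256K. */
        unsigned long long align = h.stripesize;
        /* Block size: the strip size, capped at UFS's 64K maximum. */
        unsigned long long bsize = h.stripsize < UFS_MAXBSIZE ?
            h.stripsize : UFS_MAXBSIZE;
        /* Fragment size: safely raised to the physical sector, 4K. */
        unsigned long long fsize = h.physsize;

        printf("align=%llu bsize=%llu fsize=%llu\n", align, bsize, fsize);
        return (0);
}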

The specific usage of these values (512, 4K, 64K and 256K) depends on the abilities of the particular partitioning scheme and file system. Neither the disk driver nor GEOM can know what will be most useful at each higher level. The 512-byte logical sector is the only critically important value in this situation; everything else is only an optimization.

--
Alexander Motin
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
