Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of sensille
>>
>> The basic idea: the main problem when using an HDD as a ZIL device
>> is the cache flushes in combination with the linear write pattern
>> of the ZIL. This leads to a whole rotation of the platter after
>> each write, because after the first write returns, the head is
>> already past the sector that will be written next.
>> My idea goes as follows: don't write linearly. Track the rotation
>> and write to the position the head will hit next. This might be done
>> by a re-mapping layer or integrated into ZFS. This works only because
>> ZIL devices are basically write-only. Reads from this device will be
>> horribly slow.
> 
> The reason why hard drives are less effective as ZIL dedicated log devices
> compared to such things as SSDs is the rotation of the hard
> drives: the physical time to seek a random block.  There may be a
> possibility to use hard drives as dedicated log devices, cheaper than SSDs
> and with possibly comparable latency, if you can intelligently eliminate the
> random seek - if you have a way to tell the hard drive "Write this data, to
> whatever block happens to be available at minimum seek time."

Thanks for rephrasing my idea :) The only thing I'd like to point out is that
ZFS doesn't do random writes on a slog, but nearly linear writes. This might
even hurt performance more than random writes, because you always hit the
worst case of one full rotation, as the quick calculation below shows.
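
A quick sanity check of that worst case (plain arithmetic, assuming a
10k rpm drive like the one I tested):

/* Linear sync writes on a rotating disk: by the time one write
 * returns, the next sector has just passed under the head, so every
 * write waits roughly one full rotation. */
#include <stdio.h>

int main(void)
{
    double rpm = 10000.0;          /* assumed spindle speed */
    double rotation = 60.0 / rpm;  /* 6 ms per revolution */

    printf("worst-case sync writes/s: %.0f\n", 1.0 / rotation);  /* ~167 */
    return 0;
}

Which is exactly the ~166 writes/s figure below.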

> 
> For rough estimates:  Assume the drive is using Zone Density Recording, like
> this:
> http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
> Suppose you're able to keep your hard drive head on the outer tracks.
> Suppose 1000 sectors per track (I have no idea if that's accurate, but at
> least according to the above article in the year 2000 it was ballpark
> realistic).  Suppose 10krpm.  Then the physical seek time could
> theoretically be brought down to as low as a few microseconds - one sector
> time.  Of course, that's not realistic - some sectors may already be used,
> and the electronics themselves could be a factor - but the point remains:
> the physical seek time can be effectively eliminated.  At least in theory.
> And that was the year 2000.
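
(Working the quoted numbers through: 10,000 rpm means 60/10000 = 6 ms per
revolution; with 1000 sectors per track, one sector passes under the head in
6 ms / 1000 = 6 microseconds - about a factor of 1000 below the full-rotation
worst case.)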

The Hitachi disk mentioned above (at least the one I have in my test machine)
has 1764 sectors per track on head 1 and 1680 on head 2 in the first zone,
which spans 50 tracks. I'm quite sure the limiting factor is the electronics:
this disk needs the write to be issued about 140 sectors in advance. The
servo information on the platters may also have to be taken into account.
Other disks don't behave that well. I tried with 1TB SATA disks, but they
don't seem to have any predictable timing.
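
To make the remapping concrete, a minimal sketch of the next-sector
calculation (names made up; single zone, single head only; a real remapper
needs the full zone/head table, must skip sectors already used, and has to
handle track changes with their larger interleave):

#include <stdint.h>

#define TRACK_SECTORS 1764  /* head 1, first zone, as measured above */
#define WRITE_LEAD     140  /* the write must be issued this far ahead */

/* Given the sector just written, return the next sector the head can
 * still reach in time: skip ahead by the lead the electronics need,
 * wrapping within the track. */
static uint32_t next_sector(uint32_t last)
{
    uint32_t track  = last / TRACK_SECTORS;
    uint32_t offset = (last % TRACK_SECTORS + WRITE_LEAD) % TRACK_SECTORS;

    return track * TRACK_SECTORS + offset;
}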

>> I have done some testing and am quite enthusiastic. If I take a
>> decent SAS disk (like the Hitachi Ultrastar C10K300), I can raise
>> the synchronous write performance from 166 writes/s to about
>> 2000 writes/s (!). 2000 IOPS is more than sufficient for our
>> production environment.
> 
> Um ... Careful there.  There are many apples, oranges, and bananas to be
> compared inaccurately against each other.  When I measure IOPS of physical
> disks, with all the caches disabled, I get anywhere from 200 to 2400 for a
> single spindle disk (SAS 10k), and I get anywhere from 2000 to 6000 with a
> SSD (SATA).  Just depending on the benchmark configuration.  Because ZFS is
> doing all sorts of acceleration behind the scenes, which makes the results
> vary *immensely* from any IOPS number you look up online.

The measurement is simple: disable the write cache, write one sector; when
that write returns, calculate the next optimal sector to write to, write,
calculate again... This gives a quite stable result of about 2000 writes/s,
or 0.5 ms average service time, single threaded. No ZFS involved, just pure
disk performance. In code, the loop looks roughly like the sketch below.
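
A sketch of how you'd reproduce it, not the actual test program
(O_DIRECT/O_SYNC is the Linux spelling - on Solaris you would open the raw
device - and next_sector() is the routine sketched above):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SECTOR 512
#define WRITES 10000

uint32_t next_sector(uint32_t last);  /* from the sketch above */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }

    /* The drive's write cache must already be off (sdparm / format -e);
     * O_SYNC alone only flushes, it doesn't disable the cache. */
    int fd = open(argv[1], O_WRONLY | O_DIRECT | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, SECTOR, SECTOR)) return 1;
    memset(buf, 0, SECTOR);

    struct timespec t0, t1;
    uint32_t sector = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < WRITES; i++) {
        if (pwrite(fd, buf, SECTOR, (off_t)sector * SECTOR) != SECTOR) {
            perror("pwrite");
            return 1;
        }
        sector = next_sector(sector);  /* rotation-aware placement */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f writes/s, %.3f ms avg service time\n",
           WRITES / s, s * 1e3 / WRITES);
    return 0;
}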

> 
> So you believe you can know the drive geometry, the instantaneous head
> position, and the next available physical block address in software?  No
> need for special hardware?  That's cool.  I hope there aren't any "gotchas"
> as-yet undiscovered.

Yes, I have already mapped several drives. I measured at least the track
length, the interleave needed between two writes, and the interleave when a
track-to-track seek is involved. Of course you can always learn more about a
disk, but that's a good starting point.
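
The mapping run can be automated along these lines (hypothetical sketch;
time_write_pair() is a small helper, not shown, that issues two back-to-back
synchronous writes at a given sector gap and returns the elapsed time):

double time_write_pair(int fd, void *buf, uint32_t s, uint32_t gap);

/* Find the minimum interleave: for growing gaps k, time a pair of
 * writes to sectors s and s+k. Too small a gap costs a full extra
 * rotation; the first k that avoids that penalty is the interleave.
 * Run the same loop across a track boundary to get the
 * track-to-track figure. */
static uint32_t find_interleave(int fd, void *buf, double rotation)
{
    for (uint32_t k = 1; k < TRACK_SECTORS; k++) {
        double t = time_write_pair(fd, buf, 0, k);
        if (t < 0.9 * rotation)  /* no extra-rotation penalty */
            return k;
    }
    return 0;  /* no usable interleave found */
}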

--
Arne