On Jan 24, 2007, at 06:54, Roch - PAE wrote:

[EMAIL PROTECTED] writes:
Note also that for most applications, the size of their IO operations
would often not match the current page size of the buffer, causing
additional performance and scalability issues.

Thanks for mentioning this, I forgot about it.

Since ZFS's default block size is configured to be larger than a page,
the application would have to issue page-aligned block-sized I/Os.
Anyone adjusting the block size would presumably be responsible for
ensuring that the new size is a multiple of the page size (if they
want Direct I/O to work, that is...).

I believe UFS also has a similar requirement, but I've been wrong
before.
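
As a rough illustration of the alignment rule (just a sketch, not
anything from the ZFS source; "recsize" here stands in for the
dataset's recordsize, and 128K is only the default):

    /* Sketch: would this request qualify for a direct I/O path?
     * Assumes the recordsize must itself be a multiple of the page
     * size, per the discussion above.
     */
    #include <sys/types.h>
    #include <unistd.h>

    static int
    dio_aligned(off_t offset, size_t len, size_t recsize)
    {
            long pagesize = sysconf(_SC_PAGESIZE);

            /* a recordsize that isn't page-aligned can't work */
            if (recsize % (size_t)pagesize != 0)
                    return (0);

            /* offset and length must fall on block boundaries */
            return (offset % recsize == 0 && len % recsize == 0);
    }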


I believe the UFS requirement is that the I/O be sector
aligned for DIO to be attempted. And Anton did mention that
one of the benefits of DIO is the ability to direct-read a
subpage block. Without UFS/DIO the OS is required to read and
cache the full page, and the extra amount of I/O may lead to
data channel saturation (I don't see latency as an issue
here, right?).

In QFS there are mount options to do automatic type switching
depending on whether or not the IO is sector aligned.  You
essentially set a trigger to switch to DIO if you receive a tunable
number of well-aligned IO requests.  This helps tremendously in
certain streaming workloads (particularly write) to reduce overhead.
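
something like the following heuristic, i imagine (purely a sketch,
not actual QFS code - the names and the threshold are made up):

    /* Hypothetical auto-switch heuristic of the kind QFS exposes via
     * mount options: after N consecutive well-aligned requests, flip
     * the file into direct I/O; any misaligned request flips it back.
     */
    #include <sys/types.h>

    #define DIO_SWITCH_THRESHOLD    64      /* would be a tunable */

    struct file_io_state {
            int     aligned_streak;   /* consecutive aligned requests */
            int     use_dio;          /* currently in direct I/O mode */
    };

    static void
    account_request(struct file_io_state *fs, off_t off, size_t len,
        size_t sectsize)
    {
            if (off % sectsize == 0 && len % sectsize == 0) {
                    if (++fs->aligned_streak >= DIO_SWITCH_THRESHOLD)
                            fs->use_dio = 1;
            } else {
                    fs->aligned_streak = 0;
                    fs->use_dio = 0;  /* fall back to buffered I/O */
            }
    }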

This is where I said that such a feature would translate
for ZFS into the ability to read parts of a filesystem block,
which would only make sense if checksums are disabled.

would it be possible to do checksums a posteriori? .. i suspect that
the checksum portion of the transaction may not be atomic though,
and this leads us back towards the older notion of a DIF.

And for RAID-Z that could mean avoiding I/Os to all disks but
one in a group, so that's a nice benefit.

So for the performance-minded customer that can't afford
mirroring, isn't much of a fan of data integrity, and needs
to do subblock reads on an uncacheable workload, I can
see a feature popping up. And this feature is independent of
whether or not the data is DMA'ed straight into the user
buffer.

certain streaming write workloads that are time-dependent can
fall into this category .. if i'm doing a DMA read directly from a
device's buffer that i'd like to stream - i probably want to avoid
some of the caching layers of indirection that would otherwise impose
more overhead.

The idea behind allowing an application to advise the filesystem
of how it plans on doing its IO (or the state of its own cache or
buffers or stream requirements) is to prevent the one-cache-fits-all
sort of approach that we currently seem to have in the ARC.
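
for what it's worth, UFS already exposes an advisory call in this
direction, directio(3C) .. something like the snippet below is the
kind of hint i mean (the path is made up, and as far as i know ZFS
doesn't act on this advice today):

    /* directio(3C): the application tells the filesystem it will
     * manage its own buffering.  UFS honors it; whether ZFS could use
     * the same hint (e.g. to keep the ARC out of the way) is the
     * open question.
     */
    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int fd = open("/data/streamfile", O_RDONLY); /* made-up path */

            if (fd == -1) {
                    perror("open");
                    return (1);
            }
            if (directio(fd, DIRECTIO_ON) == -1)
                    perror("directio");  /* e.g. filesystem ignores it */

            /* ... streaming reads would go here ... */
            return (0);
    }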

The other feature is to avoid a bcopy by DMAing full
filesystem block reads straight into the user buffer (and verifying
the checksum after). The I/O is high latency; the bcopy adds a small
amount. The kernel memory can be freed/reused straight after
the user read completes. This is where I ask, how much CPU
is lost to the bcopy in workloads that benefit from DIO?
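
A crude userland number (memcpy rather than the in-kernel bcopy, and
the buffers stay cache-hot, so treat it as an order of magnitude only):

    /* Time memcpy() over 128K blocks (the default recordsize) and
     * report the CPU seconds burned for 1 GB copied.  Cache-hot, so
     * this is a best case for the copy.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define BLKSZ   (128 * 1024)
    #define NBLKS   8192                /* 1 GB copied in total */

    int
    main(void)
    {
            char *src = malloc(BLKSZ);
            char *dst = malloc(BLKSZ);
            int i;

            memset(src, 0xab, BLKSZ);

            clock_t start = clock();
            for (i = 0; i < NBLKS; i++)
                    memcpy(dst, src, BLKSZ);
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

            printf("copied 1 GB in %.3f CPU seconds (%.0f MB/s)\n",
                secs, 1024.0 / secs);
            free(src);
            free(dst);
            return (0);
    }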

But isn't the cost more than just the bcopy?  Isn't there additional
overhead in the TLB/PTE from the page invalidation that needs
to occur when you do actually go to write the page out or flush
the page?

At this point, there are lots of projects that will lead to
performance improvements.  The DIO benefits seem like small
change in the context of ZFS.

The quickest return on investment I see for the directio
hint would be to tell ZFS not to grow the ARC when servicing
such requests.

How about the notion of multiple ARCs that could be referenced
or fine tuned for various types of IO workload profiles to provide a
more granular approach?  Wouldn't this also keep the page tables
smaller and hopefully more contiguous for atomic operations? Not
sure what this would break ..

.je
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
