> And this feature is independent of whether or not the data is
> DMA'ed straight into the user buffer.

I suppose so.  However, it seems like it would make more sense to
configure a dataset property that specifically describes the caching
policy that is desired.  When directio implies different semantics for
different filesystems, customers are going to get confused.

> The other feature is to avoid a bcopy by DMAing full
> filesystem block reads straight into the user buffer (and verifying
> the checksum after). The I/O is high latency; bcopy adds a small
> amount. The kernel memory can be freed/reused straight after
> the user read completes. This is where I ask, how much CPU
> is lost to the bcopy in workloads that benefit from DIO?

Right, except that if we try to DMA into user buffers with ZFS there's a
bunch of other things we need the VM to do on our behalf to protect the
integrity of the kernel data that's living in user pages.  Assume you
have a high-latency I/O and you've locked some user pages for this I/O.
In a pathological case, another thread that tries to access the locked
pages also blocks, and it stays blocked for the duration of the first
thread's I/O.  At that point, it seems like it might be easier to accept
the cost of the bcopy instead of blocking another thread.

I'm not even sure how to assess the impact of VM operations required to
change the permissions on the pages before we start the I/O.
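
The bcopy side of the trade-off, by contrast, is easy to put a rough
number on.  The sketch below is only an illustration under my own
assumptions: it runs in user space, memcpy() stands in for the kernel's
bcopy(), avg_copy_seconds() is a made-up helper name, and the buffers
stay warm in the CPU cache, so it will understate the cost of copying a
block that has just arrived from disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy a buffer of block_size bytes `iterations`
 * times and return the average CPU time per copy, in seconds.
 * Returns -1.0 on bad arguments or allocation failure.
 */
double
avg_copy_seconds(size_t block_size, int iterations)
{
	unsigned char *src, *dst;
	volatile unsigned char sink = 0;
	clock_t start, end;
	int i;

	if (block_size == 0 || iterations <= 0)
		return (-1.0);

	src = malloc(block_size);
	dst = malloc(block_size);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return (-1.0);
	}
	memset(src, 0xA5, block_size);

	start = clock();
	for (i = 0; i < iterations; i++) {
		/* Touch the source so each copy has fresh data to move. */
		src[0] = (unsigned char)i;
		memcpy(dst, src, block_size);
		/* Read the result so the copy cannot be optimized away. */
		sink ^= dst[block_size - 1];
	}
	end = clock();

	/* Sanity check: the last copy really landed in dst. */
	assert(dst[0] == (unsigned char)(iterations - 1));

	free(src);
	free(dst);
	return ((double)(end - start) / CLOCKS_PER_SEC / iterations);
}
```

Calling avg_copy_seconds(128 * 1024, 100000) gives a per-copy figure
for a 128K block; multiplying that by the number of block reads per
second a workload issues gives a crude estimate of the CPU fraction
spent in the copy.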

> The quickest return on investment I see for the directio
> hint would be to tell ZFS to not grow the ARC when servicing
> such requests.

Perhaps if we had an option that specifies not to cache data from a
particular dataset, that would suffice.  I think you've filed a CR along
those lines already (6429855)?

