On 7/21/21 3:18 AM, Vivek Goyal wrote:
> On Tue, Jul 20, 2021 at 01:25:11PM +0800, JeffleXu wrote:
>>
>>
>> On 7/20/21 5:30 AM, Vivek Goyal wrote:
>>> On Fri, Jul 16, 2021 at 06:47:49PM +0800, Jeffle Xu wrote:
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> There are three related scenarios:
>>>> 1. Alloc inode: get per-file DAX flag from fuse_attr.flags. (patch 3)
>>>> 2. Per-file DAX flag changes when the file has been opened. (patch 3)
>>>> In this case, the dentry and inode are all marked as DONT_CACHE, and
>>>> the DAX state won't be updated until the file is closed and reopened
>>>> later.
>>>> 3. Users can change the per-file DAX flag inside the guest by chattr(1).
>>>> (patch 4)
>>>> 4. Create new files under directories with DAX enabled. When creating
>>>> new files in ext4/xfs on host, the new created files will inherit the
>>>> per-file DAX flag from the directory, and thus the new created files in
>>>> virtiofs will also inherit the per-file DAX flag if the fuse server
>>>> derives fuse_attr.flags from the underlying ext4/xfs inode's per-file
>>>> DAX flag.
>>>
>>> Thinking little bit more about this from requirement perspective. I think
>>> we are trying to address two use cases here.
>>>
>>> A. Client does not know which files DAX should be used on. Only server
>>>    knows it and server passes this information to client. I suspect
>>>    that's your primary use case.
>>
>> Yes, this is the starting point of this patch set.
>>
>>>
>>> B. Client is driving which files are supposed to be using DAX. This is
>>>    exactly same as the model ext4/xfs are using by storing a persistent
>>>    flag on inode. 
>>>
>>> Current patches seem to be a hybrid of both approach A and B. 
>>>
>>> If we were to implement B, then fuse client probably needs to have the
>>> capability to query FS_XFLAG_DAX on inode and decide whether to
>>> enable DAX or not. (Without extra round trip). Or know it indirectly
>>> by extending GETATTR and requesting this explicitly.
>>
>> If guest queries if the file is DAX capable or not by an extra GETATTR,
>> I'm afraid this will add extra round trip.
>>
>> Or if we add the DAX flag (ATTR_DAX) by extending LOOKUP, as done by
>> this patch set, then the FUSE server needs to set ATTR_DAX according to
>> the FS_XFLAG_DAX of the backend files, *to make the FS_XFLAG_DAX flag
>> previously set by FUSE client work*. Then it becomes a *mandatory*
>> requirement when implementing FUSE server. i.e., it becomes part of the
>> FUSE protocol rather than implementation specific. FUSE server can still
>> implement some other algorithm deciding whether to set ATTR_DAX or not,
>> though it must set ATTR_DAX once the backend file is flagged with
>> FS_XFLAG_DAX.
>>
>> Besides, as you said, FUSE server needs to add one extra
>> GETFLAGS/FSGETXATTR ioctl per LOOKUP in this case. To eliminate this
>> cost, we could negotiate the per-file DAX capability during FUSE_INIT.
>> Only when the per-file DAX capability is negotiated, will the FUSE
>> server do extra GETFLAGS/FSGETXATTR ioctl and set ATTR_DAX flag when
>> doing LOOKUP.
>>
>>
>> Personally, I prefer the second way, i.e., by extending LOOKUP (adding
>> ATTR_DAX), to eliminate the extra round trip.
> 
> Negotiating a fuse feature say FUSE_FS_XFLAG_DAX makes sense. If
> client is mounted with "-o dax=inode", then client will negotitate
> this feature and if server does not support it, mount can fail.
> 
> But this probably will be binding on server that it needs to return
> the state of FS_XFLAG_DAX attr on inode upon lookup/getattr. I don't
> this will allow server to implement its own separate policy which
> does not follow FS_XFLAG_DAX xattr. 

That means the backend fs must be ext4/xfs supporting per-file DAX feature.

If given more right to construct its own policy, FUSE server could
support per-file DAX upon other backend fs, though it will always fail
when virtiofs wants to set FS_XFLAG_DAX inside guest.

> 
> IOW, I don't think server can choose to implement its own policy
> for enabling dax for "-o dax=inode".
> 
> If there is really a need to for something new where server needs
> to dynamically decide which inodes should use dax (and not use
> FS_XFLAG_FS), I guess that probably should be a separate mount
> option say "-o dax=server" and it negotiates a separate feature
> say FUSE_DAX_SERVER. Once that's negotiated, now both client
> and server know that DAX will be used on files as determined
> by server and not necessarily by using file attr FS_XFLAG_DAX.
> 
> So is "dax=inode" enough for your needs? What's your requirement,
> can you give little bit of more details.

In our use case, the backend fs is something like SquashFS on host. The
content of the file on host is downloaded *as needed*. When the file is
not completely ready (completely downloaded), the guest will follow the
normal IO routine, i.e., by FUSE_READ/FUSE_WRITE request. While the file
is completely ready, per-file DAX is enabled for this file. IOW the FUSE
server need to dynamically decide if per-file DAX shall be enabled,
depending on if the file is completely downloaded.

Our strategy and design is still under discussion and may change. Any
comment and discussion is welcomed.

> 
>>
>>>
>>> If we were only implementing A, then server does not have a way to 
>>> tell client to enable DAX. Server can either look at FS_XFLAG_DAX
>>> and decide to enable DAX or use some other property. Given querying
>>> FS_XFLAG_DAX will be an extra ioctl() on every inode lookup/getattr,
>>> it probably will be a server option. But enabling on server does not
>>> mean client will enable it.
>>
>> As I said previously, it can be negotiated whether this per-file DAX
>> capability is needed. Guest can advertise this capability when '-o
>> dax=inode' is configured.
>>
>>>
>>> I think my primary concern with this patch right now is trying to
>>> figure out which requirement we are trying to cater to first and how
>>> to connect server and client well so they both understand what mode
>>> they are operating in and interact well.
>>>
>>
>>
>> -- 
>> Thanks,
>> Jeffle
>>

-- 
Thanks,
Jeffle
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to