Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-20 Thread Stefan Ring
Hi, I have a very curious problem on an oVirt-like virtualization host whose storage lives on gluster (as qcow2). The problem is that of the writes done by ZFS, whose sizes according to blktrace are a mixture of 8, 16, 24, ... 256 (512-byte) blocks, sometimes the first 4KB or more, but at least …

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-20 Thread Stefan Ring
This list seems to be used for patches only. I will re-post to qemu-discuss.

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Thu, Feb 20, 2020 at 10:19 AM Stefan Ring wrote: > Hi, I have a very curious problem on an oVirt-like virtualization host whose storage lives on gluster (as qcow2). The problem is that of the writes done by ZFS, whose sizes according to blktrace …

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Mon, Feb 24, 2020 at 1:35 PM Stefan Ring wrote: > [...] As already stated in the original post, the problem only occurs with multiple parallel write requests happening. Actually I did not state that. Anyway, the corruption does not happen when I restrict the ZFS io scheduler …

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Mon, Feb 24, 2020 at 2:27 PM Kevin Wolf wrote: > > > There are quite a few machines running on this host, and we have not experienced other problems so far. So right now, only ZFS is able to trigger this for some reason. The guest has 8 virtual cores. I also tried writing …

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-25 Thread Stefan Ring
On Mon, Feb 24, 2020 at 1:35 PM Stefan Ring wrote: > What I plan to do next is look at the block ranges being written in the hope of finding overlaps there. Status update: I still have not found out what is actually causing this. I have not found concurrent writes to overlapping …
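
For what it's worth, the overlap check described above can be sketched roughly as follows; the input format (one "offset length" pair per line) is made up, and a real check would also have to compare submit/complete timestamps to establish that overlapping requests were actually in flight concurrently:

/* Rough sketch: read (offset, length) pairs of write requests from stdin,
 * sort by offset, and report any pair of requests whose byte ranges overlap.
 * The input format is hypothetical; the thread does not show the actual
 * trace format used. */
#include <stdio.h>
#include <stdlib.h>

struct req { unsigned long long off, len; };

static int cmp_off(const void *a, const void *b)
{
    const struct req *x = a, *y = b;
    if (x->off != y->off)
        return x->off < y->off ? -1 : 1;
    return 0;
}

int main(void)
{
    struct req *r = NULL;
    size_t n = 0, cap = 0;
    unsigned long long off, len;

    /* one "offset length" pair per line */
    while (scanf("%llu %llu", &off, &len) == 2) {
        if (n == cap) {
            cap = cap ? cap * 2 : 1024;
            r = realloc(r, cap * sizeof(*r));
            if (!r)
                return 1;
        }
        r[n].off = off;
        r[n].len = len;
        n++;
    }

    qsort(r, n, sizeof(*r), cmp_off);

    /* after sorting by offset, a request that starts before the previous
     * request's end touches an overlapping byte range */
    for (size_t i = 1; i < n; i++) {
        if (r[i].off < r[i - 1].off + r[i - 1].len)
            printf("overlap: [%llu,+%llu) and [%llu,+%llu)\n",
                   r[i - 1].off, r[i - 1].len, r[i].off, r[i].len);
    }
    free(r);
    return 0;
}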

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-27 Thread Stefan Ring
On Tue, Feb 25, 2020 at 3:12 PM Stefan Ring wrote: > I find many instances with the following pattern: current file length (= max position + size written): p; write request n writes from (p + hole_size), thus leaving a hole; request n+1 writes exactly hole_size, starting …
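
In plain POSIX terms (not via gluster), the pattern boils down to something like the sketch below; offsets, sizes and the file name are made up, and issuing the two pwrite() calls sequentially like this only makes the offsets concrete, it is not a reproducer:

/* Illustration of the write pattern described above, with made-up sizes:
 * request n extends the file past a hole, request n+1 then fills exactly
 * that hole. Error handling omitted for brevity. */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd = open("pattern-demo.img", O_RDWR | O_CREAT | O_TRUNC, 0644);
    char buf[65536];
    memset(buf, 0xaa, sizeof(buf));

    off_t p = 1 << 20;          /* current file length */
    off_t hole_size = 4096;
    ftruncate(fd, p);           /* bring the file to length p */

    /* request n: writes from (p + hole_size), leaving a hole at [p, p + hole_size) */
    pwrite(fd, buf, sizeof(buf), p + hole_size);

    /* request n+1: writes exactly hole_size bytes starting at p, filling that hole */
    pwrite(fd, buf, hole_size, p);

    close(fd);
    return 0;
}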

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-27 Thread Stefan Ring
On Thu, Feb 27, 2020 at 10:12 PM Stefan Ring wrote: > Victory! I have a reproducer in the form of a plain C libgfapi client. However, I have not been able to trigger corruption by just executing the simple pattern in an artificial way. Currently, I need to feed my reproducer …
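
This is not the actual reproducer from the thread, only a rough sketch of what a plain C libgfapi client issuing two parallel positional writes looks like; volume name, server and file name are placeholders, and with libgfapi older than glusterfs 6.0 glfs_pwrite() takes no trailing prestat/poststat arguments:

/* Sketch of a plain C libgfapi client that keeps two file-extending
 * positional writes in flight at once (cf. the hole pattern above).
 * Volume name, server host and file name are assumptions. */
#include <glusterfs/api/glfs.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

static glfs_fd_t *fd;

struct job { off_t off; size_t len; };

static void *writer(void *arg)
{
    struct job *j = arg;
    char *buf = malloc(j->len);
    memset(buf, 0xaa, j->len);
    /* glusterfs >= 6.0 signature; drop the two NULL stat args for older libgfapi */
    if (glfs_pwrite(fd, buf, j->len, j->off, 0, NULL, NULL) < 0)
        perror("glfs_pwrite");
    free(buf);
    return NULL;
}

int main(void)
{
    glfs_t *fs = glfs_new("myvol");                       /* volume name: placeholder */
    glfs_set_volfile_server(fs, "tcp", "glusterhost", 24007);
    if (glfs_init(fs)) {
        perror("glfs_init");
        return 1;
    }

    fd = glfs_creat(fs, "repro.bin", O_RDWR | O_TRUNC, 0644);
    if (!fd) {
        perror("glfs_creat");
        return 1;
    }

    /* one write past a hole, one filling exactly that hole, issued in parallel */
    struct job a = { .off = (1 << 20) + 4096, .len = 65536 };
    struct job b = { .off = 1 << 20, .len = 4096 };

    pthread_t ta, tb;
    pthread_create(&ta, NULL, writer, &a);
    pthread_create(&tb, NULL, writer, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}

Assuming the gluster API headers and library are installed, something like "gcc repro.c -lgfapi -pthread" should build it.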

Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-28 Thread Stefan Ring
On Fri, Feb 28, 2020 at 12:10 PM Kevin Wolf wrote: > This sounds almost like two other bugs we got fixed recently (in the QEMU file-posix driver and in the XFS kernel driver) where two writes extending the file size were in flight in parallel, but if the shorter one completed last, instead …
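
For illustration only (this mimics neither driver's actual code): the failure mode described reads roughly like the toy below. If the file size is updated to the end offset of whichever extending write completes last, rather than to the maximum end offset seen so far, the tail written by the longer request is effectively lost:

/* Toy model of the race: two extending writes, the shorter one completes
 * last. A buggy size update follows completion order; a correct one only
 * ever grows the size. */
#include <stdio.h>

struct write_req { long long off, len; };

static long long buggy_size(long long size, struct write_req w)
{
    (void)size;
    return w.off + w.len;            /* size follows completion order */
}

static long long correct_size(long long size, struct write_req w)
{
    long long end = w.off + w.len;
    return end > size ? end : size;  /* size only ever grows */
}

int main(void)
{
    struct write_req longer  = { 1048576 + 4096, 65536 }; /* completes first */
    struct write_req shorter = { 1048576, 4096 };         /* completes last  */

    long long b = 0, c = 0;

    /* completion order: longer, then shorter */
    b = buggy_size(b, longer);   b = buggy_size(b, shorter);
    c = correct_size(c, longer); c = correct_size(c, shorter);

    printf("buggy file size:   %lld\n", b);  /* 1052672: tail of the longer write lost */
    printf("correct file size: %lld\n", c);  /* 1118208 */
    return 0;
}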