On Sat, Mar 28, 2020 at 9:47 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> On March 28, 2020 7:26:33 PM GMT+02:00, Nir Soffer <nsof...@redhat.com> wrote:
> >On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov <hunter86...@yahoo.com>
> >wrote:
> >>
> >> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi
> ><gianluca.cec...@gmail.com> wrote:
> >> >On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov
> ><hunter86...@yahoo.com>
> >> >wrote:
> >> >
> >> >> On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <
> >> >> gianluca.cec...@gmail.com> wrote:
> >> >>
> >> >>
> >> >[snip]
> >> >
> >> >> >Actually it only happened with an empty disk (thin provisioned)
> >> >> >and sudden high I/O during the initial phase of the OS install;
> >> >> >it didn't happen during normal operation (even with 600 MB/s of
> >> >> >throughput).
> >> >>
> >> >
> >> >[snip]
> >> >
> >> >
> >> >> Hi Gianluca,
> >> >>
> >> >> Is it happening to machines with preallocated disks or on machines
> >> >with
> >> >> thin disks ?
> >> >>
> >> >> Best Regards,
> >> >> Strahil Nikolov
> >> >>
> >> >
> >> >thin provisioned. But as I have to create many VMs with 120 GB of
> >> >disk size, of which probably only a part will be allocated over
> >> >time, it would be unfeasible to make them all preallocated. I
> >> >learned that thin is not good for block based storage domains and
> >> >heavy I/O, but I would hope that it is not the same with file based
> >> >storage domains...
> >> >Thanks,
> >> >Gianluca
> >>
> >> This is normal - gluster cannot allocate the needed shards fast
> >> enough (due to high I/O), so qemu pauses the VM until storage is
> >> available again.
> >
> >I don't know glusterfs internals, but I think this is very unlikely.
> >
> >For block storage thin provisioning in vdsm, vdsm is responsible for
> >allocating more space, but vdsm is not in the datapath; it monitors
> >the allocation and allocates more space when free space reaches a
> >limit. It has no way to block I/O before more space is available.
> >Gluster is in the datapath and can block I/O until it can process it.
> >
> >Can you explain what is the source for this theory?
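
(To make the distinction concrete: below is a toy sketch of the watermark
idea, not actual vdsm code. It polls the disk allocation from outside and
extends the backing LV, so it can never stop a write that outruns it. The
VM name "myvm", the disk "sda", and the LV "vg/mydisk" are placeholders:)

    # Toy watermark loop: extend the LV backing a thin qcow2 disk
    # when the free space inside it drops below 1 GiB.
    while sleep 2; do
        alloc=$(virsh domblkinfo myvm sda | awk '/Allocation:/ {print $2}')
        phys=$(virsh domblkinfo myvm sda | awk '/Physical:/ {print $2}')
        if [ $((phys - alloc)) -lt $((1024 * 1024 * 1024)) ]; then
            lvextend -L +1G vg/mydisk
        fi
    done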
> >
> >> You can think of VDO (with deduplication) as a PV for the thin LVM;
> >> this way you can preallocate your VMs while saving space
> >> (deduplication, zero-block elimination and even compression).
> >> Of course, VDO will reduce performance (unless you have a
> >> battery-backed write cache and compression is disabled), but the
> >> benefits will be a lot greater.
> >>
> >> Another approach is to increase the shard size - gluster will
> >> create fewer shards, but on-disk allocation will be higher.
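
For reference, the shard size is a per-volume gluster option, so something
like this should do it (the volume name "vmstore" is an example, and the
new size only affects files created after the change):

    gluster volume set vmstore features.shard-block-size 512MB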
> >>
> >> Best Regards,
> >> Strahil Nikolov
>
> Hey Nir,
> You are right ... This is just a theory based on my knowledge and it might 
> not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm
> convinced that is the reason.
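
(The quickest place to look is the qemu log for the VM on the host that
ran it; the file is named after the VM, and "myvm" below is a placeholder:)

    less /var/log/libvirt/qemu/myvm.log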
>
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as this was not done
> initially (preallocated disks take their full size on gluster and all
> shards are created immediately). This takes time and has to be done on
> all bricks.
> As the shard size is quite small (default 64 MB), gluster has to create
> the next shard almost immediately, but if it can't do it as fast as
> qemu is filling its qcow2 disk

Gluster can block the I/O until it can write the data to a new shard.
There is no reason to return an error unless a real error happened.

Also the VMs mentioned here are using raw disks, not qcow2:

        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sda"/>
            <source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
                <seclabel model="dac" relabel="no" type="none"/>
            </source>
            <driver cache="none" error_policy="stop" io="threads"
                    name="qemu" type="raw"/>
            <alias name="ua-0a91c346-23a5-4432-8af7-ae0a28f9c208"/>
            <address bus="0" controller="0" target="0" type="drive" unit="0"/>
            <boot order="1"/>
            <serial>0a91c346-23a5-4432-8af7-ae0a28f9c208</serial>
        </disk>

Note type="raw" in the driver element (and error_policy="stop", which is
what makes qemu pause the VM instead of failing it on an I/O error).
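
You can also confirm the format and the actual allocation directly from
the mount with qemu-img (the path is the one from the XML above; add -U
on recent qemu-img if the VM is running, so it does not try to take the
image lock):

    qemu-img info /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6

For a raw sparse disk it reports "file format: raw" and a "disk size"
smaller than the "virtual size".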

>  - qemu will get an I/O error, and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
>
> That's why the oVirt team made all gluster-based disks fully
> preallocated.

Gluster disks are thin (raw-sparse) by default, just like on any other
file-based storage.

If this theory were correct, this would fail consistently on gluster:

1. Create a raw sparse image:

    truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test

2. Fill the image quickly with data:

    dd if=/dev/zero bs=1M | tr "\0" "U" \
        | dd of=/rhev/data-center/mnt/glusterSD/server:_path/test \
             bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc

According to your theory, gluster would fail to allocate shards fast
enough and fail the I/O.
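
If you want to see what gluster is doing while the test runs, you can
watch the shards being created on one of the bricks (the brick path below
is just an example; sharded volumes keep the pieces in a hidden .shard
directory on each brick):

    watch -n 1 'ls /gluster/brick1/vmstore/.shard | wc -l'

and when dd completes, compare the allocated size with the apparent size
of the file:

    ls -lhs /rhev/data-center/mnt/glusterSD/server:_path/test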

Nir