On 07/21/2017 11:41 PM, yayo (j) wrote:
Hi,

Sorry for following up again, but, checking the oVirt interface, I've found that oVirt reports the "engine" volume as an "arbiter" configuration and the "data" volume as a fully replicated volume. Check these screenshots:

This is probably some refresh bug in the UI; Sahina might be able to tell you.

https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are full replicated:


    Volume Name: data
    Type: Replicate
    Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: gdnode01:/gluster/data/brick
    Brick2: gdnode02:/gluster/data/brick
    Brick3: gdnode04:/gluster/data/brick
    Options Reconfigured:
    nfs.disable: on
    performance.readdir-ahead: on
    transport.address-family: inet
    storage.owner-uid: 36
    performance.quick-read: off
    performance.read-ahead: off
    performance.io-cache: off
    performance.stat-prefetch: off
    performance.low-prio-threads: 32
    network.remote-dio: enable
    cluster.eager-lock: enable
    cluster.quorum-type: auto
    cluster.server-quorum-type: server
    cluster.data-self-heal-algorithm: full
    cluster.locking-scheme: granular
    cluster.shd-max-threads: 8
    cluster.shd-wait-qlength: 10000
    features.shard: on
    user.cifs: off
    storage.owner-gid: 36
    features.shard-block-size: 512MB
    network.ping-timeout: 30
    performance.strict-o-direct: on
    cluster.granular-entry-heal: on
    auth.allow: *
    server.allow-insecure: on





    Volume Name: engine
    Type: Replicate
    Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: gdnode01:/gluster/engine/brick
    Brick2: gdnode02:/gluster/engine/brick
    Brick3: gdnode04:/gluster/engine/brick
    Options Reconfigured:
    nfs.disable: on
    performance.readdir-ahead: on
    transport.address-family: inet
    storage.owner-uid: 36
    performance.quick-read: off
    performance.read-ahead: off
    performance.io-cache: off
    performance.stat-prefetch: off
    performance.low-prio-threads: 32
    network.remote-dio: off
    cluster.eager-lock: enable
    cluster.quorum-type: auto
    cluster.server-quorum-type: server
    cluster.data-self-heal-algorithm: full
    cluster.locking-scheme: granular
    cluster.shd-max-threads: 8
    cluster.shd-wait-qlength: 10000
    features.shard: on
    user.cifs: off
    storage.owner-gid: 36
    features.shard-block-size: 512MB
    network.ping-timeout: 30
    performance.strict-o-direct: on
    cluster.granular-entry-heal: on
    auth.allow: *
    server.allow-insecure: on
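
For comparison, a genuine arbiter volume shows the arbiter brick counted separately in "gluster volume info". A minimal illustrative sketch (same hostnames as above, output abbreviated; the "(arbiter)" tag next to the brick appears in recent Gluster releases):

    Type: Replicate
    Number of Bricks: 1 x (2 + 1) = 3
    Brick1: gdnode01:/gluster/engine/brick
    Brick2: gdnode02:/gluster/engine/brick
    Brick3: gdnode04:/gluster/engine/brick (arbiter)

Both of your volumes report "1 x 3 = 3" instead, so on the Gluster side they really are plain replica 3, which is consistent with the "arbiter" label in the UI being only a display issue.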


2017-07-21 19:13 GMT+02:00 yayo (j) <jag...@gmail.com>:

    2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishan...@redhat.com>:


        But it does say something. All these gfids of completed heals
        in the log below are the ones you have given the getfattr
        output for. So what is likely happening is that there is an
        intermittent connection problem between your mount and the
        brick process, leading to pending heals again after the heal
        gets completed, which is why the numbers vary each time. You
        would need to check why that is the case.
        Hope this helps,
        Ravi



            [2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
            [2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
            [2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
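
For reference, each gfid in these selfheal messages can be mapped back to a file on the brick: for regular files Gluster keeps a hard link under the brick's .glusterfs directory, named after the first two pairs of hex digits of the gfid. A minimal sketch, assuming the engine brick path from the volume info above:

    # on one of the servers, e.g. gdnode01
    ls -l /gluster/engine/brick/.glusterfs/e6/df/e6dfd556-340b-4b76-b47b-7b6f5bd74327
    # re-check which entries are still pending heal
    gluster volume heal engine info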



    Hi,

    following your suggestion, I've checked the "peer" status and I
    found that there are too many names for the hosts; I don't know if
    this can be the problem or part of it:

        gluster peer status on NODE01:
        Number of Peers: 2

        Hostname: dnode02.localdomain.local
        Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
        State: Peer in Cluster (Connected)
        Other names:
        192.168.10.52
        dnode02.localdomain.local
        10.10.20.90
        10.10.10.20


        gluster peer status on NODE02:
        Number of Peers: 2

        Hostname: dnode01.localdomain.local
        Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
        State: Peer in Cluster (Connected)
        Other names:
        gdnode01
        10.10.10.10

        Hostname: gdnode04
        Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
        State: Peer in Cluster (Connected)
        Other names:
        192.168.10.54
        10.10.10.40


        gluster peer status on NODE04:
        Number of Peers: 2

        Hostname: dnode02.neridom.dom
        Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
        State: Peer in Cluster (Connected)
        Other names:
        10.10.20.90
        gdnode02
        192.168.10.52
        10.10.10.20

        Hostname: dnode01.localdomain.local
        Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
        State: Peer in Cluster (Connected)
        Other names:
        gdnode01
        10.10.10.10

    All these IPs are pingable and the hostnames are resolvable across
    all 3 nodes, but only the 10.10.10.0 network is the dedicated
    network for gluster (resolved using the gdnode* host names) ... Do
    you think that removing the other entries can fix the problem? If
    so, sorry, but how can I remove the other entries?
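
(For reference, the "other names" come from glusterd's peer store: each peer has a file under /var/lib/glusterd/peers/ that accumulates every hostname and IP the peer has been probed with or resolved as. A read-only way to inspect them, assuming a standard glusterd layout:)

    # on any node, as root
    cat /var/lib/glusterd/peers/*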

I don't think having extra entries could be a problem. Did you check the fuse mount logs for disconnect messages that I referred to in the other email?
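
A quick way to look for them, as a sketch; the exact client log name depends on the mount point oVirt uses, commonly /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_<volume>.log on the hypervisors:

    # show recent client disconnect/reconnect events for the gluster mounts
    grep -iE "disconnected from|connected to" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 50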


    And what about SELinux?

Not sure about this. See if there are disconnect messages in the mount logs first.
-Ravi


    Thank you





--
Linux User: 369739 http://counter.li.org

