On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> OK, could you try the following:
>
> i. Set network.remote-dio to off
> # gluster volume set <VOL> network.remote-dio off
>
> ii. Set performance.strict-o-direct to on
> # gluster volume set <VOL> performance.strict-o-direct on
>
> iii. Stop the affected vm(s) and start again
>
> and tell me if you notice any improvement?

The previous install I had the issue with is still on gluster 3.7.11. My test install of oVirt 3.6.7 and gluster 3.7.13, with 3 bricks on a local disk, isn't letting me add the gluster storage at all right now. I keep getting some type of UI error:

2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException
at Unknown.ps(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@3837)
at Unknown.ts(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@20)
at Unknown.vs(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@18)
at Unknown.iJf(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@19)
at Unknown.Xab(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@48)
at Unknown.P8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@4447)
at Unknown.jQr(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21)
at Unknown.A8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@51)
at Unknown.u8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@101)
at Unknown.Eap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10718)
at Unknown.p8n(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@161)
at Unknown.Cao(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@31)
at Unknown.Bap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10469)
at Unknown.kRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@49)
at Unknown.nRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@438)
at Unknown.eVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@40)
at Unknown.hVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25827)
at Unknown.MTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25)
at Unknown.PTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@24052)
at Unknown.KJe(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21125)
at Unknown.Izk(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10384)
at Unknown.P3(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@137)
at Unknown.g4(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@8271)
at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@65)
at Unknown._t(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@29)
at Unknown.du(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@57)
at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@54)
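
(Coming back to Krutika's three steps at the top: scripted end to end they would look roughly like the below. A minimal sketch, assuming the affected volume is the GLUSTER1 volume described later in this thread; substitute your own volume name.)

  # Turn off remote-dio and make the client honor O_DIRECT strictly
  gluster volume set GLUSTER1 network.remote-dio off
  gluster volume set GLUSTER1 performance.strict-o-direct on
  # Confirm both options took effect before power-cycling the affected VMs
  gluster volume info GLUSTER1 | grep -E 'remote-dio|strict-o-direct'

(Step iii presumably matters because running VMs keep their existing file descriptors; stopping and starting them makes qemu reopen the images under the new option settings.)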
> -Krutika
>
> On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samp...@neutraali.net> wrote:
>
>> Hi,
>>
>> > On 25 Jul 2016, at 12:34, David Gossage <dgoss...@carouselchecks.com> wrote:
>> >
>> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>> > Hi,
>> >
>> > Thanks for the logs. I have identified one issue from the logs, for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and propagated up the stack, causing the reads to bail out early with 'Operation not permitted' errors.
>> > I still need to find out two things:
>> > i) why there was a readv() sent on a non-existent (ENOENT) file (this is important, since some of the other users have not faced or reported this issue on gluster-users with 3.7.13)
>> > ii) whether there's a way to work around this issue.
>> >
>> > Do you mind sharing the steps needed to run into this issue? This is so that we can apply our patches, test, and ensure they fix the problem.
>>
>> Unfortunately I can't test this right away, nor give exact steps for how to test it. This is just a theory, but please correct me if you see any mistakes.
>>
>> oVirt uses cache=none settings for VMs by default, which requires direct I/O. oVirt also uses dd with iflag=direct to check that storage has direct I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11, and problems exist at least with versions .12 and .13. There have been some posts saying that GlusterFS 3.8.x is also affected.
>>
>> Steps to reproduce:
>> 1. A sharded file is created with GlusterFS 3.7.11. Everything works ok.
>> 2. GlusterFS is upgraded to 3.7.12+.
>> 3. The sharded file can no longer be read or written with direct I/O enabled. (E.g. oVirt checks the storage connection with the command "dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000".)
>>
>> Please let me know if you need more information.
>>
>> -samuli
>>
>> > Well, after the upgrade of gluster all I did was start the oVirt hosts up, which launched and started their ha-agent and broker processes. I don't believe I started getting any errors till it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet; I'm not sure if the check for shards would have caused that. Unfortunately I can't just update this cluster and try to see what caused it, as it has some VMs users expect to be available in a few hours.
>> >
>> > I can see if I can get my test setup to recreate it. I think I'll need to deactivate the data center so I can detach the storage that's on XFS and attach the one that's on ZFS with sharding enabled. My test is 3 bricks on the same local machine, with 3 different volumes, but I think I'm running into a sanlock issue or something, as it won't mount more than one volume that was created locally.
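
(Samuli's steps translate to roughly the following; a hedged sketch, assuming a plain FUSE mount of the volume and a file large enough to span shards. The mount point and file name are made up.)

  mount -t glusterfs ccgl1.gl.local:/GLUSTER1 /mnt/glustertest
  # With features.shard-block-size at 64MB, a 256MB file spans multiple shards
  dd if=/dev/urandom of=/mnt/glustertest/shardtest.img bs=1M count=256
  # ... upgrade the gluster servers from 3.7.11 to 3.7.12+ ...
  # then re-read the pre-existing sharded file with direct I/O:
  dd if=/mnt/glustertest/shardtest.img iflag=direct,fullblock bs=1024000 count=1 of=/dev/null
  # Expected failure on affected versions:
  #   dd: error reading '/mnt/glustertest/shardtest.img': Operation not permitted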
>> >
>> > -Krutika
>> >
>> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> > Trimmed the logs down to just about when I was shutting down the oVirt servers for updates, which was 14:30 UTC 2016-07-09.
>> >
>> > Pre-update settings were:
>> >
>> > Volume Name: GLUSTER1
>> > Type: Replicate
>> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
>> > Status: Started
>> > Number of Bricks: 1 x 3 = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
>> > Options Reconfigured:
>> > performance.readdir-ahead: on
>> > storage.owner-uid: 36
>> > storage.owner-gid: 36
>> > performance.quick-read: off
>> > performance.read-ahead: off
>> > performance.io-cache: off
>> > performance.stat-prefetch: off
>> > cluster.eager-lock: enable
>> > network.remote-dio: enable
>> > cluster.quorum-type: auto
>> > cluster.server-quorum-type: server
>> > server.allow-insecure: on
>> > cluster.self-heal-window-size: 1024
>> > cluster.background-self-heal-count: 16
>> > performance.strict-write-ordering: off
>> > nfs.disable: on
>> > nfs.addr-namelookup: off
>> > nfs.enable-ino32: off
>> >
>> > At the time of the updates, ccgl3 was offline from a bad NIC on the server, but it had been so for about a week with no issues in the volume.
>> >
>> > Shortly after the update I added these settings to enable sharding, but did not as of yet have any VM images sharded:
>> > features.shard-block-size: 64MB
>> > features.shard: on
>> >
>> > David Gossage
>> > Carousel Checks Inc. | System Administrator
>> > Office 708.613.2284
>> >
>> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>> > Hi David,
>> >
>> > Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
>> >
>> > Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) as it was at the time you actually saw this issue?
>> >
>> > -Krutika
>> >
>> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romra...@gmail.com> wrote:
>> > Hi David,
>> >
>> > My backend storage is ZFS.
>> >
>> > I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test, but since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying hosted engine.
>> >
>> > I found this:
>> >
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553
>> >
>> > Not sure if it's related.
>> >
>> > But I also have a ZFS backend. Another user on the gluster mailing list had issues and used a ZFS backend as well, although she used Proxmox, and she got it working by changing the disk to writeback cache, I think it was.
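
(Since ZFS keeps coming up, one quick elimination test would be to check whether the brick filesystem itself accepts direct I/O at all, independent of gluster. A sketch, using the brick path from the volume info quoted above; the probe file name is made up.)

  # Write, read back, and remove a scratch file directly on the brick with O_DIRECT
  dd if=/dev/zero of=/gluster1/BRICK1/1/directio-probe bs=1M count=1 oflag=direct
  dd if=/gluster1/BRICK1/1/directio-probe of=/dev/null bs=1M count=1 iflag=direct
  rm -f /gluster1/BRICK1/1/directio-probe
  # If ZFS refuses the O_DIRECT open (dd fails with "Invalid argument"), then any
  # direct-I/O behavior seen by clients has to come from gluster translator options
  # such as network.remote-dio and performance.strict-o-direct, not from the brick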
>> >
>> > I also use hosted engine, but I run my gluster volume for the HE on an LVM separate from ZFS, on XFS, and if I recall it did not have the issues my gluster on ZFS did. I'm wondering now if the issue was ZFS settings.
>> >
>> > Hopefully I should have a test machine up soon that I can play around with more.
>> >
>> > Scott
>> >
>> > On Thu, Jul 21, 2016 at 11:36 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> > What backend storage do you run gluster on? xfs/zfs/ext4, etc.?
>> >
>> > David Gossage
>> > Carousel Checks Inc. | System Administrator
>> > Office 708.613.2284
>> >
>> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romra...@gmail.com> wrote:
>> > I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
>> >
>> > $ sudo hosted-engine --set-maintenance --mode=none
>> > Traceback (most recent call last):
>> >   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
>> >     "__main__", fname, loader, pkg_name)
>> >   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>> >     exec code in run_globals
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
>> >     if not maintenance.set_mode(sys.argv[1]):
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
>> >     value=m_global,
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
>> >     str(value))
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
>> >     all_stats = broker.get_stats_from_storage(service)
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
>> >     result = self._checked_communicate(request)
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
>> >     .format(message or response))
>> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
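
(The broker's "failed to read metadata" can presumably be reproduced outside the HA stack with the same kind of direct-I/O read oVirt uses for its storage checks. A sketch; the metadata path below is illustrative, so locate the real file under the hosted-engine storage domain's ha_agent directory.)

  dd if=/rhev/data-center/mnt/<server:_path>/<sd-uuid>/ha_agent/hosted-engine.metadata \
     iflag=direct,fullblock bs=1024000 count=1 of=/dev/null
  # On an affected mount this should fail with "Operation not permitted",
  # matching the [Errno 1] in the traceback above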
>> >
>> > If I only upgrade one host, then things will continue to work, but my nodes are constantly healing shards. My logs are also flooded with:
>> >
>> > [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
>> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
>> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
>> > [2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
>> > [2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
>> > [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
>> > [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>> > [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
>> > [2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
>> > [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
>> > [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
>> > [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
>> >
>> > Scott
>> >
>> > On Thu, Jul 21, 2016 at 6:57 AM, Frank Rothenstein <f.rothenst...@bodden-kliniken.de> wrote:
>> > Hey David,
>> >
>> > I have the very same problem on my test cluster, despite running oVirt 4.0.
>> > If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.
>> >
>> > Frank
>> >
>> > On Thursday, 21.07.2016, at 04:28 -0500, David Gossage wrote:
>> >> Anyone running one of the recent 3.6.x lines and gluster 3.7.13? I am looking to upgrade gluster from 3.7.11 to 3.7.13 for some bug fixes, but have been told by users on the gluster mailing list that, due to some gluster changes, I'd need to change the disk parameters to use writeback cache. Something to do with aio support being removed.
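
(For what it's worth, "writeback cache" here is a qemu-level disk option: oVirt's default of cache=none opens the image with O_DIRECT, while cache=writeback goes through the host page cache and avoids the direct-I/O path entirely. A hypothetical standalone invocation, just to illustrate the knob; the path is made up.)

  qemu-system-x86_64 -m 1024 \
    -drive file=/mnt/glustertest/vm01.qcow2,format=qcow2,cache=writeback
  # cache=none (the oVirt default) would add O_DIRECT on the image open,
  # which is exactly the code path that regressed here; note that writeback
  # is generally considered less safe across host crashes, so this is a
  # workaround rather than a fix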
>> >>
>> >> I believe this could be done with custom parameters? But I believe the storage tests are done using dd, and would they then fail with the current settings? On the last upgrade to 3.7.13 I had to roll back to 3.7.11 due to stability issues, where gluster storage would go into the down state and always show N/A as space available/used, even though the hosts still saw the storage and VMs were running on it on all 3 hosts.
>> >>
>> >> I saw a lot of messages like these that went away once the gluster rollback finished:
>> >>
>> >> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
>> >> [2016-07-09 15:27:49.555466] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
>> >> [2016-07-09 15:27:49.556574] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
>> >> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
>> >> [2016-07-09 15:27:59.612477] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
>> >> [2016-07-09 15:27:59.613700] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
>> >> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
>> >>
>> >> David Gossage
>> >> Carousel Checks Inc. | System Administrator
>> >> Office 708.613.2284
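
(On the custom-parameters question: oVirt can expose a per-VM cache mode through user-defined VM properties. The engine-config command itself is standard, but the viodiskcache property name and the vdsm hook that consumes it are assumptions here, so check which vdsm-hook packages you actually have installed. A sketch:)

  # On the engine host, register the custom property and restart the engine
  engine-config -s 'UserDefinedVMProperties=viodiskcache=^(none|writeback|writethrough)$'
  systemctl restart ovirt-engine
  # Then set viodiskcache=writeback on the VM in webadmin; a matching vdsm hook
  # on the hosts would rewrite the disk cache attribute at VM start

(Note that the dd-based storage tests quoted earlier use iflag=direct, so they would presumably keep failing regardless of the VM cache mode; the cache change only helps on the qemu side.)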
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users