Federico, thank you for your help so far.  Lots more information below.

On 1/27/2014 4:46 PM, Federico Simoncelli wrote:
----- Original Message -----
From: "Ted Miller" <tmil...@hcjb.org>

On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
Maybe someone from gluster can easily identify what happened. Meanwhile, if you just want to repair your data-center, you could try with:

   $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
   $ touch ids
   $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

I tried your suggestion, and it helped, but it was not enough.

   [root@office4a ~]$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/

   [root@office4a dom_md]$ touch ids

   [root@office4a dom_md]$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

   init done 0

Let me explain a little.

When the problem originally happened, sanlock.log started filling with -223 error messages. Ten seconds later the log switched from -223 messages to -90 messages. Running your little script changed the error from -90 back to -223.

I hope you can send me another script that will get rid of the -223 messages.
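
For what it's worth, my reading of the two codes (please correct me if I have this wrong): -90 corresponds to the "Message too long" errno that vdsm later reports as SanlockException(90, ...), and seems to mean sanlock could not read a full sector from the ids file; -223 is the verify_leader "wrong magic" failure, and the all-zero leader3 lines in the log suggest the host is reading a zeroed-out leader record. A way to see what sanlock actually finds in the file (same paths as above) should be something like:

   $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
   # dump the delta-lease leader records sanlock sees in the ids file
   $ sanlock direct dump ids
   # or just look at the raw bytes
   $ hexdump -C ids | head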

Here is the sanlock.log as I ran your script:

   2014-01-27 19:40:41-0500 39281 [3803]: s13 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

   2014-01-27 19:40:41-0500 39281 [22751]: 0322a407 aio collect 0 
0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

   2014-01-27 19:40:41-0500 39281 [22751]: read_sectors delta_leader offset 512 
rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   2014-01-27 19:40:42-0500 39282 [3803]: s13 add_lockspace fail result -90

   2014-01-27 19:40:47-0500 39287 [3803]: s14 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

   2014-01-27 19:40:47-0500 39287 [22795]: 0322a407 aio collect 0 
0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

   2014-01-27 19:40:47-0500 39287 [22795]: read_sectors delta_leader offset 512 
rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   2014-01-27 19:40:48-0500 39288 [3803]: s14 add_lockspace fail result -90

   2014-01-27 19:40:56-0500 39296 [3802]: s15 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

   2014-01-27 19:40:56-0500 39296 [22866]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   2014-01-27 19:40:56-0500 39296 [22866]: leader1 delta_acquire_begin error 
-223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   2014-01-27 19:40:56-0500 39296 [22866]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   2014-01-27 19:40:56-0500 39296 [22866]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 
og 0 lv 0

   2014-01-27 19:40:56-0500 39296 [22866]: leader4 sn  rn  ts 0 cs 0

   2014-01-27 19:40:57-0500 39297 [3802]: s15 add_lockspace fail result -223

   2014-01-27 19:40:57-0500 39297 [3802]: s16 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

   2014-01-27 19:40:57-0500 39297 [22870]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   2014-01-27 19:40:57-0500 39297 [22870]: leader1 delta_acquire_begin error 
-223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   2014-01-27 19:40:57-0500 39297 [22870]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   2014-01-27 19:40:57-0500 39297 [22870]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 
og 0 lv 0

   2014-01-27 19:40:57-0500 39297 [22870]: leader4 sn  rn  ts 0 cs 0

   2014-01-27 19:40:58-0500 39298 [3802]: s16 add_lockspace fail result -223

   2014-01-27 19:41:07-0500 39307 [3802]: s17 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

Unfortunately, I think the error looks about the same to vdsm, because /var/log/messages shows the same two lines from the calling scripts in the callback lists (lines 66 & 425, if I remember right).

When I get up in the morning, I will be looking for another magic potion from your pen. :)

Federico,

I won't be able to do anything to the oVirt setup for another 5 hours or so (it is a trial system I am working on at home; I am at work), but I will try your repair script and report back.

In Bugzilla 862975 they suggested turning off write-behind caching and "eager locking" on the gluster volume to avoid/reduce the problems that come from many different computers all writing to the same file(s) very frequently.  If I interpret the comment in the bug correctly, it did seem to help in that situation.  My situation is a little different.  My gluster setup is replicate only, replica 3 (though there are only two hosts).  I was not stress-testing it, I was just using it, trying to figure out how I can import some old VMware VMs without an ESXi server to run them on.
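
For reference, if I read that bug comment correctly, the settings involved are the gluster volume options for write-behind caching and eager locking. On this volume that would presumably be something like the following (double-check the option names against the bug before applying):

   $ gluster volume set VM2 performance.write-behind off
   $ gluster volume set VM2 cluster.eager-lock off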
Have you done anything similar to what is described here in comment 21?

https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21

I have no idea, because that bug is so dangerous that the general public is not even allowed to see it. I get a "You are not authorized to access ...." message. :) However, see just below.

When did you realize that you weren't able to use the data-center anymore?
Can you describe exactly what you did and what happened, for example:

1. I created the data center (up and running)
2. I tried to import some VMs from VMWare
3. During the import (or after it) the data-center went into the contending state
...

Did something special happen?

It took me a few hours of going back through and matching timestamps across the logs to figure it out, as it has been a week since it happened. When I saw some things in the logs, I finally remembered the circumstances. I will summarize it for you, because I think this is what matters. If you want the actual logs, tell me which logs and for which events, and I will package them up for you.

The host suspended all VMs because gluster hit the 90% full limit. It took me a while to figure out what was going on. I expanded the logical volume, but the Data Center would not come back up.
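
For completeness, the sort of commands involved in checking and fixing the space problem would be along these lines; the brick path and VG/LV names below are placeholders, not my actual layout:

   # how full is the brick, and what does gluster think is free?
   $ df -h /gluster/brick-vm2
   $ gluster volume status VM2 detail
   # grow the LV backing the brick, resizing the filesystem at the same time
   $ lvextend -r -L +50G /dev/vg_gluster/lv_vm2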

The GUI events log says:

   Invalid status on Data Center mill. Setting Data Center status to Non Responsive (On host office4a, Error: Network error during communications with the Host.).


The host /var/log/messages says:

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: leader1 delta_renew error -223 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: s123 delta_renew verify_leader error -223

   Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 
[30933]: s123 renewal error -223 delta_length 0 last_success 521369

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[3362]: s123 check_our_lease corrupt -223

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[3362]: s123 kill 32599 sig 15 count 1

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader1 delta_renew error -223 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: s123 delta_renew verify_leader error -223

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: s123 renewal error -223 delta_length 0 last_success 521369

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader1 delta_renew error -223 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: s123 delta_renew verify_leader error -223

   Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 
[30933]: s123 renewal error -223 delta_length 0 last_success 521369

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[3362]: s123 kill 32599 sig 15 count 2

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: leader1 delta_renew error -223 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 
[30933]: s123 delta_renew verify_leader error -223

   [snip many repeats]

   Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 
[30933]: s123 delta_renew verify_leader error -223

   Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 
[30933]: s123 renewal error -223 delta_length 0 last_success 521369

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[3362]: s123 kill 32599 sig 15 count 10

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[3362]: dead 32599 ci 4 count 10

   Jan 22 00:44:58 office4a respawn: slave '/usr/share/vdsm/vdsm --pidfile 
/var/run/vdsm/vdsmd.pid' died, respawning slave

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[3362]: s123 all pids clear

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: verify_leader 2 wrong magic 0 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: leader1 delta_renew error -223 lockspace 
0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: leader2 path 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
 offset 0

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: leader4 sn  rn  ts 0 cs 0

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: s123 delta_renew verify_leader error -223

   Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 
[30933]: s123 renewal error -223 delta_length 0 last_success 521369

   Jan 22 00:44:59 office4a vdsm vds WARNING Unable to load the json rpc server 
module. Please make sure it is installed.

   Jan 22 00:44:59 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': 
'0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

   Jan 22 00:44:59 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': 
'0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

   Jan 22 00:44:59 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': 
{'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

   Jan 22 00:45:00 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::_readPauseCode unsupported by 
libvirt vm

   Jan 22 00:45:00 office4a kernel: ata1: hard resetting link

   Jan 22 00:45:00 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 
SControl 300)

   Jan 22 00:45:00 office4a kernel: ata1.00: configured for UDMA/133

   Jan 22 00:45:00 office4a kernel: ata1: EH complete

   Jan 22 00:45:01 office4a kernel: ata2: hard resetting link

   Jan 22 00:45:01 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)

   Jan 22 00:45:01 office4a kernel: ata2.00: configured for UDMA/133

   Jan 22 00:45:01 office4a kernel: ata2: EH complete

   Jan 22 00:45:02 office4a kernel: ata3: hard resetting link

   Jan 22 00:45:02 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)

   Jan 22 00:45:02 office4a kernel: ata3.00: configured for UDMA/133

   Jan 22 00:45:02 office4a kernel: ata3: EH complete

   Jan 22 00:45:03 office4a kernel: ata4: hard resetting link

   Jan 22 00:45:03 office4a kernel: ata4: SATA link down (SStatus 0 SControl 
300)

   Jan 22 00:45:03 office4a kernel: ata4: EH complete

   Jan 22 00:45:04 office4a kernel: ata5: soft resetting link

   Jan 22 00:45:04 office4a kernel: ata5.00: configured for UDMA/33

   Jan 22 00:45:04 office4a kernel: ata5: EH complete

   Jan 22 00:45:05 office4a kernel: ata6: soft resetting link

   Jan 22 00:45:05 office4a kernel: ata6: EH complete

   Jan 22 00:45:08 office4a kernel: ata1: hard resetting link

   Jan 22 00:45:08 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 
SControl 300)

   Jan 22 00:45:08 office4a kernel: ata1.00: configured for UDMA/133

   Jan 22 00:45:08 office4a kernel: ata1: EH complete

   Jan 22 00:45:09 office4a kernel: ata2: hard resetting link

   Jan 22 00:45:09 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)

   Jan 22 00:45:09 office4a kernel: ata2.00: configured for UDMA/133

   Jan 22 00:45:09 office4a kernel: ata2: EH complete

   Jan 22 00:45:10 office4a kernel: ata3: hard resetting link

   Jan 22 00:45:10 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)

   Jan 22 00:45:10 office4a kernel: ata3.00: configured for UDMA/133

   Jan 22 00:45:10 office4a kernel: ata3: EH complete

   Jan 22 00:45:11 office4a kernel: ata4: hard resetting link

   Jan 22 00:45:11 office4a kernel: ata4: SATA link down (SStatus 0 SControl 
300)

   Jan 22 00:45:11 office4a kernel: ata4: EH complete

   Jan 22 00:45:12 office4a kernel: ata5: soft resetting link

   Jan 22 00:45:12 office4a kernel: ata5.00: configured for UDMA/33

   Jan 22 00:45:12 office4a kernel: ata5: EH complete

   Jan 22 00:45:13 office4a kernel: ata6: soft resetting link

   Jan 22 00:45:13 office4a kernel: ata6: EH complete

   Jan 22 00:45:15 office4a sanlock[3362]: 2014-01-22 00:45:15-0500 521416 
[17150]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:16 office4a sanlock[3362]: 2014-01-22 00:45:16-0500 521417 
[3367]: s124 add_lockspace fail result -90

   Jan 22 00:45:20 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': 
'0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

   Jan 22 00:45:20 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': 
'0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

   Jan 22 00:45:20 office4a vdsm vm.Vm WARNING 
vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: 
'{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': 
{'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

   Jan 22 00:45:21 office4a sanlock[3362]: 2014-01-22 00:45:21-0500 521422 
[17199]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:22 office4a sanlock[3362]: 2014-01-22 00:45:22-0500 521423 
[3368]: s125 add_lockspace fail result -90

   Jan 22 00:45:22 office4a vdsm TaskManager.Task ERROR Task=`2fefe6f9-4dac-45f2-b237-0104ce890146`::Unexpected 
error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    
return fn(*args, **kargs)#012  File "/usr/share/vdsm/storage/task.py", line 318, in run#012    return 
self.cmd(*self.argslist, **self.argsdict)#012  File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012    
self.masterDomain.acquireHostId(self.id)#012  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012    
self._clusterLock.acquireHostId(hostId, async)#012  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in 
acquireHostId#012    raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: 
('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))

   Jan 22 00:45:25 office4a sanlock[3362]: 2014-01-22 00:45:25-0500 521426 
[17243]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:26 office4a sanlock[3362]: 2014-01-22 00:45:26-0500 521427 
[3368]: s126 add_lockspace fail result -90

   Jan 22 00:45:33 office4a vdsm TaskManager.Task ERROR Task=`8f8931ba-a9f6-4225-a951-2d608b3b1d70`::Unexpected 
error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in 
_run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012    res 
= f(*args, **kwargs)#012  File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012    
allTasksStatus = sp.getAllTasksStatuses()#012  File "/usr/share/vdsm/storage/securable.py", line 66, in 
wrapper#012    raise SecureError()#012SecureError

   Jan 22 00:45:36 office4a sanlock[3362]: 2014-01-22 00:45:36-0500 521437 
[17334]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:37 office4a sanlock[3362]: 2014-01-22 00:45:37-0500 521438 
[3368]: s127 add_lockspace fail result -90

   Jan 22 00:45:37 office4a vdsm TaskManager.Task ERROR Task=`6b7da394-63b6-4e52-ac73-ab98f3f1e60f`::Unexpected 
error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    
return fn(*args, **kargs)#012  File "/usr/share/vdsm/storage/task.py", line 318, in run#012    return 
self.cmd(*self.argslist, **self.argsdict)#012  File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012    
self.masterDomain.acquireHostId(self.id)#012  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012    
self._clusterLock.acquireHostId(hostId, async)#012  File "/usr/share/vdsm/storage/clusterlock.py", line 185, in 
acquireHostId#012    raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: 
('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(115, 'Sanlock lockspace add failure', 'Operation now in progress'))

   Jan 22 00:45:46 office4a sanlock[3362]: 2014-01-22 00:45:46-0500 521447 
[17410]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:47 office4a sanlock[3362]: 2014-01-22 00:45:47-0500 521448 
[3367]: s128 add_lockspace fail result -90

   Jan 22 00:45:48 office4a vdsm TaskManager.Task ERROR Task=`c2486380-a0c8-45d1-a63f-e0e660f1e118`::Unexpected 
error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in 
_run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012    res 
= f(*args, **kwargs)#012  File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012    
allTasksStatus = sp.getAllTasksStatuses()#012  File "/usr/share/vdsm/storage/securable.py", line 66, in 
wrapper#012    raise SecureError()#012SecureError

   Jan 22 00:45:50 office4a sanlock[3362]: 2014-01-22 00:45:50-0500 521451 
[17457]: read_sectors delta_leader offset 512 rv -90 
/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

   Jan 22 00:45:51 office4a sanlock[3362]: 2014-01-22 00:45:51-0500 521452 
[3368]: s129 add_lockspace fail result -90


I am guessing that what makes a storage domain get the (Master) designation
is that it is the one that actually contains the sanlocks?  If so, would it
make sense to set up a gluster volume to be (Master), but not use it for VM
storage, just for storing the sanlock info?  Separate gluster volume(s) could
then hold the VMs, and would not need the optimizations turned off.
Any domain must be able to become the master at any time. Without a master
the data center is unusable (at the present time); that's why we migrate (or
reconstruct) it on another domain when necessary.
Ted Miller

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
