Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-09 Thread Stanislav German-Evtushenko
HDD firmware updated; the "Command timeout on physical disk" error seems to be gone, but out-of-sync blocks still appear. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-08 Thread Stanislav German-Evtushenko
Hello all, I have new information. *** 1. I found this in the logs on host1: ... Apr 6 19:14:36 host1 Server Administrator: Storage Service EventID: 2405 Command timeout on physical disk:

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-08 Thread Stanislav German-Evtushenko
Hello everybody, I have disabled barriers (no-disk-barrier) on both nodes, but it didn't help. I ran "verify" daily; two of the checks went well, but the third one showed several blocks out of sync. I have already tried everything I could think of. Can anybody advise what I can do next? Best regard

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-04 Thread Stanislav German-Evtushenko
Hi All, No changes so far; disabling bonding didn't help. I still get out-of-sync blocks from time to time. So I have two questions: - Can the verify process itself be the cause of "out of sync" blocks? - Does it make any sense to disable disk barriers and use flush instead if we use RHEL kernel 2.

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-01 Thread Stanislav German-Evtushenko
Hello, I have investigated further. 1. All hardware firmware is now up to date, but nothing has changed. All TCP offload features are disabled on all 4 Ethernet controllers. 2. I have created a small script for comparing out-of-sync blocks:

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-28 Thread Stanislav German-Evtushenko
Some new findings: node1 has very old Ethernet firmware; the plan is to update it and continue the verify checks. root@node1:~# ethtool -i eth1 driver: bnx2 version: 2.2.3f firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8 bus-info: :01:00.1 root@node2:~# ethtool -i eth1 driver: bnx2 version: 2.2.3f

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-27 Thread Stanislav German-Evtushenko
Update: today's check found 1 block from yesterday and 3 new blocks. The source is again node1 and the destination node2, but now it's a Windows VM. And the same thing again: zeros "not replicated". The non-zero part from node1 (I'm talking about 3 blocks, that is, 24 sectors of 512 bytes each) is completely the same as on

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-27 Thread Stanislav German-Evtushenko
I am still trying to catch the bug. Since I disabled rx and tx checksumming offload, two checks went well, but the third one found one block out of sync again. The block contains 8 sectors of 512 bytes each, but in fact only three of them differ. Status of content: 0 - same 1 - same 2 - same 3 -
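The sector-by-sector comparison described in the message above can be sketched in Python. This is only an illustrative sketch, not the poster's script: the function name `compare_block` and the status strings are hypothetical, and the two 4 KiB buffers are assumed to have already been read from the same offset on each node's backing device. Only the 512-byte sector size and the 8 sectors per block come from the post.

```python
from typing import List

SECTOR_SIZE = 512        # bytes per sector, as stated in the post
SECTORS_PER_BLOCK = 8    # 8 x 512 B = one 4 KiB block reported by verify


def compare_block(local: bytes, peer: bytes) -> List[str]:
    """Return one status string per sector of a 4 KiB block.

    Hypothetical helper: 'same' when both copies match, otherwise
    'differs', with a note when the peer's sector holds only zeros.
    """
    block_size = SECTOR_SIZE * SECTORS_PER_BLOCK
    if len(local) != block_size or len(peer) != block_size:
        raise ValueError("both buffers must be exactly %d bytes" % block_size)

    statuses = []
    for i in range(SECTORS_PER_BLOCK):
        start = i * SECTOR_SIZE
        a = local[start:start + SECTOR_SIZE]
        b = peer[start:start + SECTOR_SIZE]
        if a == b:
            statuses.append("same")
        elif not any(b):
            # Peer sector is all zero bytes while the local one is not.
            statuses.append("differs (peer sector is all zeros)")
        else:
            statuses.append("differs")
    return statuses


if __name__ == "__main__":
    # Synthetic data only: sectors 3..5 zeroed on the "peer" copy.
    local = b"\x41" * (SECTOR_SIZE * SECTORS_PER_BLOCK)
    peer = bytearray(local)
    peer[3 * SECTOR_SIZE:6 * SECTOR_SIZE] = bytes(3 * SECTOR_SIZE)
    for index, status in enumerate(compare_block(local, bytes(peer))):
        print(index, "-", status)
```

Reporting per sector rather than per block shows whether a mismatch covers the whole 4 KiB block or only a few sectors, which is the distinction the message draws.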

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Further investigations: I have disabled rx and tx checksumming offload and ran two cycles of the verify process - no problems at all. The only problem is that CPU usage has increased a lot. ethtool --offloading eth1 rx off tx off and ethtool --offloading eth3 rx off tx off on both nodes May

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Felix Frank
On 03/25/2013 03:50 PM, Dan Barker wrote: > You don't need Dual-Primary to live migrate - you need shared storage. Two > completely different concepts. Now, the shared storage can be based on a DRBD > Primary, and you can have it fail over to a DRBD Secondary, but dual primary > is not going to

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Dan Barker
> -Original Message- > From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user- > boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko > Sent: Monday, March 25, 2013 10:13 AM > To: Radu Radutiu > Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync is

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
The information at those links is very advanced. It's difficult even to tell how closely it relates to my problem. > So you *possibly* have ongoing data corruption > caused by hardware, or layers above DRBD. Any ideas how to investigate? > Or you may just have "normal behaviour", > and if DRBD

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Thank you for the suggestion. I could investigate whether this is a swap region, but it wouldn't help because: 1) I can check it for Linux VMs but I can't do the same for Windows ones (because the swap file is inside the file system). 2) I can't do online migration even if only the swap region is out of sync because it wil

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Maurits van de Lande
yself for this kind of SANs. Regards, Maurits -Original Message- From: Stanislav German-Evtushenko [mailto:ginerm...@gmail.com] Sent: Monday, 25 March 2013 12:52 To: Maurits van de Lande CC: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Uncatchable DRBD out-of-s

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Lars Ellenberg
On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko wrote: > Further investigations... > > First verification went well but then strange things started to happen. > Full logs are here: http://pastebin.com/ntbQcaNz ... "Digest mismatch, buffer modified by upper layers during write

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
What do you mean? On Mon, Mar 25, 2013 at 4:16 PM, Radu Radutiu wrote: > Are the oos regions part of the swap partitions for the VMs? > > Radu

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
> filter = [ "a|drbd|", "r|.*|" ] Yes, something like this is in place. > write_cache_state = 0 I have write_cache_state = 1 (default), but as I understand it this only concerns the filter cache, and /etc/lvm/cache is empty on both nodes. > Do you use some custom disk caching module like flashcache? I'm not su

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Maurits van de Lande
ists.linbit.com> Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync issue Hi, > there is a certain risk of data not reaching the hard disk soundly. It seems so but what is a way to catch the reason? > What's your hard disk stack? RAID 1+0 on both nodes. > If you have the freed

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Hi, > there is a certain risk of data not reaching the hard disk soundly. It seems so but what is a way to catch the reason? > What's your hard disk stack? RAID 1+0 on both nodes. > If you have the freedom, there are some things to try, e.g. > - not use ethernet bonding I can try but it will dec

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Felix Frank
You forgot to CC the list... On 03/25/2013 10:22 AM, Stanislav German-Evtushenko wrote: >> there is a certain risk of data not reaching the hard disk soundly. > It seems so but what is a way to catch the reason? Really not much you can do. Not sure if ZFS is viable and helpful. >> What's your ha

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Felix Frank
Hi, there is a certain risk of data not reaching the hard disk soundly. This is electronics, bits can get flipped for the dumbest of reasons (yes, space radiation is now more dangerous to your computer than 20 years ago, for one. Or so I've heard.) What's your hard disk stack? ZFS is said to pro

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Further investigations... First verification went well but then strange things started to happen. Full logs are here: http://pastebin.com/ntbQcaNz A short description is below in chronological order. **

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Stanislav German-Evtushenko
* Once again with the correct email subject, sorry for the duplicate * > Stanislav, my system sends me an email when verify finds an out-of-sync condition. > You can use the same handler if you like. > In my global, handlers section: > out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemaila

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Dan Barker
ct/connect the resource)? Dan, in Atlanta From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko Sent: Sunday, March 24, 2013 7:00 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Uncatchable DRBD out-of-sync issue Dear

[DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Stanislav German-Evtushenko
Dear all, I'm trying to track down an out-of-sync issue and I'm stuck so far. Can anybody give me a hint about what I can check next? Configuration: - two Dell PowerEdge R710 nodes (both with the same hardware, same configuration) - drbd0 master-master (size is 900GiB) - direct connection (two 1G
