[DRBD-user] Copy buffer before commit, possible?

2014-10-20 Thread Stanislav German-Evtushenko
lse? Best regards, Stanislav German-Evtushenko ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-03-18 Thread Stanislav German-Evtushenko
Hello Lars, I usually have to wait about a week for a block to go out of sync, so the investigation is going slowly. Could you suggest a reliable way to simulate "Digest mismatch, *buffer modified* by upper layers during write"? That would help a lot. Best regards, Stanislav

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-03-10 Thread Stanislav German-Evtushenko
Hello Lars, > Upper layer submits write to DRBD. > DRBD calculates checksum over data buffer. > DRBD sends that checksum. > DRBD submits data buffer to "local" backend block device. > Meanwhile, upper layer changes data buffer. > DRBD sends data buffer to peer. > DRBD receives lo
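The quoted sequence can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not DRBD code: the function name `replicate`, the use of SHA-1 and the single-byte modification are all illustrative choices. It shows why a buffer changed between the checksum step and the send step produces a digest mismatch on the peer and leaves the two copies different.

```python
import hashlib


def replicate(buffer: bytearray, modify_in_flight: bool):
    """Toy model of the quoted write path (hypothetical, not DRBD code).

    Returns (digest_ok, copies_equal).
    """
    # The sender calculates a checksum over the data buffer and sends it.
    sent_digest = hashlib.sha1(bytes(buffer)).digest()

    # The data buffer is submitted to the local backend device.
    local_copy = bytes(buffer)

    # Meanwhile, the upper layer changes the same buffer.
    if modify_in_flight:
        buffer[0] ^= 0xFF

    # Only now is the data buffer sent to the peer.
    peer_copy = bytes(buffer)

    # The peer recomputes the checksum over what it received.
    digest_ok = hashlib.sha1(peer_copy).digest() == sent_digest
    return digest_ok, local_copy == peer_copy


if __name__ == "__main__":
    print(replicate(bytearray(4096), modify_in_flight=False))
    print(replicate(bytearray(4096), modify_in_flight=True))
```

In this model the unmodified write passes the digest check, while the modified one both fails the check and ends up with different data on the two sides.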

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-02-24 Thread Stanislav German-Evtushenko
>> Can LVM cause these OOS? > Very unlikely. I have another idea here. I'll try to switch drive options for KVM from cache=none to cache=directsync or cache=writethrough. In that case KVM has to ensure that data is on disk. I suppose this means (with disk-flushes and disk-barrier disabled) that KV

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-02-24 Thread Stanislav German-Evtushenko
> > Most of the time (99%) I see ERR for the swap space of virtual machines. > > If you enable "integrity-alg", do you still see those "buffer modified > by upper layers during write"? > > Well, then that is your problem, > and that problem can *NOT* be fixed with DRBD "config tuning". > > What doe

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-01-29 Thread Stanislav German-Evtushenko
Just to make things clearer. These results are not false positives, they are real. False positives also happen, but rarely. I check for false positives using the following script: -- #!/bin/bash # Usage: cat /var/log/kern.log | drbd_out_of_syn
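The script above is cut off in this listing, so it is not reproduced here. As a rough sketch of the first step such a check needs, the Python below extracts out-of-sync ranges from kernel log lines. It assumes the log lines have the form "Out of sync: start=N, size=M (sectors)"; the function name `parse_out_of_sync` and the sample line are hypothetical.

```python
import re
import sys

# Assumed log format: "... Out of sync: start=403446112, size=8 (sectors)"
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")


def parse_out_of_sync(lines):
    """Return a list of (start_sector, size_in_sectors) tuples."""
    ranges = []
    for line in lines:
        match = OOS_RE.search(line)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges


if __name__ == "__main__":
    # Usage (hypothetical file name): cat /var/log/kern.log | python3 oos_ranges.py
    for start, size in parse_out_of_sync(sys.stdin):
        print(start, size)
```

Each reported range could then be read from both nodes and compared to decide whether the mismatch is real.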

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-01-27 Thread Stanislav German-Evtushenko
> > > Have you figured out on which one of the servers the data is correct? And > is > it always the same server? This assumes a primary/secondary setup. > If you know on which server the data is correct then you know - IF it's a > hardware problem - which server is at fault. If it's a software pro

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-01-27 Thread Stanislav German-Evtushenko
oot cause of your problems I guess… > > > > Regards, > > > > Pascal. > > > > *De :* drbd-user-boun...@lists.linbit.com [mailto: > drbd-user-boun...@lists.linbit.com] *De la part de* Stanislav > German-Evtushenko > *Envoyé :* lundi 27 janvier 2014 13:

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-01-27 Thread Stanislav German-Evtushenko
On Mon, Jan 27, 2014 at 4:18 PM, Bram Matthys wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Hi, > > Just jumping in, unaware of the history of this thread... > > Stanislav German-Evtushenko wrote, on 27-1-2014 7:08: > > > > On Thu, Apr 1

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2014-01-26 Thread Stanislav German-Evtushenko
On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko < ginerm...@gmail.com> wrote: > No choice so far :) > http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3 > > I don't think this is a kernel bug. Anyway would be nice if somebody > can investigate and fix or at l

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2013-04-18 Thread Stanislav German-Evtushenko
/18/2013 12:20 PM, Stanislav German-Evtushenko wrote: >>> Note that your kernel (and hence kvm/virtio) can be considered rather old >>> by now. >> This is a stable RHEL 6 kernel at the moment. > > Exactly ;-) > > Same for Debian 6, which I no longer consider fit

Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2013-04-18 Thread Stanislav German-Evtushenko
> Note that your kernel (and hence kvm/virtio) can be considered rather old by > now. This is a stable RHEL 6 kernel at the moment. On Thu, Apr 18, 2013 at 1:16 PM, Felix Frank wrote: > On 04/18/2013 08:26 AM, Stanislav German-Evtushenko wrote: >> If I change VIRTIO to IDE

[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

2013-04-18 Thread Stanislav German-Evtushenko
Beginning is here: http://www.gossamer-threads.com/lists/drbd/users/25146 Hello everybody, Finally I think I can reproduce the issue. When it happens: Linux kernel: 2.6.32-19-pve (based on vzkernel-2.6.32-042stab075.2.src.rpm) DRBD Version: 8.3.13 DRBD Mode: dual primary + LVM on top of DRBD Vir

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-09 Thread Stanislav German-Evtushenko
HDD firmware has been updated. The "Command timeout on physical disk" error seems to be gone, but out-of-sync blocks still appear.

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-08 Thread Stanislav German-Evtushenko
Hello all, I have new information. *** 1. I found in logs on host1 this: ... Apr 6 19:14:36 host1 Server Administrator: Storage Service EventID: 2405 Command timeout on physical disk:

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-08 Thread Stanislav German-Evtushenko
Hello everybody, I have disabled barriers (no-disk-barrier) on both nodes but it didn't help. I ran "verify" daily; two of the checks went well but the third one showed several blocks out of sync. I have already tried everything I could think of. Can anybody advise what I can do next? Best regard

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-04 Thread Stanislav German-Evtushenko
Hi All, No changes so far, disabling bonding didn't help. I still get out-of-sync blocks from time to time. So I have two questions: - Can the verify process itself be the cause of "out of sync" blocks? - Does it make any sense to disable disk-barriers and use flush instead if we use RHEL kernel 2.

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-04-01 Thread Stanislav German-Evtushenko
Hello, I am investigating further. 1. All hardware firmware is now up to date but nothing has changed. All TCP offload features are disabled on all 4 Ethernet controllers. 2. I have created a small script for comparing out-of-sync blocks:

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-28 Thread Stanislav German-Evtushenko
Some new findings. node1 has very old Ethernet firmware; the plan is to update it and continue the verify runs. root@node1:~# ethtool -i eth1 driver: bnx2 version: 2.2.3f firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8 bus-info: :01:00.1 root@node2:~# ethtool -i eth1 driver: bnx2 version: 2.2.3f

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-27 Thread Stanislav German-Evtushenko
Update: today's check found 1 block from yesterday and 3 new blocks. The source is again node1 and the destination node2, but now it's a Windows VM. And the same thing again: zeros "not replicated". The non-zero part from node1 (I'm talking about 3 blocks, that means 24 sectors of 512 bytes each) is completely the same as on

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-27 Thread Stanislav German-Evtushenko
I am still trying to catch the bug. Since I disabled rx and tx checksumming offload, two checks went well but the third one found one block out-of-sync again. The block consists of 8 sectors of 512 bytes each, but in fact only three of them are different. status of content 0 - same 1 - same 2 - same 3 -
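A per-sector comparison like the one described can be sketched as follows. This is a minimal illustration, assuming the same 4 KiB block has already been read from both nodes into two byte strings; the function name `sector_status` is hypothetical and this is not the script used in the thread.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the message above


def sector_status(block_a: bytes, block_b: bytes):
    """Compare two equally sized blocks sector by sector.

    Returns one "same" or "different" entry per 512-byte sector.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    if len(block_a) % SECTOR_SIZE:
        raise ValueError("block length must be a multiple of the sector size")

    status = []
    for offset in range(0, len(block_a), SECTOR_SIZE):
        a = block_a[offset:offset + SECTOR_SIZE]
        b = block_b[offset:offset + SECTOR_SIZE]
        status.append("same" if a == b else "different")
    return status


if __name__ == "__main__":
    node1 = bytes(8 * SECTOR_SIZE)          # an 8-sector block of zeros
    node2 = bytearray(node1)
    node2[3 * SECTOR_SIZE] = 0x01           # make sector 3 differ
    for index, state in enumerate(sector_status(node1, bytes(node2))):
        print(index, "-", state)
```

The output has the same "index - state" layout as the status list in the message.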

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Further investigations: I have disabled rx and tx checksumming offloading and ran two cycles of the verify process - no problems at all. The only problem is that CPU usage has increased a lot. ethtool --offloading eth1 rx off tx off and ethtool --offloading eth3 rx off tx off on both nodes May

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
l I stopped all VMs on one node, made the resource on this node secondary, disconnected, and connected again. Best regards, Stanislav On Mon, Mar 25, 2013 at 5:07 PM, Lars Ellenberg wrote: > On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko wrote: >> Further investigations.

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Thank you for the suggestion. I could investigate if this is a swap region but it wouldn't help because: 1) I can check it for Linux VMs but I can't do the same for Windows ones (because the swap file is in the file system). 2) I can't do online migration even if only the swap region is out of sync because it wil

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
What do you mean? On Mon, Mar 25, 2013 at 4:16 PM, Radu Radutiu wrote: > Are the oos regions part of the swap partitions for the VMs? > > Radu

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
> filter = [ "a|drbd|", "r|.*|" ] yes, something like this is in place > write_cache_state = 0 I have write_cache_state = 1 (default) but this is only filtering cache as I understand and /etc/lvm/cache is empty on both nodes > Do you use some custom disk caching module like flashcache? I'm not su

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Hi, > there is a certain risk of data not reaching the hard disk soundly. It seems so but what is a way to catch the reason? > What's your hard disk stack? RAID 1+0 on both nodes. > If you have the freedom, there are some things to try, e.g. > - not use ethernet bonding I can try but it will dec

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Stanislav German-Evtushenko
Further investigations... The first verification went well but then strange things started to happen. Full logs are here: http://pastebin.com/ntbQcaNz A short description is below in chronological order. **

Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Stanislav German-Evtushenko
* Once again with the correct email subject, sorry for the duplicate * > Stanislav, my system sends me an email when verify finds an out-of-sync condition. > You can use the same handler if you like. > In my global, handlers section: > out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemaila

Re: [DRBD-user] drbd-user Digest, Vol 104, Issue 28

2013-03-24 Thread Stanislav German-Evtushenko
> Stanislav, my system sends me an email when verify finds an out-of-sync condition. You can use the same handler if you like. > In my global, handlers section: > out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemailaddress"; I did it some time ago, so I get the message each time it verifi

[DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Stanislav German-Evtushenko
Dear all, I'm trying to catch an out-of-sync issue and I'm stuck so far. Can anybody give me a hint about what to check next? Configuration: - two nodes Dell PowerEdge R710 (both nodes have the same hardware and the same configuration) - drbd0 master-master (size is 900GiB) - direct connection (two 1G
