lse?
Best regards,
Stanislav German-Evtushenko
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
Hello Lars,
I usually have to wait about a week for blocks to go out of sync, so the
investigation is going slowly.
Could you suggest a reliable way to simulate "Digest mismatch, *buffer
modified* by upper layers during write"? That would help a lot.
Best regards,
Stanislav
_
Hello Lars,
> Upper layer submits write to DRBD.
> DRBD calculates checksum over data buffer.
> DRBD sends that checksum.
> DRBD submits data buffer to "local" backend block device.
> Meanwhile, upper layer changes data buffer.
> DRBD sends data buffer to peer.
> DRBD receives lo
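The order of events quoted above can be mimicked in user space. The following is only a rough sketch, not DRBD code: the function name `replicate_write`, the `mutate_in_flight` callback, and the choice of SHA-1 are all assumptions made for illustration. It also fixes one particular interleaving (the local copy is taken before the buffer changes), which is just one of the possible outcomes.

```python
import hashlib


def replicate_write(buffer, mutate_in_flight=None):
    """Replay the quoted sequence for one write (hypothetical helper).

    Returns (digest_matches, local_copy, peer_copy).
    """
    # The checksum is calculated over the data buffer first.
    digest = hashlib.sha1(bytes(buffer)).digest()
    # The buffer is submitted to the "local" backend (snapshot taken here).
    local_copy = bytes(buffer)
    # Meanwhile, the upper layer may change the buffer.
    if mutate_in_flight is not None:
        mutate_in_flight(buffer)
    # The data buffer is sent to the peer only after that.
    peer_copy = bytes(buffer)
    # The peer recomputes the checksum over what it actually received.
    digest_matches = hashlib.sha1(peer_copy).digest() == digest
    return digest_matches, local_copy, peer_copy


def scribble(buf):
    """Stand-in for an upper layer that modifies the buffer in flight."""
    buf[0] ^= 0xFF
```

Calling `replicate_write(bytearray(4096), scribble)` reports a digest mismatch and leaves the two copies different, while calling it without a callback reports a match.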
>> Can LVM cause these OOS?
> Very unlikely.
I have another idea here. I'll try to switch drive options for KVM from
cache=none to cache=directsync or cache=writethrough. In that case KVM has
to ensure that data is on disk. I suppose this means (with disk-flushes and
disk-barrier disabled) that KV
> > Most of the time (99%) I see ERR for the swap space of virtual machines.
>
> If you enable "integrity-alg", do you still see those "buffer modified
> by upper layers during write"?
>
> Well, then that is your problem,
> and that problem can *NOT* be fixed with DRBD "config tuning".
>
> What doe
Just to make things clearer: these results are not false positives, they are
real. False positives also happen, but rarely. I check for false positives
using the following script:
--
#!/bin/bash
# Usage: cat /var/log/kern.log | drbd_out_of_syn
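The script itself is cut off in this excerpt, so the following is not a reconstruction of it. It is a separate, minimal sketch of the first step such a tool would need: pulling out-of-sync ranges from kernel log lines. It assumes the log lines contain text of the form `drbdN: Out of sync: start=<sector>, size=<count> (sectors)`; the names `OOS_PATTERN` and `parse_out_of_sync` are hypothetical.

```python
import re

# Assumed log format: "... drbd0: Out of sync: start=123, size=8 (sectors)"
OOS_PATTERN = re.compile(
    r"(drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)"
)


def parse_out_of_sync(lines):
    """Return a list of (device, start_sector, size_in_sectors) tuples."""
    ranges = []
    for line in lines:
        match = OOS_PATTERN.search(line)
        if match:
            device, start, size = match.groups()
            ranges.append((device, int(start), int(size)))
    return ranges
```

To feed it the kernel log from a pipe, one could call `parse_out_of_sync(sys.stdin)` after importing `sys`.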
>
>
> Have you figured out on which one of the servers the data is correct? And
> is
> it always the same server? This assumes a primary/secondary setup.
> If you know on which server the data is correct then you know - IF it's a
> hardware problem - which server is at fault. If it's a software pro
oot cause of your problems I guess…
>
> Regards,
>
> Pascal.
>
> *From:* drbd-user-boun...@lists.linbit.com [mailto:
> drbd-user-boun...@lists.linbit.com] *On behalf of* Stanislav
> German-Evtushenko
> *Sent:* Monday, January 27, 2014 13:
On Mon, Jan 27, 2014 at 4:18 PM, Bram Matthys wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Hi,
>
> Just jumping in, unaware of the history of this thread...
>
> Stanislav German-Evtushenko wrote, on 27-1-2014 7:08:
> >
> > On Thu, Apr 1
On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko <
ginerm...@gmail.com> wrote:
> No choice so far :)
> http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
>
> I don't think this is a kernel bug. Anyway, it would be nice if somebody
> can investigate and fix or at l
/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
>>> Note that your kernel (and hence kvm/virtio) can be considered rather old
>>> by now.
>> This is a stable RHEL 6 kernel at the moment.
>
> Exactly ;-)
>
> Same for Debian 6, which I no longer consider fit
> Note that your kernel (and hence kvm/virtio) can be considered rather old by
> now.
This is a stable RHEL 6 kernel at the moment.
On Thu, Apr 18, 2013 at 1:16 PM, Felix Frank wrote:
> On 04/18/2013 08:26 AM, Stanislav German-Evtushenko wrote:
>> If I change VIRTIO to IDE
Beginning is here: http://www.gossamer-threads.com/lists/drbd/users/25146
Hello everybody,
Finally I think I can reproduce the issue. When it happens:
Linux kernel: 2.6.32-19-pve (based on vzkernel-2.6.32-042stab075.2.src.rpm)
DRBD Version: 8.3.13
DRBD Mode: dual primary + LVM on top of DRBD
Vir
HDD firmware updated. The "Command timeout on physical disk" error seems
to be gone, but out-of-sync blocks still appear.
Hello all,
I have new information.
***
1. I found in logs on host1 this:
...
Apr 6 19:14:36 host1 Server Administrator: Storage Service EventID:
2405 Command timeout on physical disk:
Hello everybody,
I have disabled barriers (no-disk-barrier) on both nodes but it didn't
help. I ran "verify" daily; two of the checks went well, but the third one
showed several blocks out of sync.
I have already tried everything I could think of. Can anybody advise
me on what I can do next?
Best regard
Hi All,
No change so far; disabling bonding didn't help. I still get out-of-sync
blocks from time to time.
So I have two questions:
- Can the verify process itself be the cause of "out of sync" blocks?
- Does it make any sense to disable disk-barriers and use flush instead if
we use RHEL kernel 2.
Hello,
I have done some further investigation.
1. All hardware firmware is now up to date, but nothing has changed.
All TCP offload features are disabled on all four Ethernet controllers.
2. I have created a small script for comparing out-of-sync blocks:
Some new findings: node1 has very old Ethernet firmware. The plan is
to update it and keep running the verify checks.
root@node1:~# ethtool -i eth1
driver: bnx2
version: 2.2.3f
firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8
bus-info: :01:00.1
root@node2:~# ethtool -i eth1
driver: bnx2
version: 2.2.3f
Update: today's check found 1 block from yesterday and 3 new blocks.
The source is again node1 and the destination node2, but this time it's a Windows VM.
And the same thing again: zeros "not replicated". The non-zero part from node1
(I'm talking about 3 blocks, that is 24 sectors of 512 bytes each)
is completely the same as on
I am still trying to catch the bug.
Since I disabled rx and tx checksum offloading, two checks went
well, but the third one found one out-of-sync block again. The block
consists of 8 sectors of 512 bytes each, but in fact only three of them
differ.
status of content
0 - same
1 - same
2 - same
3 -
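A per-sector status list like the one above can be produced by comparing the two copies of a block 512 bytes at a time. This is a minimal sketch under the assumption that both copies of the 4 KiB block have already been read into memory from each node; the function name `compare_sectors` is hypothetical and this is not the script used in the thread.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the message


def compare_sectors(block_a, block_b, sector_size=SECTOR_SIZE):
    """Return one status string ("same" or "differs") per sector."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    if len(block_a) % sector_size != 0:
        raise ValueError("block length must be a multiple of the sector size")
    statuses = []
    for offset in range(0, len(block_a), sector_size):
        sector_a = block_a[offset:offset + sector_size]
        sector_b = block_b[offset:offset + sector_size]
        statuses.append("same" if sector_a == sector_b else "differs")
    return statuses
```

For an 8-sector block the result has eight entries, and `enumerate(compare_sectors(a, b))` gives the sector index alongside each status.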
Further investigation: I have disabled rx and tx checksum
offloading and ran two verify cycles with no problems
at all. The only downside is that CPU usage has increased a lot.
I ran ethtool --offload eth1 rx off tx off and ethtool --offload eth3
rx off tx off on both nodes.
May
l I stopped all VMs on one node
made the resource on this node secondary, disconnected,
and connected again.
Best regards,
Stanislav
On Mon, Mar 25, 2013 at 5:07 PM, Lars Ellenberg
wrote:
> On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko wrote:
>> Further investigations.
Thank you for the suggestion.
I could check whether this is a swap region, but it wouldn't help because:
1) I can check it for Linux VMs, but I can't do the same for Windows
ones (because their swap file lives inside the file system).
2) I can't do online migration even if only the swap region is out of sync
because it wil
What do you mean?
On Mon, Mar 25, 2013 at 4:16 PM, Radu Radutiu wrote:
> Are the oos regions part of the swap partitions for the VMs?
>
> Radu
> filter = [ "a|drbd|", "r|.*|" ]
yes, something like this is in place
> write_cache_state = 0
I have write_cache_state = 1 (the default), but as I understand it this only
affects the filter cache, and /etc/lvm/cache is empty on both nodes
> Do you use some custom disk caching module like flashcache?
I'm not su
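To illustrate what the quoted filter line does: LVM tries the patterns in order, the first one that matches a device name decides whether it is accepted ("a") or rejected ("r"), and a device that matches no pattern is accepted. The sketch below imitates that rule with Python's `re` module, which is not identical to LVM's own regex dialect, so treat it as an approximation. The function name `lvm_filter_accepts` is hypothetical.

```python
import re


def lvm_filter_accepts(device, filters):
    """Approximate LVM filter evaluation: first matching pattern wins."""
    for entry in filters:
        action = entry[0]       # "a" (accept) or "r" (reject)
        delimiter = entry[1]    # e.g. "|"
        pattern = entry[2:entry.rindex(delimiter)]
        if re.search(pattern, device):
            return action == "a"
    # No pattern matched: the device is accepted.
    return True


FILTERS = ["a|drbd|", "r|.*|"]  # the filter quoted in the message
```

With these two patterns, `/dev/drbd0` is accepted and every other device, such as the backing disk, is rejected, so LVM only sees the volume group through DRBD.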
Hi,
> there is a certain risk of data not reaching the hard disk soundly.
It seems so, but how can I track down the cause?
> What's your hard disk stack?
RAID 1+0 on both nodes.
> If you have the freedom, there are some things to try, e.g.
> - not use ethernet bonding
I can try but it will dec
Further investigations...
The first verification went well, but then strange things started to happen.
Full logs are here: http://pastebin.com/ntbQcaNz
A short description is below, in chronological order.
**
* Once again with the correct email subject, sorry for the duplicate *
> Stanislav, my system sends me an email when verify finds an out-of-sync
> condition.
> You can use the same handler if you like.
> In my global, handlers section:
> out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemaila
> Stanislav, my system sends me an email when verify finds an out-of-sync
> condition. You can use the same handler if you like.
> In my global, handlers section:
> out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemailaddress";
I did it some time ago, so I get the message each time it verifi
Dear all,
I'm trying to track down an out-of-sync issue and I'm stuck. Can
anybody give me a hint about what I can check next?
Configuration:
- two Dell PowerEdge R710 nodes (same hardware, same
configuration)
- drbd0 master-master (size is 900GiB)
- direct connection (two 1G