HDD firmware updated; the "Command timeout on physical disk" error seems
to be gone, but out-of-sync blocks still appear.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
Hello all,
I have new information.
***
1. I found this in the logs on host1:
...
Apr 6 19:14:36 host1 Server Administrator: Storage Service EventID:
2405 Command timeout on physical disk:
Hello everybody,
I have disabled barriers (no-disk-barrier) on both nodes but it didn't
help. I ran "verify" daily; two of the checks went well but the third
one showed several blocks out of sync.
I have already tried everything I could think of. Can anybody suggest
what I can check next?
Best regards
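For reference, the daily "verify" described above is typically driven from cron. A minimal sketch, assuming the resource is named r0 (the actual resource name was not given in the thread):

```
# /etc/cron.d/drbd-verify -- sketch; assumes a DRBD resource named "r0"
# Start an online verify every night at 02:00. Results appear in the
# kernel log; out-of-sync ranges are reported as the run progresses.
0 2 * * * root /sbin/drbdadm verify r0
```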
Hi All,
No changes so far; disabling bonding didn't help. I still get out-of-sync
blocks from time to time.
So I have two questions:
- Can the verify process itself be the cause of "out of sync" blocks?
- Does it make any sense to disable disk barriers and use flush instead if
we use RHEL kernel 2.
Hello,
I have done further investigation.
1. All hardware firmware is up to date so far, but nothing has changed.
All TCP offload features are disabled on all four ethernet controllers.
2. I have created a small script for comparing out-of-sync blocks:
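The script itself did not survive in the archive; a minimal sketch of the idea, comparing the same block on both backing stores sector by sector (the file/device paths and the 8-sectors-per-block assumption come from the out-of-sync reports later in the thread, not from the original script):

```shell
# compare_block: compare one block (8 sectors of 512 bytes) of two
# files or devices sector by sector, printing "N - same" or
# "N - different" for each of the 8 sectors.
# Arguments: <file-or-device-A> <file-or-device-B> <block-number>
compare_block() {
    a=$1; b=$2; start=$(( $3 * 8 ))      # first 512-byte sector of the block
    for i in 0 1 2 3 4 5 6 7; do
        s=$((start + i))
        # Read a single sector from each side and checksum it.
        sum_a=$(dd if="$a" bs=512 skip=$s count=1 2>/dev/null | md5sum)
        sum_b=$(dd if="$b" bs=512 skip=$s count=1 2>/dev/null | md5sum)
        if [ "$sum_a" = "$sum_b" ]; then
            echo "$i - same"
        else
            echo "$i - different"
        fi
    done
}
```

On a live cluster this would be run against the DRBD backing devices on each node (over ssh), using the sector offsets reported by the verify run.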
Some new findings: node1 has very old ethernet firmware; the plan is
to update it and then keep following the verification runs.
root@node1:~# ethtool -i eth1
driver: bnx2
version: 2.2.3f
firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8
bus-info: :01:00.1
root@node2:~# ethtool -i eth1
driver: bnx2
version: 2.2.3f
Update: today's check found 1 block from yesterday and 3 new blocks.
The source is again node1 and the destination node2, but now it's a Windows VM.
And the same thing again: zeros "not replicated". The non-zero part from node1
(I'm talking about 3 blocks, that means 24 sectors of 512 bytes each)
is completely the same as on
I am still trying to catch the bug.
Since I disabled rx and tx checksum offload, two checks went well,
but the third one found one block out of sync again. The block
includes 8 sectors of 512 bytes each, but in fact only three of them are
different.
Status of content:
0 - same
1 - same
2 - same
3 -
Further investigation: I have disabled rx and tx checksum
offloading and ran two cycles of the verify process with no problems
at all. The only downside is that CPU usage has increased a lot.
ethtool -K eth1 rx off tx off and ethtool -K eth3 rx off tx off
on both nodes
On 03/25/2013 03:50 PM, Dan Barker wrote:
> You don't need Dual-Primary to live migrate - you need shared storage. Two
> completely different concepts. Now, the shared storage can be based on a DRBD
> Primary, and you can have it fail over to a DRBD Secondary, but dual primary
> is not going to
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko
> Sent: Monday, March 25, 2013 10:13 AM
> To: Radu Radutiu
> Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync is
The information at those links is deeply advanced. It's difficult even
to tell how much of it relates to my problem.
> So you *possibly* have ongoing data corruption
> caused by hardware, or layers above DRBD.
Any ideas how to investigate?
> Or you may just have "normal behaviour",
> and if DRBD
Thank you for suggestion.
I could investigate whether this is a swap region, but it wouldn't help, because:
1) I can check it for Linux VMs but I can't do the same for Windows
ones (because the swap file is inside the file system).
2) I can't do online migration even if only the swap region is out of sync,
because it wil
yself for this kind of SANs.
Regards,
Maurits
-Original message-
From: Stanislav German-Evtushenko [mailto:ginerm...@gmail.com]
Sent: Monday, March 25, 2013 12:52
To: Maurits van de Lande
CC: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Uncatchable DRBD out-of-s
On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko wrote:
> Further investigations...
>
> The first verification went well but then strange things started to happen.
> Full logs are here: http://pastebin.com/ntbQcaNz
... "Digest mismatch, buffer modified by upper layers during write
What do you mean?
On Mon, Mar 25, 2013 at 4:16 PM, Radu Radutiu wrote:
> Are the oos regions part of the swap partitions for the VMs?
>
> Radu
> filter = [ "a|drbd|", "r|.*|" ]
yes, something like this is in place
> write_cache_state = 0
I have write_cache_state = 1 (the default), but as I understand it this
only caches the device filter results, and /etc/lvm/cache is empty on both nodes
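For context, both settings being discussed live in the devices section of /etc/lvm/lvm.conf; a sketch (the exact regexes on the poster's nodes were not shown beyond the filter line quoted above):

```
devices {
    # Accept only DRBD devices as physical volumes and reject everything
    # else, so LVM never activates the raw backing disk directly.
    filter = [ "a|drbd|", "r|.*|" ]
    # 0 disables writing the device filter cache (/etc/lvm/cache/.cache).
    write_cache_state = 0
}
```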
> Do you use some custom disk caching module like flashcache?
I'm not su
ists.linbit.com>
Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync issue
Hi,
> there is a certain risk of data not reaching the hard disk soundly.
It seems so but what is a way to catch the reason?
> What's your hard disk stack?
RAID 1+0 on both nodes.
> If you have the freedom, there are some things to try, e.g.
> - not use ethernet bonding
I can try but it will dec
You forget to CC the list...
On 03/25/2013 10:22 AM, Stanislav German-Evtushenko wrote:
>> there is a certain risk of data not reaching the hard disk soundly.
> It seems so but what is a way to catch the reason?
Really not much you can do. Not sure if ZFS is viable and helpful.
>> What's your ha
Hi,
there is a certain risk of data not reaching the hard disk soundly. This
is electronics, bits can get flipped for the dumbest of reasons (yes,
space radiation is now more dangerous to your computer than 20 years
ago, for one. Or so I've heard.)
What's your hard disk stack?
ZFS is said to pro
Further investigations...
The first verification went well but then strange things started to happen.
Full logs are here: http://pastebin.com/ntbQcaNz
A short description is below, in historical order.
**
* Once again with the correct email subject, sorry for the duplicate *
> Stanislav, my system sends me an email when verify finds an out-of-sync
> condition.
> You can use the same handler if you like.
> In my global, handlers section:
> out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh myemaila
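The handler Dan describes goes in the handlers section of drbd.conf, roughly like this (a sketch; the mail address is a placeholder):

```
handlers {
    # Invoked when online verify finds out-of-sync sectors; the helper
    # script shipped with DRBD mails the details to the given address.
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh admin@example.com";
}
```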
ct/connect the resource)?
Dan, in Atlanta
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Stanislav
German-Evtushenko
Sent: Sunday, March 24, 2013 7:00 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Uncatchable DRBD out-of-sync issue
Dear all,
I'm trying to catch this out-of-sync issue and I'm stuck so far. Can
anybody give me a hint what I can check next?
Configuration:
- two nodes, Dell PowerEdge R710 (both nodes have the same hardware and the
same configuration)
- drbd0 master-master (size is 900GiB)
- direct connection (two 1G
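A dual-primary (master-master) resource of this shape is typically defined along these lines; this is a sketch, since the actual drbd.conf was never posted, and the device names, backing disks, addresses, and checksum algorithms here are assumptions:

```
resource drbd0 {
    protocol C;                  # synchronous replication, required for dual-primary
    net {
        allow-two-primaries;     # master-master operation
        verify-alg sha1;         # checksum used by "drbdadm verify"
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.2:7788;
        meta-disk internal;
    }
}
```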