** No longer affects: mdadm (Ubuntu Lucid)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/557429
Title:
array with conflicting changes is assembled with data
corruption/silent loss
Sharing a hotplug RAID array to sync two machines is a very nice use
case, iMac; thanks for sharing your experience. So far I have only
intentionally segmented an array prior to performing updates.
A place where information and experience with the topic, including your
workarounds, is shared would fit in well. The behavior
should be specified and added *as a new feature*, since the current
documentation and implementation do not define any behavior for this
diverged RAID1 scenario.
You could build upon this thread:
http://comments.gmane.org/gmane.linux.raid/27822
(but leave out the parts that were caused by naming issues)
I use RAID1 everywhere, and I have seen both loose SATA cables and
BIOSes that are not set with enough delay for drives to spin up
lead to degraded RAID1 scenarios, so I am worried about the overall
impact of this bug. My current use case is not one of these, but it
could be anyone's.
Hi iMac. Thanks for sharing your use case.
I think this is a race condition that has only come to light recently,
because the startup and volume management has basically kept the
number of things happening consistent and small enough that the
event count gets incremented equally on both segments.
I don't think there is anything practical that could be changed in md or
mdadm to make it possible to catch this behaviour and refuse to assemble
the array...
The original topic of the linux-raid discussion
http://comments.gmane.org/gmane.linux.raid/27822 suggested the idea of
detecting
So, it's been a while since this issue resurfaced, but I feel it needs to
be put to rest.
Are we really sure we should fix this?
http://marc.info/?l=linux-raid&m=127068416016382&w=2
I don't think there is anything practical that could be changed in md or
mdadm to make it possible to catch this behaviour and refuse to assemble
the array.
Documented at
https://wiki.ubuntu.com/LucidLynx/ReleaseNotes#Use%20of%20degraded%20RAID%201%20array%20may%20cause%20data%20loss%20in%20exceptional%20cases:
If each member of a RAID 1 array is separately brought up in degraded
mode across subsequent runs of the array with no reassembly in between,
If one
port flakes on one boot, the other port flakes on the next, and both ports
are available on the third, wouldn't that trigger this same bogus
reassembly?
That depends.
On the linux-raid list it was said it would only happen if the event count
on both segments is equal +/-1. That would be
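The +/-1 rule mentioned above can be sketched as follows. This is a hypothetical helper, not mdadm's actual code, and the event counts are made-up example values:

```shell
# Hypothetical helper, not mdadm's actual code: apply the "+/-1" rule
# to the Events values read from "mdadm -E" on each RAID1 member.
check_events() {  # args: events_a events_b
    diff=$(( $1 - $2 ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    if [ "$diff" -le 1 ]; then
        # The dangerous case: the counts are close enough that the
        # members are treated as in sync, even if each one ran
        # degraded (and was written to) separately.
        echo "assemble-no-resync"
    else
        echo "resync-stale-member"
    fi
}

# Two segments that each ran degraded once can plausibly end up with
# nearly identical counts (example values):
check_events 152 153   # prints "assemble-no-resync"
check_events 152 170   # prints "resync-stale-member"
```

This is exactly why the check alone cannot distinguish a cleanly stopped pair from two independently diverged segments.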
** Description changed:
Re-attaching parts of an array that have been running degraded
separately and contain the same amount and conflicting
changes, results in the assembly of a corrupt array.
Using the latest beta-2 server ISO and following
Dustin,
On Tue, Apr 20, 2010 at 03:33:15PM -, Dustin Kirkland wrote:
I agree with Philip's assessment.
While this is very easy to reproduce in a VM (by just removing/adding
backing disk files), in practice and on real hardware, I think this is
definitely less likely.
When a real
On Sat, 2010-04-24 at 22:20 +, Steve Langasek wrote:
Have I misunderstood the nature of this bug, or couldn't it be triggered by
a flaky SATA cable causing intermittent connections to the drives? If one
port flakes on one boot, the other port flakes on the next, and both ports
are
** Description changed:
- Re-attaching parts of an array that have been running degraded
- separately and contain the same amount and conflicting
- changes or use a write intent bitmap, results in the assembly of a corrupt
array.
+ Re-attaching parts of an array that have been running degraded
Hi all,
comparing these two changelogs:
Ubuntu 10.04 beta 2: http://www.ubuntu.com/testing/lucid/beta2
Ubuntu 10.04 RC: http://www.ubuntu.com/getubuntu/releasenotes/1004overview
You have removed this bug from known issues:
Activating a RAID 1 array in degraded mode is reported to lead to RAID
I'd suggest to consider the following option about whether to assemble
segments known to contain conflicting changes or not:
AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS
That's because it DOESN'T break hot-plugging. I have explained why.
You have the right to think that, obviously we
On 4/23/2010 6:52 AM, ceg wrote:
I'd suggest to consider the following option about whether to assemble
segments known to contain conflicting changes or not:
AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS
As I have said, if you want another component to automatically notice
when one
Even if you intentionally caused the divergence you don't want both
disks to show up as the same volume when plugged in.
Right, they'd need to show up under an additionally enumerated (or
mangled) version name if another segment (version) of the same array
is already running. For hot-plug
I suppose that the rename could be only temporary while both disks are
connected, if so configured.
After some further testing, it seems that the bug in mdadm is a bit more
general. In --incremental mode it goes ahead and adds removed disks to
the array, so even if you explicitly --fail and
In --incremental mode it goes ahead and adds removed disks to
the array
Yes it would be nice if the states would get sorted out a little better.
Running an array degraded during boot would only have to mark missing
disks as failed for example, just as if they had failed while the array
was
On 4/22/2010 5:08 AM, ceg wrote:
Phillip, before suggesting something I try to think through the issue,
and I try the same with feedback.
But after several attempts to explain that changing metadata and
removing the failed status (of already running parts) in the
superblocks of the
Phillip, before suggesting something I try to think through the issue,
and I try the same with feedback.
But after several attempts to explain that changing metadata and
removing the failed status (of already running parts) in the
superblocks of the conflicting parts that are plugged in (but not
whether some other component automatically invokes mdadm
to move the second disk to a brand new array, or the admin has to by
hand, I don't really care.
You are probably not aware enough that all udev/hotplug magic for raid
is within mdadm --incremental. I.e. in the future it will even set
up
I'm going to boil this down very simply to try and bring an end to this.
If you wish to automatically split the second disk into a new array with
a new uuid, you must update the metadata on that disk to indicate you
have done so, and if you end up connecting that second disk in the
future without
As I do think we will want to fix this in SRU once a fix is available,
un-wontfixing the lucid task. We definitely *don't* want to try to
change this now before release, but we should fix it in Lucid.
Jamie, was this regression first introduced in 9.10, or did it exist in
previous releases as
Here's candidate text for the release notes, taken from the beta2 tech
overview:
Activating a RAID 1 array in degraded mode may lead to RAID disks being
reported as in sync when they are not, resulting in data loss. Since
RAID 1 arrays will automatically be brought up in degraded mode when a
I think that warning is a bit misleading/extreme. The damage only
occurs if you bring up one disk degraded, *and* then the other disk
degraded. In practice, this should never happen since usually someone
would notice the degraded event and take action to restore the missing
disk. The release
I agree with Philip's assessment.
While this is very easy to reproduce in a VM (by just removing/adding
backing disk files), in practice and on real hardware, I think this is
definitely less likely.
When a real hardware disk fails, it should be removed from the system,
and not come back until
I also agree with Philip's assessment. When it hits, it is devastating,
but it takes a very specific series of events to hit, and asking people
to not upgrade as a result is too extreme.
Philip, you mentioned to me that 9.10 was also affected-- what about
earlier releases?
Jamie-
As for earlier releases, I haven't tested this, but having written the
original logic in the mdadm's failure hooks in the initramfs, I can
tell you that the code handling is present in:
* 8.04 (via a point release/SRU)
* 8.10
* 9.04
* 9.10
* 10.04
:-Dustin
--
I have not tested earlier than Karmic.
--
ubuntu-bugs mailing list
Dustin, I don't think this has anything to do with the failure hooks in
this case. :) (Here it's mdadm that does not pick up on the conflict.)
** Description changed:
- Using the latest beta-2 server ISO and following
- http://testcases.qa.ubuntu.com/Install/ServerRAID1
+ Re-attaching parts of an array that have been running degraded
+ separately and contain the same amount and conflicting
+ changes, results in the assembly of a corrupt
Phillip, first please explain where/why you think flip-flopping would occur
continuously, in such a way that it is not enough to never assemble
a corrupt array and to notify someone to take care of reconciling the
conflicting changes if desired.
Because this seems to be the reason you want to break
If I plug in one disk and make some changes, then unplug it,
plug in the other disk, and make some changes to it,
What would be your use-case?
I don't understand this question. The use case is described in the
text you replied to.
Please explain why someone would do that, a raid
[auto-removing]
prevents you from continuing to flip flop which disk you are using
after they have been forked, and thus making things worse.
And this is a program thinking it knows better than the user; mdadm
--incremental should not do that. If you continue to do that after you
have been
the minimum action required to fix the bug is to simply reject the
second disk, updating its metadata in the process.
No, [updating metadata] really makes things worse! It prevents the
user/admin from managing arrays (parts in this case) by simply
plugging disks.
No it does not.
Once
the situation is detected, it needs to be noted
Right, this is important especially in cases where segmentation has
happened unintentionally. That is why I wanted mdadm to fail on
conflicting changes without --force, not auto-sync, and to emit an event
(email, beep, notification, whatever).
Solving this bug will require a non-trivial overhaul of mdadm's failure
hooks in the initramfs, and potentially new code in mdadm itself.
In my opinion, this is not something that can be solved in Lucid by
release. Also in my opinion, this is not a release critical issue, but
rather should be
Additional thoughts why updating metadata looks more limiting than
beneficial to me:
Unintentional (intermittent) failures of disks won't cause conflicting
changes to appear, but auto re-sync events.
Segmenting an array into parts with conflicting changes requires
repeated boots with separate
As a followup, if this were fixed cleanly, and in a backportable manner,
this could be a reasonable candidate for an SRU.
--
Considering we are a little over a week away from release, Dustin's
comment sounds reasonable. This bug existed in 9.10, and we should get
it fixed, but rushing a fix before release could easily affect more RAID
users than this bug would. Hopefully the solution will be contained
enough to make it
On 4/19/2010 1:10 PM, ceg wrote:
(Between reboots I haven't seen much random or changing device
reordering anyway. Mostly the enumeration seems to stay the same if
nothing is rewired. I'd consider the hot-plugging order much more
arbitrary, and even less worthy of committing to the metadata.)
You
All would be clearer if mdadm -E would report missing instead of
removed (which sounds like it really got mdadm --remove'd).
There already exists a faulty state. It might be appropriate to use that
This and the detection process sounds reasonable to me.
I am not sure how much sense
A name for this might be safe segmentation. It prevents data-loss that
could occur by syncing unreliable disks.
--
Updating the metadata is needed to prevent further flip-flopping. Once
the situation is detected, it needs to be noted so that further reboots
will not decide to use the other disk. Pick one, and stick with it
until the admin sorts things out.
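The "pick one, and stick with it" idea above can be sketched roughly. This is a hypothetical selection rule, not mdadm's actual behaviour, and the event counts are example values:

```shell
# Hypothetical selection rule, not mdadm's actual behaviour: once a
# conflict is detected, choose one segment deterministically so that
# further reboots keep using the same disk until the admin intervenes.
pick_segment() {  # args: events_a events_b -> prints "A" or "B"
    if [ "$1" -gt "$2" ]; then
        echo "A"
    elif [ "$2" -gt "$1" ]; then
        echo "B"
    else
        # Tie: fall back to a fixed order so the choice stays stable
        # across reboots.
        echo "A"
    fi
}

pick_segment 153 152   # the segment with more recorded activity wins
pick_segment 150 150   # tie: stable fallback to the first member
```

The important property is only that the rule is deterministic; which segment actually wins matters less than never silently flip-flopping between them.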
** Changed in: mdadm (Ubuntu Lucid)
Assignee: (unassigned) => Dustin Kirkland (kirkland)
** Changed in: mdadm (Ubuntu Lucid)
Milestone: ubuntu-10.04 => None
On 04/15/2010 03:55 AM, ceg wrote:
Upon disappearance, a real failure, mdadm --fail or running an array
degraded: mdadm -E shows *missing* disks marked as removed. (What
you probably referred to all the time.) Even though nobody actually
issued mdadm --removed on them. (What I referred to.)
I see that we were stumbling over confusing wording in mdadm.
** Summary changed:
- booting out of sync RAID1 array comes up as already in sync (data-corruption)
+ array with conflicting changes is assembled with data corruption/silent loss
** Also affects: mdadm
Importance: Undecided
Status: New
--
Though I can sure understand it would be easier if we could just dismiss this
to be taken care of by users, data loss/corruption will always come back hard
on ubuntu/mdadm.
With Ubuntu systems in particular, we cannot assume there will always be
an admin available. And if there is an admin,
On 4/14/2010 11:58 AM, ceg wrote:
Though I can sure understand it would be easier if we could just
dismiss this to be taken care of by users, data loss/corruption will
always come back hard on ubuntu/mdadm.
Not necessarily. Data loss because of automatic hardware detection and
activation is
Ok... how does that alter the fact that we should not be automatically
adding devices to arrays that have been explicitly removed?
Not at all, we agree that explicitly --remove(ing) a device is a good
way to tell mdadm --incremental (its hotplug control mechanism) not to
re-add automatically.
Always auto-removing as a means to drop auto-re-add features simply
isn't an answer for conflict detection.
--
Currently mdadm does not seem to distinguish the manual --removed status
from the status a drive that is missing gets when the array is run degraded.
Especially with write-intent bitmaps regularly being used for faster
syncing in hotplug setups, and mdadm only comparing whether the event
count is in range of
On 4/14/2010 3:18 PM, ceg wrote:
If I read your proposal correctly, running an array degraded would
always also remove the missing disk.
That is exactly what happens. When you give the go ahead to degrade the
array, you fail and remove the missing disk.
This would imply to * break all the