[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2013-07-06 Thread Adolfo Jayme Barrientos
** No longer affects: mdadm (Ubuntu Lucid) -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/557429 Title: array with conflicting changes is assembled with data corruption/silent loss To manage

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-11-15 Thread ceg
Sharing a hotplug RAID array to sync two machines is a very nice use case, iMac; thanks for sharing your experience. So far I have only segmented an array intentionally, prior to performing updates. A place where information and experience with the topic is shared and your workarounds would fit in

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-11-15 Thread ceg
should be specified and added *as a new feature*. Since the current documentation and implementation do not define any behavior for this diverged RAID1 scenario You could build upon this thread: http://comments.gmane.org/gmane.linux.raid/27822 (but leave out the parts that were caused by naming

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-11-08 Thread iMac
I use RAID1 everywhere, and I have seen both loose SATA cables and BIOSes that are not set with enough delay for drives to spin up both lead to degraded RAID1 scenarios, so I am worried about the overall impact of this bug. My current use case is not one of these, but might be one used by anyone

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-11-08 Thread Clint Byrum
Hi iMac. Thanks for sharing your use case. I think this is a race condition that has only come to light recently because the startup and volume management has basically caused the number of things happening to remain consistent and small enough where the event-count gets incremented equally on

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-10-22 Thread ceg
I don't think there is anything practical that could be changed in md or mdadm to make it possible to catch this behaviour and refuse to assemble the array... The original topic of the linux-raid discussion http://comments.gmane.org/gmane.linux.raid/27822 suggested the idea to detect

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-10-20 Thread Clint Byrum
So, it's been a while since this issue resurfaced, but I feel it needs to be put to rest. Are we really sure we should fix this? http://marc.info/?l=linux-raid&m=127068416016382&w=2 I don't think there is anything practical that could be changed in md or mdadm to make it possible to catch this

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-28 Thread Steve Langasek
Documented at https://wiki.ubuntu.com/LucidLynx/ReleaseNotes#Use%20of%20degraded%20RAID%201%20array%20may%20cause%20data%20loss%20in%20exceptional%20cases: If each member of a RAID 1 array is separately brought up in degraded mode across subsequent runs of the array with no reassembly in between,

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-26 Thread ceg
If one port flakes on one boot, the other port flakes on the next, and both ports are available on the third, wouldn't that trigger this same bogus reassembly? That depends. On the linux-raid list it was said it would only happen if the event counts on both segments are equal +/-1. That would be
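
To illustrate the event-count condition mentioned in this message, here is a minimal sketch, not mdadm's actual code. It assumes each member carries a single integer event counter and that a check of the form "counts within +/-1" is what decides whether two members look in sync. The names Member, run_degraded and counts_look_in_sync are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical stand-in for one RAID1 member and its superblock state."""
    name: str
    events: int  # assumed: a single integer event counter per member


def run_degraded(member: Member, updates: int) -> None:
    """Simulate running the array degraded on this member alone.

    Simplification: every metadata update bumps the counter by one.
    """
    member.events += updates


def counts_look_in_sync(a: Member, b: Member, tolerance: int = 1) -> bool:
    """Assumed check: members are treated as in sync if counts differ by <= tolerance."""
    return abs(a.events - b.events) <= tolerance


# Both halves start out identical.
sda = Member("sda1", events=100)
sdb = Member("sdb1", events=100)

# Each half runs degraded on its own and receives different writes,
# but happens to see the same number of metadata updates.
run_degraded(sda, updates=3)
run_degraded(sdb, updates=3)

# The counters match, so a count-only comparison cannot see the divergence.
diverged_but_accepted = counts_look_in_sync(sda, sdb)
```

Under these assumptions the two halves hold conflicting data yet pass the comparison, which is the situation the bug title describes. If one half had seen noticeably more updates, the same check would flag it as stale instead.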

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-25 Thread ceg
** Description changed: Re-attaching parts of an array that have been running degraded separately and contain the same amount and conflicting changes, results in the assembly of a corrupt array. Using the latest beta-2 server ISO and following

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-25 Thread Steve Langasek
Dustin, On Tue, Apr 20, 2010 at 03:33:15PM -, Dustin Kirkland wrote: I agree with Philip's assessment. While this is very easy to reproduce in a VM (by just removing/adding backing disk files), in practice and on real hardware, I think this is definitely less likely. When a real

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-25 Thread Phillip Susi
On Sat, 2010-04-24 at 22:20 +, Steve Langasek wrote: Have I misunderstood the nature of this bug, or couldn't it be triggered by a flaky SATA cable causing intermittent connections to the drives? If one port flakes on one boot, the other port flakes on the next, and both ports are

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-25 Thread ceg
** Description changed: - Re-attaching parts of an array that have been running degraded - separately and contain the same amount and conflicting - changes or use a write intent bitmap, results in the assembly of a corrupt array. + Re-attaching parts of an array that have been running degraded

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-24 Thread Andrea Grandi
Hi all, comparing these two changelogs: Ubuntu 10.04 beta 2: http://www.ubuntu.com/testing/lucid/beta2 Ubuntu 10.04 RC: http://www.ubuntu.com/getubuntu/releasenotes/1004overview You have removed this bug from known issues: Activating a RAID 1 array in degraded mode is reported to lead to RAID

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-23 Thread ceg
I'd suggest to consider the following option about whether to assemble segments known to contain conflicting changes or not: AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS That's because it DOESN'T break hot-plugging. I have explained why. You have the right to think that, obviously we

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-23 Thread Phillip Susi
On 4/23/2010 6:52 AM, ceg wrote: I'd suggest to consider the following option about whether to assemble segments known to contain conflicting changes or not: AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS As I have said, if you want another component to automatically notice when one

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-23 Thread ceg
Even if you intentionally caused the divergence you don't want both disks to show up as the same volume when plugged in. Right, they'd need to show up under an additionally enumerated (or mangled) version name, if another segment (version) of the same array is already running. For hot-plug

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-23 Thread Phillip Susi
I suppose that the rename could be only temporary while both disks are connected, if so configured. After some further testing, it seems that the bug in mdadm is a bit more general. In --incremental mode it goes ahead and adds removed disks to the array, so even if you explicitly --fail and

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-23 Thread ceg
In --incremental mode it goes ahead and adds removed disks to the array Yes it would be nice if the states would get sorted out a little better. Running an array degraded during boot would only have to mark missing disks as failed for example, just as if they had failed while the array was

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-22 Thread Phillip Susi
On 4/22/2010 5:08 AM, ceg wrote: Phillip, before suggesting something I try to think through the issue, and the same I try with feedback. But after several attempts to explain that changing metadata and removing the failed status (of already running parts) in the superblocks of the

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-22 Thread ceg
Phillip, before suggesting something I try to think through the issue, and the same I try with feedback. But after several attempts to explain that changing metadata and removing the failed status (of already running parts) in the superblocks of the conflicting parts that are plugged-in (but not

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-22 Thread ceg
whether some other component automatically invokes mdadm to move the second disk to a brand new array, or the admin has to by hand, I don't really care. You are probably not aware enough that all udev/hotplug magic for raid is within mdadm --incremental. I.e. in the future it will even set up

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-21 Thread Phillip Susi
I'm going to boil this down very simply to try and bring an end to this. If you wish to automatically split the second disk into a new array with a new uuid, you must update the metadata on that disk to indicate you have done so, and if you end up connecting that second disk in the future without

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Steve Langasek
As I do think we will want to fix this in SRU once a fix is available, un-wontfixing the lucid task. We definitely *don't* want to try to change this now before release, but we should fix it in Lucid. Jamie, was this regression first introduced in 9.10, or did it exist in previous releases as

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Steve Langasek
Here's candidate text for the release notes, taken from the beta2 tech overview: Activating a RAID 1 array in degraded mode may lead to RAID disks being reported as in sync when they are not, resulting in data loss. Since RAID 1 arrays will automatically be brought up in degraded mode when a

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Phillip Susi
I think that warning is a bit misleading/extreme. The damage only occurs if you bring up one disk degraded, *and* then the other disk degraded. In practice, this should never happen since usually someone would notice the degraded event and take action to restore the missing disk. The release

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Dustin Kirkland
I agree with Phillip's assessment. While this is very easy to reproduce in a VM (by just removing/adding backing disk files), in practice and on real hardware, I think this is definitely less likely. When a real hardware disk fails, it should be removed from the system, and not come back until

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Jamie Strandboge
I also agree with Phillip's assessment. When it hits, it is devastating, but it takes a very specific series of events to hit, and asking people to not upgrade as a result is too extreme. Phillip, you mentioned to me that 9.10 was also affected; what about earlier releases?

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Dustin Kirkland
Jamie- As for earlier releases, I haven't tested this, but having written the original logic in mdadm's failure hooks in the initramfs, I can tell you that the code handling is present in: * 8.04 (via a point release/SRU) * 8.10 * 9.04 * 9.10 * 10.04 :-Dustin

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread Phillip Susi
I have not tested earlier than Karmic.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
Dustin, I don't think this has anything to do with the failure hooks in this case. :) (Here it's mdadm that does not pick up on the conflict.)

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
** Description changed: - Using the latest beta-2 server ISO and following - http://testcases.qa.ubuntu.com/Install/ServerRAID1 + Re-attaching parts of an array that have been running degraded + separately and contain the same amount and conflicting + changes, results in the assembly of a corrupt

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
Phillip, first please explain where/why you think flip-flopping would occur continuously, in such a way that makes it not enough to never assemble a corrupt array and notify someone to take care to reconcile the conflicting changes if desired. Because this seems to be the reason you want to break

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
If I plug in one disk and make some changes, then unplug it, plug in the other disk, and make some changes to it, What would be your use-case? I don't understand this question. The use case is described in the text you replied to. Please explain why someone would do that, a raid

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
[auto-removing] prevents you from continuing to flip flop which disk you are using after they have been forked, and thus making things worse. And, this is a program thinking it knows better than the user, mdadm --incremental should not do that. If you continue to do that after you have been

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-20 Thread ceg
the minimum action required to fix the bug is to simply reject the second disk, updating its metadata in the process. No, [updating metadata] really makes things worse! It prevents the user/admin from managing arrays (parts in this case) by simply plugging disks. No it does not.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread ceg
Once the situation is detected, it needs to be noted Right, this is important especially in cases where segmentation has happened unintentionally. That is why I wanted mdadm to fail on conflicting changes without --force, not auto-sync and emit an event (email, beep, notification, whatever
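
The behaviour asked for in this message (refuse to assemble on conflicting changes unless forced, do not auto-sync, and notify someone) can be sketched as a small decision function. This is only an illustration of the proposal, not mdadm code. How a conflict would actually be detected is left open here and is passed in as a plain boolean; the names Action and decide are hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for an assembly attempt."""
    ASSEMBLE = "assemble"                    # no conflict: assemble as usual
    REFUSE_AND_NOTIFY = "refuse_and_notify"  # conflict: do not sync, emit an event
    ASSEMBLE_FORCED = "assemble_forced"      # conflict, but the admin asked for it


def decide(conflict_detected: bool, force: bool = False) -> Action:
    """Return what to do with a member that is about to be added to an array.

    conflict_detected is assumed to come from some separate detection step
    that is not modelled in this sketch.
    """
    if not conflict_detected:
        return Action.ASSEMBLE
    if force:
        return Action.ASSEMBLE_FORCED
    # Default for conflicting segments: never auto-sync, leave both sides
    # untouched and let a notification (email, beep, ...) reach the admin.
    return Action.REFUSE_AND_NOTIFY
```

The point of the default branch is that nothing is overwritten without an explicit decision, so an unintentionally segmented array is flagged rather than silently merged.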

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread Dustin Kirkland
Solving this bug will require a non-trivial overhaul of mdadm's failure hooks in the initramfs, and potentially new code in mdadm itself. In my opinion, this is not something that can be solved in Lucid by release. Also in my opinion, this is not a release critical issue, but rather should be

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread ceg
Additional thoughts why updating metadata looks more limiting than beneficial to me: Unintentional (intermittent) failures of disks won't cause conflicting changes but auto re-sync events to appear. Segmenting an array into parts with conflicting changes requires repeated boots with separate

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread Dustin Kirkland
As a followup, if this were fixed cleanly, and in a backportable manner, this could be a reasonable candidate for an SRU.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread Jamie Strandboge
Considering we are a little over a week away from release, Dustin's comment sounds reasonable. This bug existed in 9.10, and we should get it fixed, but rushing a fix before release could easily affect more RAID users than this bug would. Hopefully the solution will be contained enough to make it

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread Phillip Susi
On 4/19/2010 1:10 PM, ceg wrote: (Between reboots I haven't seen too random/changing device reordering anyway. Mostly the enumeration seems to stay the same if nothing is rewired. I'd consider the hot-plugging order much more arbitrary, and even less worthy of committing to the meta-data.) You

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread ceg
All would be clearer if * mdadm -E would report missing instead of removed (which sounds like it really got mdadm --removed) There already exists a faulty state. It might be appropriate to use that This and the detection process sounds reasonable to me. I am not sure how much sense

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread ceg
A name for this might be safe segmentation. It prevents data-loss that could occur by syncing unreliable disks.

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-19 Thread Phillip Susi
Updating the metadata is needed to prevent further flip-flopping. Once the situation is detected, it needs to be noted so that further reboots will not decide to use the other disk. Pick one, and stick with it until the admin sorts things out.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-16 Thread Thierry Carrez
** Changed in: mdadm (Ubuntu Lucid) Assignee: (unassigned) => Dustin Kirkland (kirkland) ** Changed in: mdadm (Ubuntu Lucid) Milestone: ubuntu-10.04 => None

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-16 Thread Phillip Susi
On 04/15/2010 03:55 AM, ceg wrote: Upon disappearance, a real failure, mdadm --fail or running an array degraded: mdadm -E shows *missing* disks marked as removed. (What you probably referred to all the time.) Even though nobody actually issued mdadm --removed on them. (What I referred to.)

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-15 Thread ceg
I see that we were stumbling about confusing wording in mdadm. Upon disappearance, a real failure, mdadm --fail or running an array degraded: mdadm -E shows *missing* disks marked as removed. (What you probably referred to all the time.) Even though nobody actually issued mdadm --removed on

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
** Summary changed: - booting out of sync RAID1 array comes up as already in sync (data-corruption) + array with conflicting changes is assembled with data corruption/silent loss

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
** Also affects: mdadm Importance: Undecided Status: New

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
Though I can sure understand it would be easier if we could just dismiss this to be taken care of by users, data-loss/corruption will always come back heavy on ubuntu/mdadm. With ubuntu systems in particular, we can not assume there will always be an admin available. And if there is an admin,

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread Phillip Susi
On 4/14/2010 11:58 AM, ceg wrote: Though I can sure understand it would be easier if we could just dismiss this to be taken care of by users, data-loss/corruption will always come back heavy on ubuntu/mdadm. Not necessarily. Data loss because of automatic hardware detection and activation is

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
Ok... how does that alter the fact that we should not be automatically adding devices to arrays that have been explicitly removed? Not at all, we agree that explicitly --remove(ing) a device is a good way to tell mdadm --incremental (its hotplug control mechanism) not to re-add automatically.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
Always auto-removing as a means to drop auto-re-add features simply isn't an answer for conflict detection.

[Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread ceg
Currently mdadm does not seem to distinguish manual --removed status from the status a drive that is missing gets when the array is run degraded. Especially with write intent bitmaps regularly being used for faster syncing in hotplug setups, and mdadm only comparing if eventcount is in range of

Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

2010-04-14 Thread Phillip Susi
On 4/14/2010 3:18 PM, ceg wrote: If I read your proposal correctly, running an array degraded would always also remove the missing disk. That is exactly what happens. When you give the go ahead to degrade the array, you fail and remove the missing disk. This would imply to * break all the