Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-01-30 07:18, Austin S. Hemmelgarn wrote:
> On 2017-01-28 04:17, Andrei Borzenkov wrote:
>> 27.01.2017 23:03, Austin S. Hemmelgarn wrote:
>>> On 2017-01-27 11:47, Hans Deragon wrote:
>>>> On 2017-01-24 14:48, Adam Borowski wrote:
>>>>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>>>>> If I remove 'ro' from the options, I cannot get the filesystem
>>>>>> mounted because of the following error:
>>>>>>
>>>>>>   BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>>>>>>
>>>>>> So I am stuck. I can only mount the filesystem as read-only, which
>>>>>> prevents me from adding a disk.
>>>>>
>>>>> A known problem: you get only one shot at fixing the filesystem, but
>>>>> that's not because of some damage; it's because the check for whether
>>>>> the fs is in good enough shape to mount is oversimplistic.
>>>>>
>>>>> Here's a patch; if you apply it and recompile, you'll be able to
>>>>> mount degraded rw.
>>>>>
>>>>> Note that it removes a safety harness: here, the harness got tangled
>>>>> up and keeps you from recovering when it shouldn't, but it _has_
>>>>> valid uses.
>>>>>
>>>>> Meow!
>>>>
>>>> Greetings,
>>>>
>>>> Ok, that solution will solve my problem in the short run, i.e. getting
>>>> my raid1 up again.
>>>>
>>>> However, as a user, I am seeking an easy, no-maintenance raid
>>>> solution. I wish that if a drive fails, the btrfs filesystem would
>>>> still mount rw and leave the OS running, but warn the user of the
>>>> failing disk and easily allow the addition of a new drive to
>>>> reintroduce redundancy. Are there any plans within the btrfs
>>>> community to implement such a feature? A year from now, when the
>>>> other drive fails, will I hit this problem again, i.e. my OS failing
>>>> to start, booting into a terminal, and being unable to reintroduce a
>>>> new drive without recompiling the kernel?
>>> Before I make any suggestions regarding this, I should point out that
>>> mounting read-write when a device is missing is what caused this issue
>>> in the first place.
>>
>> How do you replace a device when the filesystem is mounted read-only?
>
> I'm saying that the use case you're asking to have supported is the
> reason stuff like this happens. If you're mounting read-write degraded
> and fixing the filesystem _immediately_, then it's not an issue; that's
> exactly what read-write degraded mounts are for. If you're mounting
> read-write degraded and then having the system run as if nothing was
> wrong, then I have zero sympathy, because that's _dangerous_, even with
> LVM, MD-RAID, or even hardware RAID (actually, especially with hardware
> RAID; LVM and MD are smart enough to automatically re-sync, most
> hardware RAID controllers aren't).
>
> That said, as I mentioned further down in my initial reply, you
> absolutely should be monitoring the filesystem and not letting things
> get this bad if at all possible. It's actually very rare that a storage
> device fails catastrophically with no warning (at least, on the scale
> that most end users are operating). At a minimum, even if you're using
> ext4 on top of LVM, you should be monitoring SMART attributes on the
> storage devices (or whatever the SCSI equivalent is if you use
> SCSI/SAS/FC devices). While not 100% reliable (they are getting better,
> though), they're generally a pretty good way to tell if a disk is
> likely to fail in the near future.

Greetings,

I totally understand your concerns. However, anybody using raid is a
grown-up, and tough for them if they do not understand this. But the
current scenario makes it difficult for me to put redundancy back into
service! How much time did I wait until I found the mailing list,
subscribed to it, posted my email and got an answer? Wouldn't it be
better if the user could actually add the disk at any time, ideally
ASAP?
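For what it's worth, the SMART-monitoring advice above can be made concrete with a small cron-able check like the sketch below. It assumes smartmontools is installed and the device names are examples; running the smartd daemon with a configured /etc/smartd.conf is the more robust, standard approach.

```shell
#!/bin/sh
# Sketch: warn if any listed drive fails its SMART overall-health check.
# Device names are examples; adjust for your system. Requires smartmontools.
for dev in /dev/sda /dev/sdb; do
    # 'smartctl -H' prints the overall-health self-assessment (PASSED/FAILED)
    if ! smartctl -H "$dev" | grep -q 'PASSED'; then
        echo "WARNING: $dev may be failing; check 'smartctl -a $dev'" >&2
    fi
done
```

Hooked into cron or a systemd timer, this gives the early warning that makes a planned `btrfs replace` possible before the drive dies outright.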
And to fix this, I have to learn how to patch and compile the kernel. I
have not done this since the beginning of the century. More delays, more
risk added to the system (what if I compile the kernel with the wrong
parameters?). Fortunately, my raid1 system is for my home and I do not
need that data available right now. The data is safe, but I have no
time to fiddle with this issue and put the raid1 back in service.
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
Greetings,

On 2017-02-02 10:06, Austin S. Hemmelgarn wrote:
> On 2017-02-02 09:25, Adam Borowski wrote:
>> On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
>>> This is a severe bug that makes a not all that uncommon (albeit bad)
>>> use case fail completely. The fix had no dependencies itself and
>>
>> I don't see what's bad in mounting a RAID degraded. Yeah, it provides
>> no redundancy, but that's no worse than using a single disk from the
>> start. And most people not running storage/server farms don't have a
>> stack of spare disks at hand, so getting a replacement might take a
>> while.
>
> Running degraded is bad. Period. If you don't have a disk on hand to
> replace the failed one (and if you care about redundancy, you should
> have at least one spare on hand), you should be converting to a single
> disk, not continuing to run in degraded mode until you get a new disk.
> The moment you start talking about running degraded long enough that
> you will be _booting_ the system with the array degraded, you need to
> be converting to a single disk. This is of course impractical for
> something like a hardware array or an LVM volume, but it's _trivial_
> with BTRFS, and it protects you from all kinds of bad situations that
> can't happen with a single disk but can completely destroy the
> filesystem if it's a degraded array. Running a single disk is not
> exactly the same as running a degraded array; it's actually marginally
> safer (even if you aren't using the dup profile for metadata) because
> there are fewer moving parts to go wrong. It's also exponentially more
> efficient.
>
>> Being able to continue to run when a disk fails is the whole point of
>> RAID -- despite what some folks think, RAIDs are not for backups but
>> for uptime. And if your uptime goes to hell because the moment a disk
>> fails you need to drop everything and replace the disk immediately,
>> why would you use RAID?
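For readers unfamiliar with it, the "convert to a single disk" route described above goes roughly as follows. This is a sketch only; the device name and mount point are examples, and on the affected kernels the degraded read-write mount in the first step is exactly the operation that may only succeed once.

```shell
# Mount the surviving device read-write in degraded mode:
mount -t btrfs -o degraded /dev/sdb /mnt

# Convert data chunks to the 'single' profile and metadata to 'dup',
# so the filesystem no longer requires a second device:
btrfs balance start -dconvert=single -mconvert=dup /mnt

# Remove the dead (missing) device's record from the filesystem:
btrfs device delete missing /mnt
```

Once a replacement disk arrives, the filesystem can be grown back to raid1 with `btrfs device add` followed by a balance with `-dconvert=raid1 -mconvert=raid1`.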
> Because just replacing a disk and rebuilding the array is almost
> always much cheaper in terms of time than rebuilding the system from a
> backup. IOW, even if you have to drop everything and replace the disk
> immediately, it's still less time consuming than restoring from a
> backup. It also has the advantage that you don't lose any data.

We disagree on letting people run degraded, which I support and you do
not. I respect your opinion. However, I have to ask: who decides these
rules? Obviously not me, since I am a simple btrfs home user. Since
Oracle is funding btrfs development, is that Oracle's official stand on
how to handle a failed disk? Who decides btrfs's roadmap? I have no
clue who is who on this mailing list and who influences the features of
btrfs. Oracle is obviously using raid systems internally. How do the
operators of those raid systems feel about this "do not let the system
run in degraded mode"?

As a home user, I do not want to keep a spare disk always available.
That makes the disk very expensive, when the raid system can easily run
for two years without a disk failure. I want to buy the new disk (asap,
of course) once one has died. By that time, the cost of a drive will
have fallen drastically. Yes, I can live with running my home system
(which has backups) in degraded rw mode for a day or two, until I
purchase and can install a new disk. Chances are low that both disks
will quit at around the same time.

Simply because I cannot run in degraded mode and cannot add a disk to
my current degraded raid1, despite having my replacement disk in my
hands, I must resort to switching to mdadm or zfs. Having a policy that
limits users' options on the assumption that they are too stupid to
understand the implications is wrong. That is ok for applications, but
not at the operating system level; there should be a way to force this.
A
--yes-i-know-what-i-am-doing-now-please-mount-rw-degraded-so-i-can-install-the-new-disk
parameter must be implemented.
Currently, it is like disallowing root to run mkfs over an existing
filesystem because people could erase data by mistake. Let people do
what they want and let them live with the consequences. hdparm has a
--yes-i-know-what-i-am-doing flag; btrfs needs one. Whoever decides
which btrfs features get added, please consider this one.

Best regards,
Hans Deragon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
raid1: cannot add disk to replace faulty because can only mount fs as read-only.
Greetings,

Warning: btrfs user here; no knowledge of the inner workings of btrfs.
If I am on the wrong mailing list, please redirect me and accept my
apologies.

At home, lacking disks and free SATA ports, I created a raid1 btrfs
filesystem by converting an existing single btrfs instance into a
degraded raid1, then added the other drive. The exact commands I used
have been lost. It worked well, until one of my drives died. Total
death; the OS does not detect it anymore. I bought another drive, but
alas, I cannot add it:

  # btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b
  ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/brtfs-raid1-b": Read-only file system

Here is the command I used to mount it:

  mount -t btrfs -o ro,degraded,recovery,nosuid,nodev,nofail,x-gvfs-show /dev/disk/by-uuid/975bdbb3-9a9c-4a72-ad67-6cda545fda5e /mnt/brtfs-raid1-b

If I remove 'ro' from the options, I cannot get the filesystem mounted
because of the following error:

  BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed

So I am stuck. I can only mount the filesystem as read-only, which
prevents me from adding a disk. It seems related to this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=60594

I am using Ubuntu 16.04 LTS with kernel 4.4.0-59-generic. Is there any
hope of adding a disk? Otherwise, can I recreate a raid1 with only one
disk and add another, but never suffer from the same problem again? I
did not lose any data, but I do have some serious downtime because of
this.

I wish that if a drive fails, the btrfs filesystem would still mount rw
and leave the OS running, but warn the user of the failing disk and
easily allow the addition of a new drive to reintroduce redundancy.
However, this scenario seems impossible with the current state of
affairs. Am I right?
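For reference, on a kernel that does permit the degraded read-write mount, the replacement sequence would look roughly like this. It is a sketch: /dev/sdc stands in for the surviving device, /dev/sdd for the new drive, and devid 2 for the dead disk, matching the ioctl attempt above.

```shell
# Mount the surviving member read-write in degraded mode:
mount -t btrfs -o degraded /dev/sdc /mnt/brtfs-raid1-b

# Replace the missing device (devid 2, assumed here) with the new drive;
# -B keeps the command in the foreground until the rebuild finishes:
btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b

# Verify that both devices are present and usage looks sane:
btrfs filesystem show /mnt/brtfs-raid1-b
btrfs filesystem df /mnt/brtfs-raid1-b
```

On the kernel discussed in this thread (4.4), the degraded rw mount in the first step is the part that fails after the first attempt, which is the crux of the whole thread.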
Best regards and thank you for your contribution to the open source
movement,
Hans Deragon
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-01-24 14:48, Adam Borowski wrote:
> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>> If I remove 'ro' from the options, I cannot get the filesystem
>> mounted because of the following error:
>>
>>   BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>>
>> So I am stuck. I can only mount the filesystem as read-only, which
>> prevents me from adding a disk.
>
> A known problem: you get only one shot at fixing the filesystem, but
> that's not because of some damage; it's because the check for whether
> the fs is in good enough shape to mount is oversimplistic.
>
> Here's a patch; if you apply it and recompile, you'll be able to mount
> degraded rw.
>
> Note that it removes a safety harness: here, the harness got tangled
> up and keeps you from recovering when it shouldn't, but it _has_ valid
> uses.
>
> Meow!

Greetings,

Ok, that solution will solve my problem in the short run, i.e. getting
my raid1 up again.

However, as a user, I am seeking an easy, no-maintenance raid solution.
I wish that if a drive fails, the btrfs filesystem would still mount rw
and leave the OS running, but warn the user of the failing disk and
easily allow the addition of a new drive to reintroduce redundancy. Are
there any plans within the btrfs community to implement such a feature?
A year from now, when the other drive fails, will I hit this problem
again, i.e. my OS failing to start, booting into a terminal, and being
unable to reintroduce a new drive without recompiling the kernel?

Best regards,
Hans Deragon
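For completeness: with the patch applied (or on a later kernel that allows repeated degraded rw mounts), reintroducing redundancy via device addition rather than replacement would go roughly as follows. A sketch only; device names and the mount point are examples.

```shell
# Mount the surviving device read-write in degraded mode:
mount -t btrfs -o degraded /dev/sdc /mnt

# Add the new drive to the filesystem:
btrfs device add /dev/sdd /mnt

# Drop the record of the dead disk; this re-replicates its chunks:
btrfs device delete missing /mnt

# Rebalance so any chunks written as 'single' while degraded
# are converted back to raid1:
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
```

The `btrfs replace` path shown earlier in the thread is usually faster, since it copies only the missing device's chunks instead of rebalancing.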