Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hello Joe, Monday, February 23, 2009, 7:23:39 PM, you wrote: MJ> Mario Goebbels wrote: >> One thing I'd like to see is an _easy_ option to fall back onto older >> uberblocks when the zpool went belly up for a silly reason. Something >> that doesn't involve esoteric parameters supplied to zdb. MJ> Between uberblock updates, there may be many write operations to MJ> a data file, each requiring a copy on write operation. Some of MJ> those operations may reuse blocks that were metadata blocks MJ> pointed to by the previous uberblock. MJ> In which case the old uberblock points to a metadata tree full of garbage. MJ> Jeff, you must have some idea on how to overcome this in your bugfix, would you care to share? As was suggested on the list before, ZFS could keep a list of freed blocks for the last N txgs and, as long as other blocks are still available, it would not allocate those freed within the last N transactions. -- Best regards, Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Mario Goebbels wrote: > One thing I'd like to see is an _easy_ option to fall back onto older > uberblocks when the zpool went belly up for a silly reason. Something > that doesn't involve esoteric parameters supplied to zdb. Between uberblock updates, there may be many write operations to a data file, each requiring a copy on write operation. Some of those operations may reuse blocks that were metadata blocks pointed to by the previous uberblock. In which case the old uberblock points to a metadata tree full of garbage. Jeff, you must have some idea on how to overcome this in your bugfix, would you care to share? --Joe ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 9:47 PM, Richard Elling wrote: > It has been my experience that USB sticks use FAT, which is an ancient > file system which contains few of the features you expect from modern > file systems. As such, it really doesn't do any write caching. Hence, it > seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, > nor many of the other, high performance file systems are used by default > for USB devices. Could it be that anyone not using FAT for USB devices > is straining against architectural limits? There are no architectural limits. USB sticks can be used with whatever you throw at them. On sticks I use to interchange data with Windows machines I have NTFS, on others different filesystems: ZFS, ext4, btrfs, often encrypted at the block level. USB sticks are generally very simple -- no discard commands and other fancy stuff, but overall they are block devices just like discs, arrays, SSDs... -- Tomasz Torcz xmpp: zdzich...@chrome.pl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hey guys, I'll let this die in a sec, but I just wanted to say that I've gone and read the on disk document again this morning, and to be honest Richard, without the description you just wrote, I really wouldn't have known that uberblocks are in a 128 entry circular queue that's 4x redundant. Please understand that I'm not asking for answers to these notes, this post is purely to illustrate to you ZFS guys that much as I appreciate having the ZFS docs available, they are very tough going for anybody who isn't a ZFS developer. I consider myself well above average in IT ability, and I've really spent quite a lot of time in the past year reading around ZFS, but even so I would definitely have come to the wrong conclusion regarding uberblocks. Richard's post I can understand really easily, but in the on disk format docs, that information is spread over 7 pages of really quite technical detail, and to be honest, for a user like myself raises as many questions as it answers: On page 6 I learn that labels are stored on each vdev, as well as each disk. So there will be a label on the pool, mirror (or raid group), and disk. I know the disk ones are at the start and end of the disk, and it sounds like the mirror vdev is in the same place, but where is the root vdev label? The example given doesn't mention its location at all. Then, on page 7 it sounds like the entire label is overwriten whenever on-disk data is updated - "any time on-disk data is overwritten, there is potential for error". To me, it sounds like it's not a 128 entry queue, but just a group of 4 labels, all of which are overwritten as data goes to disk. Then finally, on page 12 the uberblock is mentioned (although as an aside, the first time I read these docs I had no idea what the uberblock actually was). It does say that only one uberblock is active at a time, but with it being part of the label I'd just assume these were overwritten as a group.. And that's why I'll often throw ideas out - I can either rely on my own limited knowledge of ZFS to say if it will work, or I can take advantage of the excellent community we have here, and post the idea for all to see. It's a quick way for good ideas to be improved upon, and bad ideas consigned to the bin. I've done it before in my rather lengthly 'zfs availability' thread. My thoughts there were thrashed out nicely, with some quite superb additions (namely the concept of lop sided mirrors which I think are a great idea). Ross PS. I've also found why I thought you had to search for these blocks, it was after reading this thread where somebody used mdb to search a corrupt pool to try to recover data: http://opensolaris.org/jive/message.jspa?messageID=318009 On Fri, Feb 13, 2009 at 11:09 PM, Richard Elling wrote: > Tim wrote: >> >> >> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn >> mailto:bfrie...@simple.dallas.tx.us>> wrote: >> >>On Fri, 13 Feb 2009, Ross Smith wrote: >> >>However, I've just had another idea. Since the uberblocks are >>pretty >>vital in recovering a pool, and I believe it's a fair bit of >>work to >>search the disk to find them. Might it be a good idea to >>allow ZFS to >>store uberblock locations elsewhere for recovery purposes? >> >> >>Perhaps it is best to leave decisions on these issues to the ZFS >>designers who know how things work. >> >>Previous descriptions from people who do know how things work >>didn't make it sound very difficult to find the last 20 >>uberblocks. It sounded like they were at known points for any >>given pool. 
>> >>Those folks have surely tired of this discussion by now and are >>working on actual code rather than reading idle discussion between >>several people who don't know the details of how things work. >> >> >> >> People who "don't know how things work" often aren't tied down by the >> baggage of knowing how things work. Which leads to creative solutions those >> who are weighed down didn't think of. I don't think it hurts in the least >> to throw out some ideas. If they aren't valid, it's not hard to ignore them >> and move on. It surely isn't a waste of anyone's time to spend 5 minutes >> reading a response and weighing if the idea is valid or not. > > OTOH, anyone who followed this discussion the last few times, has looked > at the on-disk format documents, or reviewed the source code would know > that the uberblocks are kept in an 128-entry circular queue which is 4x > redundant with 2 copies each at the beginning and end of the vdev. > Other metadata, by default, is 2x redundant and spatially diverse. > > Clearly, the failure mode being hashed out here has resulted in the defeat > of those protections. The only real question is how fast Jeff can roll out > the > feature to allow reverting to previous uberblocks. The procedure for doing > this by hand has long been known, and was posted on this forum -- though > it is tedious. -- richard
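For anyone else trying to map the on-disk format document onto a live pool: zdb will show most of this without any byte-offset spelunking. A rough sketch -- the device path and pool name below are only placeholders for your own:

  # zdb -l /dev/rdsk/c0t0d0s0
    (dumps the four vdev labels -- two stored at the front of the device, two at the tail -- including the nvlist config each one carries)
  # zdb -u tank
    (prints the pool's currently active uberblock, i.e. the valid one with the highest txg)

That at least makes the "labels live at known, fixed locations" point visible on your own disks, even if the recovery step itself still needs the zdb/mdb gymnastics discussed elsewhere in this thread.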
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Frank Cusack wrote: i'm sorry to berate you, as you do make very valuable contributions to the discussion here, but i take offense at your attempts to limit discussion simply because you know everything there is to know about the subject. The point is that those of us in the chattering class (i.e. people like you and me) clearly know very little about the subject, and continuing to chatter among ourselves is soon no longer rewarding. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hi Bob, On Fri, 13 Feb 2009 19:58:51 -0600 (CST) Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Tim wrote: > > > I don't think it hurts in the least to throw out some ideas. If > > they aren't valid, it's not hard to ignore them and move on. It > > surely isn't a waste of anyone's time to spend 5 minutes reading a > > response and weighing if the idea is valid or not. > > Today I sat down at 9:00 AM to read the new mail for the day and did > not catch up until five hours later. Quite a lot of the reading was > this (now) useless discussion thread. It is now useless since after > five hours of reading, there were no ideas expressed that had not > been expressed before. I've found this thread to be like watching a car accident, and also really frustrating due to the inability to use search engines on the part of many posters. > With this level of overhead, I am surprise that there is any > remaining development motion on ZFS at all. Good thing the ZFS developers have mail filters :-) cheers, James -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 7:58:51 PM -0600 Bob Friesenhahn wrote: With this level of overhead, I am surprise that there is any remaining development motion on ZFS at all. come on now. with all due respect, you are attempting to stifle relevant discussion and that is, well, bordering on ridiculous. i sure have learned a lot from this thread. now of course that is meaningless because i don't and almost certainly never will contribute to zfs, but i assume there are others who have learned from this thread. that's definitely a good thing. this thread also appears to be the impetus to change priorities on zfs development. Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before. lastly, WOW! if this thread is worthless to you, learn to use the delete button. especially if you read that slowly. i know i certainly couldn't keep up with all my incoming mail if i read everything. i'm sorry to berate you, as you do make very valuable contributions to the discussion here, but i take offense at your attempts to limit discussion simply because you know everything there is to know about the subject. great, now i am guilty of being "overhead". -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Tim wrote: I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not. Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before. With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Tim wrote: On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn mailto:bfrie...@simple.dallas.tx.us>> wrote: On Fri, 13 Feb 2009, Ross Smith wrote: However, I've just had another idea. Since the uberblocks are pretty vital in recovering a pool, and I believe it's a fair bit of work to search the disk to find them. Might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? Perhaps it is best to leave decisions on these issues to the ZFS designers who know how things work. Previous descriptions from people who do know how things work didn't make it sound very difficult to find the last 20 uberblocks. It sounded like they were at known points for any given pool. Those folks have surely tired of this discussion by now and are working on actual code rather than reading idle discussion between several people who don't know the details of how things work. People who "don't know how things work" often aren't tied down by the baggage of knowing how things work. Which leads to creative solutions those who are weighed down didn't think of. I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not. OTOH, anyone who followed this discussion the last few times, has looked at the on-disk format documents, or reviewed the source code would know that the uberblocks are kept in an 128-entry circular queue which is 4x redundant with 2 copies each at the beginning and end of the vdev. Other metadata, by default, is 2x redundant and spatially diverse. Clearly, the failure mode being hashed out here has resulted in the defeat of those protections. The only real question is how fast Jeff can roll out the feature to allow reverting to previous uberblocks. The procedure for doing this by hand has long been known, and was posted on this forum -- though it is tedious. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Fri, 13 Feb 2009, Ross Smith wrote: > > However, I've just had another idea. Since the uberblocks are pretty >> vital in recovering a pool, and I believe it's a fair bit of work to >> search the disk to find them. Might it be a good idea to allow ZFS to >> store uberblock locations elsewhere for recovery purposes? >> > > Perhaps it is best to leave decisions on these issues to the ZFS designers > who know how things work. > > Previous descriptions from people who do know how things work didn't make > it sound very difficult to find the last 20 uberblocks. It sounded like > they were at known points for any given pool. > > Those folks have surely tired of this discussion by now and are working on > actual code rather than reading idle discussion between several people who > don't know the details of how things work. > People who "don't know how things work" often aren't tied down by the baggage of knowing how things work. Which leads to creative solutions those who are weighed down didn't think of. I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: However, I've just had another idea. Since the uberblocks are pretty vital in recovering a pool, and I believe it's a fair bit of work to search the disk to find them. Might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? Perhaps it is best to leave decisions on these issues to the ZFS designers who know how things work. Previous descriptions from people who do know how things work didn't make it sound very difficult to find the last 20 uberblocks. It sounded like they were at known points for any given pool. Those folks have surely tired of this discussion by now and are working on actual code rather than reading idle discussion between several people who don't know the details of how things work. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Richard Elling wrote: Greg Palmer wrote: Miles Nordin wrote: gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? -- richard The default disabling of caching with Windows I mentioned is the same for either FAT or NTFS file systems. My personal guess would be that it's purely an effort to prevent software errors in the interface between the chair and keyboard. :-) I think a lot of users got trained in how to use a floppy disc and once they were trained, when they encountered the USB stick, they continued to treat it as an instance of the floppy class. This rubbed off on those around them. I can't tell you how many users have given me a blank stare and told me "But the light was out" when I saw them yank a USB stick out and mentioned it was a bad idea. Regards, Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
You don't, but that's why I was wondering about time limits. You have to have a cut off somewhere, but if you're checking the last few minutes of uberblocks that really should cope with a lot. It seems like a simple enough thing to implement, and if a pool still gets corrupted with these checks in place, you can absolutely, positively blame it on the hardware. :D However, I've just had another idea. Since the uberblocks are pretty vital in recovering a pool, and I believe it's a fair bit of work to search the disk to find them. Might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? This could be as simple as a USB stick plugged into the server, a separate drive, or a network server. I guess even the ZIL device would work if it's separate hardware. But knowing the locations of the uberblocks would save yet more time should recovery be needed. On Fri, Feb 13, 2009 at 8:59 PM, Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Ross Smith wrote: > >> Thinking about this a bit more, you've given me an idea: Would it be >> worth ZFS occasionally reading previous uberblocks from the pool, just >> to check they are there and working ok? > > That sounds like a good idea. However, how do you know for sure that the > data returned is not returned from a volatile cache? If the hardware is > ignoring cache flush requests, then any data returned may be from a volatile > cache. > > Bob > == > Bob Friesenhahn > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Richard Elling wrote: Greg Palmer wrote: Miles Nordin wrote: gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? I'd follow that up by saying that those of us who do use something other that FAT with USB devices have a reasonable understanding of the limitations of those devices. Using ZFS is non-trivial from a typical user's perspective. The device has to be identified and the pool created. When a USB device is connected, the pool has to be manually imported before it can be used. Import/export could be fully integrated with gnome. Once that is in place, using a ZFS formatted USB stick should be just as "safe" as a FAT formatted one. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
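For what it's worth, the manual dance for a ZFS-formatted stick that Ian describes is short, it is just not automatic yet. Something like the following, with the device and pool names invented for the example:

  # zpool create stickpool c5t0d0p0
    (one-time: build a pool on the stick)
  # zpool export stickpool
    (before pulling the stick: flushes everything and marks the pool cleanly exported)
  # zpool import stickpool
    (after plugging it back in, on this box or another one)

Wrapping those export/import calls behind the desktop's mount and eject actions is essentially the gnome integration being asked for.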
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 02:00:28PM -0600, Nicolas Williams wrote: > Ordering matters for atomic operations, and filesystems are full of > those. Also, note that ignoring barriers is effectively as bad as dropping writes if there's any chance that some writes will never hit the disk because of, say, power failures. Imagine 100 txgs, but some writes from the first txg never hitting the disk because the drive keeps them in the cache without flushing them for too long, then you pull out the disk, or power fails -- in that case not even fallback to older txgs will help you, there'd be nothing that ZFS could do to help you. Of course, presumably even with most lousy drives you'd still have to be quite unlucky to lose writes written more than N txgs ago, for some value of N. But the point stands; what you lose will be a matter of chance (and it could well be whole datasets) given the kinds of devices we've been discussing. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? That sounds like a good idea. However, how do you know for sure that the data returned is not returned from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. On systems with a lot of RAM, 100% write is a pretty common situation since reads are often against data which are already cached in RAM. This is common when doing bulk data copies from one device to another (e.g. a backup from an "internal" pool to a USB-based pool) since the necessary filesystem information for the destination filesystem can be cached in memory for quick access rather than going to disk. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Ross Smith wrote: >> >> You have to consider that even with improperly working hardware, ZFS >> has been checksumming data, so if that hardware has been working for >> any length of time, you *know* that the data on it is good. > > You only know this if the data has previously been read. > > Assume that the device temporarily stops pysically writing, but otherwise > responds normally to ZFS. Then the device starts writing again (including a > recent uberblock), but with a large gap in the writes. Then the system > loses power, or crashes. What happens then? Hey Bob, Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? I wonder if you could do this after a few uberblocks have been written. It would seem to be a good way of catching devices that aren't writing correctly early on, as well as a way of guaranteeing that previous uberblocks are available to roll back to should a write go wrong. I wonder what the upper limits for this kind of write failure is going to be. I've seen 30 second delays mentioned in this thread. How often are uberblocks written? Is there any guarantee that we'll always have more than 30 seconds worth of uberblocks on a drive? Should ZFS be set so that it keeps either a given number of uberblocks, or 5 minutes worth of uberblocks, whichever is the larger? Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
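On the "how often are uberblocks written" question: one new uberblock goes out with every transaction group sync, so you can watch the cadence directly rather than guess. A rough DTrace sketch, assuming spa_sync() on your build still takes the pool and the txg number as its two arguments:

  # dtrace -qn 'fbt:zfs:spa_sync:entry { printf("%Y  txg %d\n", walltimestamp, args[1]); }'
    (prints a timestamped line per txg sync; the gap between lines is how much history a single uberblock slot represents)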
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Greg Palmer wrote: Miles Nordin wrote: gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Ross Smith wrote: >> >> You have to consider that even with improperly working hardware, ZFS >> has been checksumming data, so if that hardware has been working for >> any length of time, you *know* that the data on it is good. > > You only know this if the data has previously been read. > > Assume that the device temporarily stops pysically writing, but otherwise > responds normally to ZFS. Then the device starts writing again (including a > recent uberblock), but with a large gap in the writes. Then the system > loses power, or crashes. What happens then? Well in that case you're screwed, but if ZFS is known to handle even corrupted pools automatically, when that happens the immediate response on the forums is going to be "something really bad has happened to your hardware", followed by troubleshooting to find out what. Instead of the response now, where we all know there's every chance the data is ok, and just can't be gotten to without zdb. Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. However, even in that situation, if we assume that it happened and that these recovery tools are available, ZFS will either report that your pool is seriously corrupted, indicating a major hardware problem (and ZFS can now state this with some confidence), or ZFS will be able to open a previous uberblock, mount your pool and begin a scrub, at which point all your missing writes will be found too and reported. And then you can go back to your snapshots. :-D > > Bob > == > Bob Friesenhahn > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Ross wrote: >> >> Something like that will have people praising ZFS' ability to >> safeguard their data, and the way it recovers even after system >> crashes or when hardware has gone wrong. You could even have a >> "common causes of this are..." message, or a link to an online help >> article if you wanted people to be really impressed. > > I see a career in politics for you. Barring an operating system > implementation bug, the type of problem you are talking about is due to > improperly working hardware. Irreversibly reverting to a previous > checkpoint may or may not obtain the correct data. Perhaps it will > produce a bunch of checksum errors. Actually that's a lot like how FMA replies when it sees a problem, telling the person what happened and pointing them to a web page which can be updated with the newest information on the problem. That's a good spot for "This pool was not unmounted cleanly due to a hardware fault and data has been lost. The "<date>" line contains the date which can be recovered to. Use the command # zfs reframbulocate -t <date> to revert to it." --dave -- David Collier-Brown| Always do right. This will gratify Sun Microsystems, Toronto | some people and astonish the rest dav...@sun.com | -- Mark Twain cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191# ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 7:41 PM, Bob Friesenhahn wrote: > On Fri, 13 Feb 2009, Ross wrote: >> >> Something like that will have people praising ZFS' ability to safeguard >> their data, and the way it recovers even after system crashes or when >> hardware has gone wrong. You could even have a "common causes of this >> are..." message, or a link to an online help article if you wanted people to >> be really impressed. > > I see a career in politics for you. Barring an operating system > implementation bug, the type of problem you are talking about is due to > improperly working hardware. Irreversibly reverting to a previous > checkpoint may or may not obtain the correct data. Perhaps it will produce > a bunch of checksum errors. Yes, the root cause is improperly working hardware (or an OS bug like 6424510), but with ZFS being a copy on write system, when errors occur with a recent write, for the vast majority of the pools out there you still have huge amounts of data that is still perfectly valid and should be accessible. Unless I'm misunderstanding something, reverting to a previous checkpoint gets you back to a state where ZFS knows it's good (or at least where ZFS can verify whether it's good or not). You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. Yes, if you have databases or files there that were mid-write, they will almost certainly be corrupted. But at least your filesystem is back, and it's in as good a state as it's going to be given that in order for your pool to be in this position, your hardware went wrong mid-write. And as an added bonus, if you're using ZFS snapshots, now your pool is accessible, you have a bunch of backups available so you can probably roll corrupted files back to working versions. For me, that is about as good as you can get in terms of handling a sudden hardware failure. Everything that is known to be saved to disk is there, you can verify (with absolute certainty) whether data is ok or not, and you have backup copies of damaged files. In the old days you'd need to be reverting to tape backups for both of these, with potentially hours of downtime before you even know where you are. Achieving that in a few seconds (or minutes) is a massive step forwards. > There are already people praising ZFS' ability to safeguard their data, and > the way it recovers even after system crashes or when hardware has gone > wrong. Yes there are, but the majority of these are praising the ability of ZFS checksums to detect bad data, and to repair it when you have redundancy in your pool. I've not seen that many cases of people praising ZFS' recovery ability - uberblock problems seem to have a nasty habit of leaving you with tons of good, checksummed data on a pool that you can't get to, and while many hardware problems are dealt with, others can hang your entire pool. > > Bob > == > Bob Friesenhahn > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
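As a concrete footnote to the "roll corrupted files back" point: once the pool is importable again, both the per-file and the whole-dataset recoveries are one-liners, assuming snapshots were being taken (the dataset and snapshot names here are made up):

  # cp /tank/data/.zfs/snapshot/hourly-20090213/report.doc /tank/data/report.doc
    (fish one damaged file back out of an earlier snapshot)
  # zfs rollback tank/data@hourly-20090213
    (or discard everything written after that snapshot and rewind the whole dataset)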
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote: > On February 13, 2009 1:10:55 PM -0500 Miles Nordin wrote: > >>"fc" == Frank Cusack writes: > > > >fc> If you're misordering writes > >fc> isn't that a completely different problem? > > > >no. ignoring the flush cache command causes writes to be misordered. > > oh. can you supply a reference or if you have the time, some more > explanation? (or can someone else confirm this.) Ordering matters for atomic operations, and filesystems are full of those. Now, if ordering is broken but the writes all eventually hit the disk then no one will notice. But if power failures and/or partitions (cables get pulled, network partitions occur affecting an iSCSI connection, ...) then bad things happen. For ZFS the easiest way to ameliorate this is the txg fallback fix that Jeff Bonwick has said is now a priority. And if ZFS guarantees no block re-use until N txgs pass after a block is freed, then the fallback can be of up to N txgs, which gives you a decent chance that you'll recover your pool in the face of buggy devices, but for each discarded txg you lose that transaction's writes, you lose data incrementally. (The larger N is the better your chance that the oldest of the last N txg's writes will all hit the disk in spite of the disk's lousy cache behaviors.) The next question is how to do the fallback, UI-wise. Should it ever be automatic? A pool option for that would be nice (I'd use it on all-USB pools). If/when not automatic, how should the user/admin be informed of the failure to open the pool and the option to fallback on an older txg (with data loss)? (For non-removable pools imported at boot time the answer is that the service will fail, causing sulogin to be invoked so you can fix the problem on console. For removable pools there should be a GUI.) Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. There are already people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Superb news, thanks Jeff. Having that will really raise ZFS up a notch, and align it much better with people's expectations. I assume it'll work via zpool import, and let the user know what's gone wrong? If you think back to this case, imagine how different the user's response would have been if instead of being unable to mount the pool, ZFS had turned around and said: "This pool was not unmounted cleanly, and data has been lost. Do you want to restore your pool to the last viable state: (timestamp goes here)?" Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article if you wanted people to be really impressed. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
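For anyone reading this thread after the fact: the recovery Jeff described did end up surfacing through zpool import, roughly along these lines (the exact options depend on the build you are running):

  # zpool import tank
    (a plain import fails when the newest uberblock points at damaged metadata)
  # zpool import -F tank
    (recovery mode: discard the last few txgs and roll back to the most recent one that yields a consistent pool; add -n first to check whether that would succeed without actually rewinding)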
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "fc" == Frank Cusack writes: fc> why would dropping a flush cache imply dropping every write fc> after the flush cache? it wouldn't and probably never does. It was an imaginary scenario invented to argue with you and to agree with the guy in the USB bug who said ``dropping a cache flush command is as bad as dropping a write.'' fc> oh. can you supply a reference or if you have the time, some fc> more explanation? (or can someone else confirm this.) I posted something long a few days ago that I need to revisit. The problem is, I don't actually understand how the disk commands work, so I was talking out my ass. Although I kept saying, ``I'm not sure it actually works this way,'' my saying so doesn't help anyone who spends the time to read it and then gets a bunch of mistaken garbage stuck in his head, which people who actually recognize as garbage are too busy to correct. It'd be better for everyone if I didn't do that. On the other hand, I think there's some worth to dreaming up several possibilities of what I fantasize the various commands might mean or do, rather than simply reading one of the specs to get the one right answer, because from what people in here say it sounds as though implementors of actual systems based on the SCSI commandset live in this same imaginary world of fantastic and multiple realities without any meaningful review or accountability that I do. (disks, bridges, iSCSI targets and initiators, VMWare/VBox storage, ...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 10:29:05 AM -0800 Frank Cusack wrote: On February 13, 2009 1:10:55 PM -0500 Miles Nordin wrote: "fc" == Frank Cusack writes: fc> If you're misordering writes fc> isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) uhh ... that question can be ignored as i answered it myself below. sorry if i'm must being noisy now. my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 1:10:55 PM -0500 Miles Nordin wrote: "fc" == Frank Cusack writes: fc> If you're misordering writes fc> isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? whereas if you drop a write, well it's gone off into a black hole. fc> Even then, I don't see how it's worse than DROPPING a write. fc> The data eventually gets to disk, and at that point in time, fc> the disk is consistent. When dropping a write, the data never fc> makes it to disk, ever. If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered. why would dropping a flush cache imply dropping every write after the flush cache? fc> In the face of a power loss, of course these result in the fc> same problem, no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. Losing everything you've ever written is usually much worse than losing what you've written recently. yeah, as soon as i wrote that i realized my error, so thank you and i agree on that point. *in the event of a power loss* being inconsistent is a worse problem. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "fc" == Frank Cusack writes: fc> If you're misordering writes fc> isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. fc> Even then, I don't see how it's worse than DROPPING a write. fc> The data eventually gets to disk, and at that point in time, fc> the disk is consistent. When dropping a write, the data never fc> makes it to disk, ever. If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered. fc> In the face of a power loss, of course these result in the fc> same problem, no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. Losing everything you've ever written is usually much worse than losing what you've written recently. yeah yeah some devil's advocate will toss in, ``i *need* some consistency promises or else it's better that the pool raise its hand and say `broken, restore backup please' even if the hand-raising comes in the form of losing the entire pool,'' well in that case neither one is acceptable. But if your requirements are looser, then dropping a flush cache command plus every write after the flush cache command is much better than just ignoring the flush cache command. of course, that is a weird kind of failure that never happens. I described it just to make a point, to argue against this overly-simple idea ``every write is precious. let's do them as soon as possible because there could be Valuable Business Data inside the writes! we don't want to lose anything Valuable!'' The part of SYNC CACHE that's causing people to lose entire pools isn't the ``hurry up! write faster!'' part of the command, such that without it you still get your precious writes, just a little slower. NO. It's the ``control the order of writes'' part that's important for integrity on a single-device vdev. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009 17:53:00 +0100, Eric D. Mudama wrote: On Fri, Feb 13 at 9:14, Neil Perrin wrote: Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By "good" I mean hardware that correctly flush their write caches when requested. Can someone please name a specific piece of bad hardware? Or better still, name a few -GOOD- ones. -- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D + http://nagual.nl/ | SunOS sxce snv107++ + All that's really worth doing is what we do for others (Lewis Carrol) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:41:12 PM -0500 Miles Nordin wrote: "fc" == Frank Cusack writes: fc> if you have 100TB of data, wouldn't you have a completely fc> redundant storage network If you work for a ponderous leaf-eating brontosorous maybe. If your company is modern I think having such an oddly large amount of data in one pool means you'd more likely have 70 whitebox peecees using motherboard ethernet/sata only, connected to a mesh of unmanaged L2 switches (of some peculiar brand that happens to work well.) There will always be one or two peecees switched off, and constantly something will be resilvering. The home user case is not really just for home users. I think a lot of people are tired of paying quadruple for stuff that still breaks, even serious people. oh i dunno. i recently worked for a company that practically defines modern and we had multiples of 100TB of data. Like you said, not all in one place, but any given piece was fully redundant (well, if you count RAID-5 as "fully" ... but I'm really referring to the infrastructure). I can't imagine it any other way ... the cost of not having redundancy in the face of a failure is so much higher compared to the cost of building in that redundancy. Also I'm not sure how you get 1 pool with more than 1 peecee as zfs is not a cluster fs. So what you are talking about is multiple pools, and in that case if you do lose one (not redundant for whatever reason) you only have to restore a fraction of the 100TB from backup. fc> Isn't this easily worked around by having UPS power in fc> addition to whatever the data center supplies? In NYC over the last five years the power has been more reliable going into my UPS than coming out of it. The main reason for having a UPS is wiring maintenance. And the most important part of the UPS is the externally-mounted bypass switch because the UPS also needs maintenance. UPS has never _solved_ anything, it always just helps. so in the end we have to count on the software's graceful behavior, not on absolutes. I can't say I agree about the UPS, however I've already been pretty forthright that UPS, etc. isn't the answer to the problem, just a mitigating factor to the root problem. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:10:08 PM -0500 Miles Nordin wrote: please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:20:21 PM -0500 Miles Nordin wrote: "fc" == Frank Cusack writes: >> Dropping a flush-cache command is just as bad as dropping a >> write. fc> Not that it matters, but it seems obvious that this is wrong fc> or anyway an exaggeration. Dropping a flush-cache just means fc> that you have to wait until the device is quiesced before the fc> data is consistent. fc> Dropping a write is much much worse. backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them. Who said dropping a flush-cache means dropping any subsequent writes, or misordering writes? If you're misordering writes isn't that a completely different problem? Even then, I don't see how it's worse than DROPPING a write. The data eventually gets to disk, and at that point in time, the disk is consistent. When dropping a write, the data never makes it to disk, ever. In the face of a power loss, of course these result in the same problem, but even without a power loss the drop of a write is "catastrophic". -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Miles Nordin wrote: gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. This is common practice for any sort of professional engineering effort. As an example, you aren't going to go out there and find yourself a chainsaw being sold new without a guard. It might be removable, but the default is to include it. Why? Well because there is a considerable chance of damage to the user without it. Likewise with a file system on a device which might cache a data write for as long as thirty seconds while being easily removable. In this case, the user may write the file and seconds later remove the device. Many folks out there behave in this manner. It really doesn't matter to them that they have a copy of the last save they did two hours ago, what they want and expect is that the most recent data they saved actually be on the USB stick for the to retrieve. What you are suggesting is that it is better to lose that data when it could have been avoided. I would personally suggest that it is better to have default behavior which is not surprising along with more advanced behavior for those who have bothered to read the manual. In Windows case, the write cache can be turned on, it is not "unchangeable" and those who have educated themselves use it. I seldom turn it on unless I'm doing heavy I/O to a USB hard drive, otherwise the performance difference is just not that great. Regards, Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "fc" == Frank Cusack writes: fc> if you have 100TB of data, wouldn't you have a completely fc> redundant storage network If you work for a ponderous leaf-eating brontosorous maybe. If your company is modern I think having such an oddly large amount of data in one pool means you'd more likely have 70 whitebox peecees using motherboard ethernet/sata only, connected to a mesh of unmanaged L2 switches (of some peculiar brand that happens to work well.) There will always be one or two peecees switched off, and constantly something will be resilvering. The home user case is not really just for home users. I think a lot of people are tired of paying quadruple for stuff that still breaks, even serious people. fc> Isn't this easily worked around by having UPS power in fc> addition to whatever the data center supplies? In NYC over the last five years the power has been more reliable going into my UPS than coming out of it. The main reason for having a UPS is wiring maintenance. And the most important part of the UPS is the externally-mounted bypass switch because the UPS also needs maintenance. UPS has never _solved_ anything, it always just helps. so in the end we have to count on the software's graceful behavior, not on absolutes. pgpPp2ozffVKi.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "t" == Tim writes: t> I would like to believe it has more to do with Solaris's t> support of USB than ZFS, but the fact remains it's a pretty t> glaring deficiency in 2009, no matter which part of the stack t> is at fault. maybe, but for this job I don't much mind glaring deficiencies, as long as it's possible to assemble a working system without resorting to trial-and-error, and possible to know it's working before loading data on it. Right now, by following the ``best practices'', you don't know what to buy, and after you receive the hardware you don't know if it works until you lose a pool, at which time someone will tell you ``i guess it wasn't ever working.'' Even if you order sun4v or an expensive FC disk shelf, you still don't know if it works. (though, I'm starting to suspect, ni the case of FC or iSCSI the answer is always ``it does not work'') The only thing you know for sure is, if you lose a pool, someone will blame it on hardware bugs surroudning cache flushes, or else try to conflate the issue with a bunch of inapplicable garbage about checksums and wire corruption. This is unworkable. I'm not saying glaring 2009 deficiencies are irrelevant---on my laptop I do mind because I got out of a multi-year abusive relationship with NetBSD/hpcmips, and now want all parts of my laptop to have drivers. And I guess it applies to that neat timeslider / home-base--USB-disk case we were talking about a month ago. but for what I'm doing I will actually accept the advice ``do not ever put ZFS on USB because ZFS is a canary in the mine of USB bugs''---it's just, that advice is not really good enough to settle the whole issue. pgpFtPv2xfqGk.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "fc" == Frank Cusack writes: >> Dropping a flush-cache command is just as bad as dropping a >> write. fc> Not that it matters, but it seems obvious that this is wrong fc> or anyway an exaggeration. Dropping a flush-cache just means fc> that you have to wait until the device is quiesced before the fc> data is consistent. fc> Dropping a write is much much worse. backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them. pgp0bxNk2dBD0.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> "gm" == Gary Mills writes: gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. As we've said many times, if the devices are working properly, then they can be unplugged uncleanly without corrupting the pool, and without corrupting any other non-Microsoft filesystem. This is an old, SOLVED, problem. It's ridiculous hypocricy to make whole filesystems DSYNC, to even _invent the possibility for the filesystem to be DSYNC_, just because it is possible to remove something. Will you do the same thing because it is possible for your laptop's battery to run out? just, STOP! If the devices are broken, the problem is that they're broken, not that they're removeable. personally, I think everything with a broken write cache should be black-listed in the kernel and attach read-only by default, whether it's a USB bridge or a SATA disk. This will not be perfect because USB bridges, RAID layers and iSCSI targets, will often hide the identity of the SATA drive behind them, and of course people will demand a way to disable it. but if you want to be ``safe'', then for the sake of making the point, THIS is the right way to do it, not muck around with these overloaded notions of ``removeable''. Also, the so-far unacknowledged ``iSCSI/FC Write Hole'' should be fixed so that a copy of all written data is held in the initiator's buffer cache until it's verified as *on the physical platter/NVRAM* so that it can be replayed if necessary, and SYNC CACHE commands are allowed to fail far enough that even *things which USE the initiator, like ZFS* will understand what it means when SYNC CACHE fails, and bounced connections are handled correctly---otherwise, when connections bounce or SYNC CACHE returns failure, correctness requires that the initiator pretend like its plug was pulled and panic. Short of that the initiator system must forcibly unmount all filesystems using that device and kill all processes that had files open on those filesystems. And sysadmins should have and know how to cleverly use a tool that tests for both functioning barriers and working SYNC CACHE, end-to-end. NO more ``removeable'' attributes, please! You are just pretending to solve a much bigger problem, and making things clumsy and disgusting in the process. pgpoCtG5UI9HX.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 19:43, Toby Thain wrote: ^^ Spec compliance is what we're testing for... We wouldn't know if this special variant is working correctly either. :) Time the difference between NCQ reads with and without FUA in the presence of overlapped cached write data. That should have a significant performance penalty, compared to a device servicing the reads from a volatile buffer cache. FYI, there are semi-commonly-available power control units that take serial port or USB as an input, and have a whole bunch of SATA power connectors on them. These are the sorts of things that drive vendors use to bounce power unexpectedly in their testing, if you need to perform that same validation, it makes sense to invest in that bit of infrastructure. Something like this: http://www.ulinktech.com/products/hw_power_hub.html or just roll your own in a few days like this guy did for his printer: http://chezphil.org/slugpower/ It should be pretty trivial to perform a few thousand cached writes, issue a flush cache ext, and turn off power immediately after that command completes. Then go back and figure out how many of those writes were successfully written as the device claimed. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
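The procedure described above is straightforward to script. Below is a minimal sketch in C of the write and verify halves (the power cut itself still has to come from the external switch). The block count, the tag format, and the command-line interface are arbitrary choices for illustration, and the sketch assumes that fsync() on the target path actually reaches the device as a cache flush, which is precisely the property under test. Note that it overwrites the start of whatever target it is given:

/*
 * flushtest.c - sketch of the power-cut flush test described above.
 * Usage:  flushtest <target> write    (cut power right after "FLUSHED")
 *         flushtest <target> verify   (after power is restored)
 * WARNING: overwrites the first BLOCKS*BS bytes of <target>.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define BLOCKS 4096
#define BS     512

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <target> write|verify\n", argv[0]);
        return 1;
    }
    int writing = (strcmp(argv[2], "write") == 0);
    int fd = open(argv[1], writing ? (O_RDWR | O_CREAT) : O_RDONLY, 0600);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BS], want[BS];
    if (writing) {
        for (int i = 0; i < BLOCKS; i++) {
            memset(buf, 0, BS);
            snprintf(buf, BS, "seq=%d", i);              /* tag each block */
            if (pwrite(fd, buf, BS, (off_t)i * BS) != BS) { perror("pwrite"); return 1; }
        }
        if (fsync(fd) != 0) { perror("fsync"); return 1; } /* the flush under test */
        printf("FLUSHED - cut power now\n");
    } else {
        int lost = 0;
        for (int i = 0; i < BLOCKS; i++) {
            memset(want, 0, BS);
            snprintf(want, BS, "seq=%d", i);
            if (pread(fd, buf, BS, (off_t)i * BS) != BS || memcmp(buf, want, BS) != 0)
                lost++;
        }
        printf("%d of %d blocks acknowledged before the flush are missing\n", lost, BLOCKS);
    }
    close(fd);
    return 0;
}

If blocks that were acknowledged before the fsync() returned come back missing after the power cut, the device, or some layer between it and the application, is discarding or reordering the flush.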
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13 at 9:14, Neil Perrin wrote: Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By "good" I mean hardware that correctly flushes its write caches when requested. Can someone please name a specific piece of bad hardware? --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By "good" I mean hardware that correctly flushes its write caches when requested. Note, a pool is always consistent (again when using good hardware). The function of the intent log is not to provide consistency (like a journal), but to speed up synchronous requests like fsync and O_DSYNC. Neil. On 02/13/09 06:29, Jiawei Zhao wrote: While mobility could be lost, usb storage still has the advantage of being cheap and easy to install compared to installing internal disks on a pc, so if I just want to use it to provide zfs storage space for a home file server, can a small intent log located on an internal sata disk prevent the pool corruption caused by a power cut? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
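For readers wondering what a "synchronous request" looks like in practice, here is the kind of call the intent log exists to serve quickly. This is plain POSIX, nothing ZFS-specific, and the path is made up for the example:

/* A synchronous write: with O_DSYNC, write(2) must not return until the
 * data is on stable storage.  On ZFS that guarantee is what the ZIL (or a
 * separate log device) satisfies quickly; it does not add consistency the
 * pool doesn't already have. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tank/home/app.journal", O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char rec[] = "commit record\n";
    if (write(fd, rec, sizeof rec - 1) < 0)   /* returns only once the record is stable */
        perror("write");

    close(fd);
    return 0;
}

A separate log device on fast, honest hardware lowers the latency of calls like this; as noted above, it does not add any consistency the pool does not already have.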
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/13/2009 5:58 AM, Ross wrote: huh? but that loses the convenience of USB. I've used USB drives without problems at all, just remember to "zpool export" them before you unplug. I think there is a subcommand of cfgadm you should run to notify Solaris that you intend to unplug the device. I don't use USB, and my familiarity with cfgadm (for FC and SCSI) is limited. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
While mobility could be lost, usb storage still has the advantage of being cheap and easy to install compared to installing internal disks on a pc, so if I just want to use it to provide zfs storage space for a home file server, can a small intent log located on an internal sata disk prevent the pool corruption caused by a power cut? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
huh? but that loses the convenience of USB. I've used USB drives without problems at all, just remember to "zpool export" them before you unplug. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I am wondering: if the usb storage device is not reliable for ZFS usage, can the situation be improved by putting the intent log on an internal sata disk to avoid corruption and keep the convenience of usb storage at the same time? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 12, 2009 1:44:34 PM -0800 bdebel...@intelesyscorp.com wrote: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 ... Dropping a flush-cache command is just as bad as dropping a write. Not that it matters, but it seems obvious that this is wrong or anyway an exaggeration. Dropping a flush-cache just means that you have to wait until the device is quiesced before the data is consistent. Dropping a write is much much worse. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
bcirvin, you proposed "something to allow us to try to pull data from a failed pool". Yes and no. 'Yes' as a pragmatic solution; 'no' for what ZFS was 'sold' to be: the last filesystem mankind would need. It was conceived as a filesystem that does not need recovery, due to its guaranteed consistent states on the/any drive - or better: at any moment. If this were truly the case, a recovery program would not be needed, and I don't think SUN would like one either. It also is more than suboptimal to prevent caching as proposed by others; this is but a very ugly hack. Again, and I have yet to receive comments on this, the original poster claimed to have done a proper flush/sync, and left a 100% consistent file system behind on his drive. At reboot, the pool, the higher entity, failed miserably. Of course, now one can conceive a program that scans the whole drive, like in the good ole days on ancient file systems to recover all those 100% correct file system(s). Or, one could - as proposed - add an Überblock, like we had the FAT-mirror in the last millennium. The alternative, and engineering-wise much better solution, would be to diagnose the weakness on the contextual or semantical level: where 100% consistent file systems cannot be communicated to by the operating system. This - so it seems - is (still) a shortcoming of the concept of ZFS. Which might be solved by means of yesterday, I agree. Or, by throwing more work into the level of the volume management, the pools. Without claiming to have the solution, conceptually I might want to propose to do away with the static, look-up-table-like structure of the pool, as stored in a mirror or Überblock. Could it be feasible to associate pools dynamically? Could it be feasible that the filesystems in a pool create a (new) handle once they are updated in a consistent manner? And when the drive is plugged/turned on, the software simply collects all the handles of all file systems on that drive? Then the export/import is possible, but not required any longer, since the filesystems form their own entities. They can still have associated contextual/semantic (stored) structures into which they are 'plugged' once the drive is up; if one wanted to ('logical volume'). But with or without, the pool would self-configure when the drive starts by picking up all file system handles. Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Blake, On Thu, Feb 12, 2009 at 05:35:14PM -0500, Blake wrote: > That does look like the issue being discussed. > > It's a little alarming that the bug was reported against snv54 and is > still not fixed :( Looks like the bug-report is out of sync. I see that the bug has been fixed in B54. Here is the link to source gate which shows that the fix is in the gate : http://src.opensolaris.org/source/search?q=&defs=&refs=&path=&hist=6424510&project=%2Fonnv And here are the diffs : http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c?r2=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403169&r1=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403138 Thanks and regards, Sanjeev. > > Does anyone know how to push for resolution on this? USB is pretty > common, like it or not for storage purposes - especially amongst the > laptop-using dev crowd that OpenSolaris apparently targets. > > > > On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com > wrote: > > Is this the crux of the problem? > > > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 > > > > 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. > > This can cause catastrophic data corruption in the event of power loss, > > even for filesystems like ZFS that are designed to survive it. > > Dropping a flush-cache command is just as bad as dropping a write. > > It violates the interface that software relies on to use the device.' > > -- > > This message posted from opensolaris.org > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Sanjeev Bagewadi Solaris RPE Bangalore, India ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 12-Feb-09, at 7:02 PM, Eric D. Mudama wrote: On Thu, Feb 12 at 21:45, Mattias Pantzare wrote: A read of data in the disk cache will be read from the disk cache. You can't tell the disk to ignore its cache and read directly from the plater. The only way to test this is to write and the remove the power from the disk. Not easy in software. Not true with modern SATA drives that support NCQ, as there is a FUA bit that can be set by the driver on NCQ reads. If the device implements the spec, ^^ Spec compliance is what we're testing for... We wouldn't know if this special variant is working correctly either. :) --T any overlapped write cache data will be flushed, invalidated, and a fresh read done from the non-volatile media for the FUA read command. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Blake wrote: I'm sure it's very hard to write good error handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no? The ability to force a roll back to an older uberblock in order to be able to access the pool (in the case of corrupt current uberblock) should be ZFS developer's very top priority, IMO. I'd offer to do it myself, but I have nowhere near the ability to do so. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 21:45, Mattias Pantzare wrote: A read of data in the disk cache will be read from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software. Not true with modern SATA drives that support NCQ, as there is a FUA bit that can be set by the driver on NCQ reads. If the device implements the spec, any overlapped write cache data will be flushed, invalidated, and a fresh read done from the non-volatile media for the FUA read command. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I'm sure it's very hard to write good error handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no? On Thu, Feb 12, 2009 at 6:03 PM, Toby Thain wrote: > > On 12-Feb-09, at 3:02 PM, Tim wrote: > > > On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet wrote: >> >> On Thu, February 12, 2009 10:10, Ross wrote: >> >> > Of course, that does assume that devices are being truthful when they >> > say >> > that data has been committed, but a little data loss from badly designed >> > hardware is I feel acceptable, so long as ZFS can have a go at >> > recovering >> > corrupted pools when it does happen, instead of giving up completely >> > like >> > it does now. >> >> Well; not "acceptable" as such. But I'd agree it's outside ZFS's purview. >> The blame for data lost due to hardware actively lying and not working to >> spec goes to the hardware vendor, not to ZFS. >> >> If ZFS could easily and reliably warn about such hardware I'd want it to, >> but the consensus seems to be that we don't have a reliable qualification >> procedure. In terms of upselling people to a Sun storage solution, having >> ZFS diagnose problems with their cheap hardware early is clearly desirable >> :-). >> > > > Right, well I can't imagine it's impossible to write a small app that can > test whether or not drives are honoring correctly by issuing a commit and > immediately reading back to see if it was indeed committed or not. > > You do realise that this is not as easy as it looks? :) For one thing, the > drive will simply serve the read from cache. > It's hard to imagine a test that doesn't involve literally pulling plugs; > even better, a purpose built hardware test harness. > Nonetheless I hope that someone comes up with a brilliant test. But if the > ZFS team hasn't found one yet... it looks grim :) > --Toby > > Like a "zfs test cXtX". Of course, then you can't just blame the hardware > everytime something in zfs breaks ;) > > --Tim > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 12-Feb-09, at 3:02 PM, Tim wrote: On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet wrote: On Thu, February 12, 2009 10:10, Ross wrote: > Of course, that does assume that devices are being truthful when they say > that data has been committed, but a little data loss from badly designed > hardware is I feel acceptable, so long as ZFS can have a go at recovering > corrupted pools when it does happen, instead of giving up completely like > it does now. Well; not "acceptable" as such. But I'd agree it's outside ZFS's purview. The blame for data lost due to hardware actively lying and not working to spec goes to the hardware vendor, not to ZFS. If ZFS could easily and reliably warn about such hardware I'd want it to, but the consensus seems to be that we don't have a reliable qualification procedure. In terms of upselling people to a Sun storage solution, having ZFS diagnose problems with their cheap hardware early is clearly desirable :-). Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. You do realise that this is not as easy as it looks? :) For one thing, the drive will simply serve the read from cache. It's hard to imagine a test that doesn't involve literally pulling plugs; even better, a purpose built hardware test harness. Nonetheless I hope that someone comes up with a brilliant test. But if the ZFS team hasn't found one yet... it looks grim :) --Toby Like a "zfs test cXtX". Of course, then you can't just blame the hardware everytime something in zfs breaks ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, 2009-02-12 at 17:35 -0500, Blake wrote: > That does look like the issue being discussed. > > It's a little alarming that the bug was reported against snv54 and is > still not fixed :( bugs.opensolaris.org's information about this bug is out of date. It was fixed in snv_54: changeset: 3169:1dea14abfe17 user:phitran date:Sat Nov 25 11:05:17 2006 -0800 files: usr/src/uts/common/io/scsi/targets/sd.c 6424510 usb ignores DKIOCFLUSHWRITECACHE - Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I just tried putting a pool on a USB flash drive, writing a file to it, and then yanking it. I did not lose any data or the pool, but I had to reboot before I could get any zpool command to complete without freezing. I also had the OS reboot once on its own, when I tried to issue a zpool command to the pool. The OS did not notice the disk was yanked until I tried to status it. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 14:02, Tim wrote: > > Right, well I can't imagine it's impossible to write a small app that can > test whether or not drives are honoring correctly by issuing a commit and > immediately reading back to see if it was indeed committed or not. Like a > "zfs test cXtX". Of course, then you can't just blame the hardware > everytime something in zfs breaks ;) > I can imagine it fairly easily. All you've got to work with is what the drive says about itself, and how fast, and the what we're trying to test is whether it lies. It's often very hard to catch it out on this sort of thing. We need somebody who really understands the command sets available to send to modern drives (which is not me) to provide a test they think would work, and people can argue or try it. My impression, though, is that the people with the expertise are so far consistently saying it's not possible. I think at this point somebody who thinks it's possible needs to do the work to at least propose a specific test, or else we have to give up on the idea. I'm still hoping for at least some kind of qualification procedure involving manual intervention (hence not something that could be embodied in a simple command you just typed), but we're not seeing even this so far. Of course, the other side of this is that, if people "know" that drives have these problems, there must in fact be some way to demonstrate it, or they wouldn't know. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
That does look like the issue being discussed. It's a little alarming that the bug was reported against snv54 and is still not fixed :( Does anyone know how to push for resolution on this? USB is pretty common, like it or not for storage purposes - especially amongst the laptop-using dev crowd that OpenSolaris apparently targets. On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com wrote: > Is this the crux of the problem? > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 > > 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. > This can cause catastrophic data corruption in the event of power loss, > even for filesystems like ZFS that are designed to survive it. > Dropping a flush-cache command is just as bad as dropping a write. > It violates the interface that software relies on to use the device.' > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Is this the crux of the problem? http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. This can cause catastrophic data corruption in the event of power loss, even for filesystems like ZFS that are designed to survive it. Dropping a flush-cache command is just as bad as dropping a write. It violates the interface that software relies on to use the device.' -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
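For context, DKIOCFLUSHWRITECACHE is the request a caller issues through the disk driver to push the device's volatile write cache out to media. A minimal sketch of such a call on Solaris follows; the device path is an example only, and passing NULL as the argument requests the simple synchronous form of the flush, per dkio(7I). The bug quoted above is that for USB devices this request was silently discarded:

/* Minimal sketch (Solaris): ask the driver to flush the device's volatile
 * write cache.  The device path is an example only. */
#include <sys/types.h>
#include <sys/dkio.h>
#include <stropts.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/rdsk/c2t0d0p0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Passing NULL requests a synchronous flush: the call should not return
     * until the cache has been committed to media.  A driver that ignores
     * this (the behavior described in 6424510) leaves the caller believing
     * the data is stable when it is not. */
    if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) != 0)
        perror("DKIOCFLUSHWRITECACHE");

    close(fd);
    return 0;
}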
Re: [zfs-discuss] ZFS: unreliable for professional usage?
That would be the ideal, but really I'd settle for just improved error handling and recovery for now. In the longer term, disabling write caching by default for USB or Firewire drives might be nice. On Thu, Feb 12, 2009 at 8:35 PM, Gary Mills wrote: > On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote: >> Ross wrote: >> >I can also state with confidence that very, very few of the 100 staff >> >working here will even be aware that it's possible to unmount a USB volume >> >in windows. They will all just pull the plug when their work is saved, >> >and since they all come to me when they have problems, I think I can >> >safely say that pulling USB devices really doesn't tend to corrupt >> >filesystems in Windows. Everybody I know just waits for the light on the >> >device to go out. >> > >> The key here is that Windows does not cache writes to the USB drive >> unless you go in and specifically enable them. It caches reads but not >> writes. If you enable them you will lose data if you pull the stick out >> before all the data is written. This is the type of safety measure that >> needs to be implemented in ZFS if it is to support the average user >> instead of just the IT professionals. > > That implies that ZFS will have to detect removable devices and treat > them differently than fixed devices. It might have to be an option > that can be enabled for higher performance with reduced data security. > > -- > -Gary Mills--Unix Support--U of M Academic Computing and Networking- > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> > Right, well I can't imagine it's impossible to write a small app that can > test whether or not drives are honoring correctly by issuing a commit and > immediately reading back to see if it was indeed committed or not. Like a > "zfs test cXtX". Of course, then you can't just blame the hardware > everytime something in zfs breaks ;) A read of data in the disk cache will be read from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote: > Ross wrote: > >I can also state with confidence that very, very few of the 100 staff > >working here will even be aware that it's possible to unmount a USB volume > >in windows. They will all just pull the plug when their work is saved, > >and since they all come to me when they have problems, I think I can > >safely say that pulling USB devices really doesn't tend to corrupt > >filesystems in Windows. Everybody I know just waits for the light on the > >device to go out. > > > The key here is that Windows does not cache writes to the USB drive > unless you go in and specifically enable them. It caches reads but not > writes. If you enable them you will lose data if you pull the stick out > before all the data is written. This is the type of safety measure that > needs to be implemented in ZFS if it is to support the average user > instead of just the IT professionals. That implies that ZFS will have to detect removable devices and treat them differently than fixed devices. It might have to be an option that can be enabled for higher performance with reduced data security. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet wrote: > > On Thu, February 12, 2009 10:10, Ross wrote: > > > Of course, that does assume that devices are being truthful when they say > > that data has been committed, but a little data loss from badly designed > > hardware is I feel acceptable, so long as ZFS can have a go at recovering > > corrupted pools when it does happen, instead of giving up completely like > > it does now. > > Well; not "acceptable" as such. But I'd agree it's outside ZFS's purview. > The blame for data lost due to hardware actively lying and not working to > spec goes to the hardware vendor, not to ZFS. > > If ZFS could easily and reliably warn about such hardware I'd want it to, > but the consensus seems to be that we don't have a reliable qualification > procedure. In terms of upselling people to a Sun storage solution, having > ZFS diagnose problems with their cheap hardware early is clearly desirable > :-). > > Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a "zfs test cXtX". Of course, then you can't just blame the hardware everytime something in zfs breaks ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 10:10, Ross wrote: > Of course, that does assume that devices are being truthful when they say > that data has been committed, but a little data loss from badly designed > hardware is I feel acceptable, so long as ZFS can have a go at recovering > corrupted pools when it does happen, instead of giving up completely like > it does now. Well; not "acceptable" as such. But I'd agree it's outside ZFS's purview. The blame for data lost due to hardware actively lying and not working to spec goes to the hardware vendor, not to ZFS. If ZFS could easily and reliably warn about such hardware I'd want it to, but the consensus seems to be that we don't have a reliable qualification procedure. In terms of upselling people to a Sun storage solution, having ZFS diagnose problems with their cheap hardware early is clearly desirable :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Ross wrote: I can also state with confidence that very, very few of the 100 staff working here will even be aware that it's possible to unmount a USB volume in windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn't tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out. The key here is that Windows does not cache writes to the USB drive unless you go in and specifically enable them. It caches reads but not writes. If you enable them you will lose data if you pull the stick out before all the data is written. This is the type of safety measure that needs to be implemented in ZFS if it is to support the average user instead of just the IT professionals. Regards, Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hello Bob, Wednesday, February 11, 2009, 11:25:12 PM, you wrote: BF> I agree. ZFS apparently syncs uncommitted writes every 5 seconds. BF> If there has been no filesystem I/O (including read I/O due to atime) BF> for at least 10 seconds, and there has not been more data BF> burst-written into RAM than can be written to disk in 10 seconds, then BF> there should be nothing remaining to write. That's not entirely true. After recent changes writes could be delayed even up-to 30s by default. -- Best regards, Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> All that and yet the fact > remains: I've never "ejected" a USB > drive from OS X or Windows, I simply pull it and go, > and I've never once lost data, or had it become > unrecoverable or even corrupted. > And yes, I do keep checksums of all the data > sitting on them and periodically check it. So, > for all of your ranting and raving, the fact remains > even a *crappy* filesystem like fat32 manages to > handle a hot unplug without any prior notice without > going belly up. > --Tim Just wanted to chime in with my 2c here. I've also *never* unmounted a USB drive from windows, and have been using them regularly since memory sticks became available. So that's 2-3 years of experience and I've never lost work on a memory stick, nor had a file corrupted. I can also state with confidence that very, very few of the 100 staff working here will even be aware that it's possible to unmount a USB volume in windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn't tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out. And while this isn't really what ZFS is designed to do, I do think it should be able to cope. First of all, some kind of ZFS recovery tools are needed. There's going to be an awful lot of good data on that disk, making all of that inaccessible just because the last write failed isn't really on. It's a copy on write filesystem, "zpool import" really should be able to take advantage of that for recovering pools! I don't know the technicalities of how it works on disk, but my feeling is that the last successful mount point should be saved, and the last few uberblocks should also be available, so barring complete hardware failure, some kind of pool should be available for mounting. Also, if a drive is removed while writes are pending, some kind of error or warning is needed, either in the console, or the GUI. It should be possible to prompt the user to re-insert the device so that the remaining writes can be completed. Recovering the pool in that situation should be easy - you can keep the location of the uberblock you're using in memory, and just re-write everything. Of course, that does assume that devices are being truthful when they say that data has been committed, but a little data loss from badly designed hardware is I feel acceptable, so long as ZFS can have a go at recovering corrupted pools when it does happen, instead of giving up completely like it does now. Yes, these problems happen more often with consumer level hardware, but recovery tools like this are going to be very much appreciated by anybody who encounters problems like this on a server! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
After all the statements read here I just want to highlight another issue regarding ZFS. It was recommended here many times to set copies=2. Installing Solaris 10 10/2008 or snv_107, you can choose either UFS or ZFS. If you choose ZFS, the rpool will be created by default with 'copies=1'. If nobody mentions this and you have a hanging system with no way to access it or shut it down properly, so that you have no other choice than to press the power button of your notebook through the desk plate, couldn't the same thing happen as with my external usb drive? This is the same sudden power-off event that seems to have damaged my pool. And it would be nice if ZFS could handle this. Another issue I miss in this thread is that ZFS is a layer on an EFI label. What about that in case of a sudden power-off event? Regards, Dave. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 18:16, Uwe Dippel wrote: > I need to disappoint you here, LED inactive for a few seconds is a very > bad indicator of pending writes. Used to experience this on a stick on > Ubuntu, which was silent until the 'umount' and then it started to write > for some 10 seconds. Yikes, that's bizarre. > On the other hand, you are spot-on w.r.t. 'umount'. Once the command is > through, there is no more write to be expected. And if there was, it would > be a serious bug. So this 'umount'ed system needs to be in perfectly > consistent states. (Which is why I wrote further up that the structure > above the file system, that is the pool, is probably the culprit for all > this misery.) Yeah, once it's unmounted it really REALLY should be in a consistent state. > [i]Conversely, anybody who is pulling disks / memory sticks off while IO > is > visibly incomplete really SHOULD expect to lose everything on them[/i] > I hope you don't mean this. Not in a filesystem much hyped and much > advanced. Of course, we expect corruption of all files whose 'write' has > been boldly interrupted. But I for one, expect the metadata of all other > files to be readily available. Kind of, at the next use, telling me:"You > idiot removed the plug last, while files were still in the process of > writing. Don't expect them to be available now. Here is the list of all > other files: [list of all files not being written then]" It's good to have hopes, certainly. I'm just kinda cynical. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
May I doubt that there are drives that don't 'sync'? That means you have a good chance of corrupted data at a normal 'reboot'; or just at a 'umount' (without considering ZFS here). May I doubt the marketing drab that you need to buy a USCSI or whatnot to have functional 'sync' at a shutdown or umount? There are millions if not billions of drives out there that come up with consistent data structures after a clean shutdown. This means that a proper 'umount' flushes everything on those drives, and we need not expect corrupted data, and no further writes. And that was the topic further up to which I tried to answer. As well as to the notion that a file system that encounters interrupted writes may well and legally be completely unreadable. That is what I refuted, nothing else. Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 11-Feb-09, at 9:30 PM, Uwe Dippel wrote: Toby, sad that you fall for the last resort of the marketing droids here. All manufactures (and there are only a few left) will sue the hell out of you if you state that their drives don't 'sync'. And each and every drive I have ever used did. So the talk about a distinct borderline between 'enterprise' and 'home' is just cheap and not sustainable. They have existed. This thread has shown a motive to verify COTS drives for this property, if the data is valuable. Also, if you were correct, and ZFS allowed for compromising the metadata of dormant files (folders) by writing metadata for other files (folders), we would not have advanced beyond FAT, and ZFS would be but a short episode in the history of file systems. Or am I the last to notice that atomic writes have been dropped? Especially with atomic writes you either have the last consistent state of the file structure, or the updated one. So what would be the meaning of 'always consistent on the drive' if metadata were allowed to hang in between; in an inconsistent state? You write "What is known, is the last checkpoint." Exactly, and here a contradiction shows: the last checkpoint of all untouched files (plus those read only) does contain exactly all untouched files. How could one allow to compromise the last checkpoint by writing a new one? ZFS claims that the last checkpoint (my term, sorry, not an official one) is fully consistent (metadata *and* data! Unlike other filesystems). Since consistency is achievable by thousands of other transactional systems I have no reason to doubt that it is achieved by ZFS. You are correct with "the feasible recovery mode is a partial". Though here we have heard some stories of total loss. Nobody has questioned that the recovery of an interrupted 'write' must necessarily be partial. What is questioned is the complete loss of semantics. Only an incomplete transaction would be lost, AIUI. That is the 'atomic' property of all journaled and transactional systems. (All of it, or none of it.) --Toby Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 18:25, Toby Thain wrote: > > Absolutely. You should never get "actual corruption" (inconsistency) > at any time *except* in the case Jeff Bonwick explained: i.e. faulty/ > misbehaving hardware! (That's one meaning of "always consistent on > disk".) > > I think this is well understood, is it not? Perhaps. I think the consensus seems to be settling down this direction (as I filter for reliability of people posting, not by raw count :-)). The shocker is how much hardware that doesn't behave to spec in this area seems to be out there -- or so people claim; the other problem is that we can't sort out which is which. > Write barriers are not a new concept, and nor is the necessity. For > example, they are a clearly described feature of DEC's MSCP > protocol*, long before ATA or SCSI - presumably so that transactional > systems could actually be built at all. Devices were held to a high > standard of conformance since DEC's customers (like Sun's) were > traditionally those whose data was of very high value. Storage > engineers across the industry were certainly implementing them long > before MSCP. > > --Toby > > > * - The related patent that I am looking at is #4,449,182, filed 5 > Oct, 1981. > "Interface between a pair of processors, such as host and peripheral- > controlling processors in data processing systems." While I was working for LCG in Marlboro, in fact. (Not on hardware, nowhere near that work.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 17:25, Bob Friesenhahn wrote: > Regardless, it seems that the ZFS problems with crummy hardware are > primarily due to the crummy hardware writting the data to the disk in > a different order than expected. ZFS expects that after a sync that > all pending writes are committed. Which is something Unix has been claiming (or pretending) to provide for some time now, yes. > The lesson is that unprofessional hardware may prove to be unreliable > for professional usage. Or any other usage. And the question is how can we tell them apart? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Toby, sad that you fall for the last resort of the marketing droids here. All manufacturers (and there are only a few left) will sue the hell out of you if you state that their drives don't 'sync'. And each and every drive I have ever used did. So the talk about a distinct borderline between 'enterprise' and 'home' is just cheap and not sustainable. Also, if you were correct, and ZFS allowed for compromising the metadata of dormant files (folders) by writing metadata for other files (folders), we would not have advanced beyond FAT, and ZFS would be but a short episode in the history of file systems. Or am I the last to notice that atomic writes have been dropped? Especially with atomic writes you either have the last consistent state of the file structure, or the updated one. So what would be the meaning of 'always consistent on the drive' if metadata were allowed to hang in between; in an inconsistent state? You write "What is known, is the last checkpoint." Exactly, and here a contradiction shows: the last checkpoint of all untouched files (plus those read only) does contain exactly all untouched files. How could one allow to compromise the last checkpoint by writing a new one? You are correct with "the feasible recovery mode is a partial". Though here we have heard some stories of total loss. Nobody has questioned that the recovery of an interrupted 'write' must necessarily be partial. What is questioned is the complete loss of semantics. Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 11-Feb-09, at 7:16 PM, Uwe Dippel wrote: I need to disappoint you here, LED inactive for a few seconds is a very bad indicator of pending writes. Used to experience this on a stick on Ubuntu, which was silent until the 'umount' and then it started to write for some 10 seconds. On the other hand, you are spot-on w.r.t. 'umount'. Once the command is through, there is no more write to be expected. And if there was, it would be a serious bug. Yes; though at the risk of repetition - the bug here can be in the drive... So this 'umount'ed system needs to be in perfectly consistent states. (Which is why I wrote further up that the structure above the file system, that is the pool, is probably the culprit for all this misery.) [i]Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them[/i] I hope you don't mean this. Not in a filesystem much hyped and much advanced. Of course, we expect corruption of all files whose 'write' has been boldly interrupted. But I for one, expect the metadata of all other files to be readily available. Kind of, at the next use, telling me:"You idiot removed the plug last, while files were still in the process of writing. Don't expect them to be available now. Here is the list of all other files: [list of all files not being written then]" That hope is a little naive. AIUI, it cannot be known, thanks to the many indeterminacies of the I/O path, which 'files' were partially written (since a whole slew of copy-on-writes to many objects could have been in flight, and absent a barrier it cannot be known post facto which succeeded). What is known, is the last checkpoint. Hence the feasible recovery mode is a partial, automatic rollback to a past consistent state. Somebody correct me if I am wrong. --Toby Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 11-Feb-09, at 5:52 PM, David Dyer-Bennet wrote: On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote: On Wed, 11 Feb 2009, Tim wrote: Right, except the OP stated he unmounted the filesystem in question, and it was the *ONLY* one on the drive, meaning there is absolutely 0 chance of there being pending writes. There's nothing to write to. This is an interesting assumption leading to a wrong conclusion. If the file is updated and the filesystem is "unmounted", it is still possible for there to be uncommitted data in the pool. ... As a practical matter, it seems unreasonable to me that there would be uncommitted data in the pool after some quite short period of time ... That is, if I plug in a memory stick with ZFS on it, read and write for a while, then when I'm done and IO appears to have quiesced, observe that the IO light on the drive is inactive for several seconds, I'd be kinda disappointed if I got actual corruption if I pulled it. Absolutely. You should never get "actual corruption" (inconsistency) at any time *except* in the case Jeff Bonwick explained: i.e. faulty/ misbehaving hardware! (That's one meaning of "always consistent on disk".) I think this is well understood, is it not? Write barriers are not a new concept, and nor is the necessity. For example, they are a clearly described feature of DEC's MSCP protocol*, long before ATA or SCSI - presumably so that transactional systems could actually be built at all. Devices were held to a high standard of conformance since DEC's customers (like Sun's) were traditionally those whose data was of very high value. Storage engineers across the industry were certainly implementing them long before MSCP. --Toby * - The related patent that I am looking at is #4,449,182, filed 5 Oct, 1981. "Interface between a pair of processors, such as host and peripheral- controlling processors in data processing systems." Also the MSCP document released with the UDA50 mass storage subsystem, dated April 1982: "4.5 Command Categories and Execution Order ... Sequential commands are those commands that, for the same unit, must be executed in precise order. ... All sequential commands for a particular unit that are received on the same connection must be executed in the exact order that the MSCP server receives them. The execution of a sequential command may not be interleaved with the execution of any other sequential or non-sequential commands for the same unit. Furthermore, any non-sequential commands received before and on the same connection as a particular sequential command must be completed before execution of that sequential command begins, and any non-sequential commands received after and on the same connection as a particular sequential command must not begin execution until after that sequential command is completed. Sequential commands are, in effect, a barrier that non-sequential commands cannot pass or penetrate. Non-sequential commands are those commands that controllers may re-order so as to optimize performance. Controllers may furthermore interleave the execution of several non-sequential commands among themselves, ..." Complaints about not being exported next time I tried to import it, sure. Maybe other complaints. I wouldn't do this deliberately (other than for testing). ... 
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I need to disappoint you here, an LED that has been inactive for a few seconds is a very bad indicator that there are no pending writes. I used to experience this with a stick on Ubuntu, which was silent until the 'umount' and then it started to write for some 10 seconds. On the other hand, you are spot-on w.r.t. 'umount'. Once the command is through, there are no more writes to be expected. And if there were, it would be a serious bug. So this 'umount'ed filesystem needs to be in a perfectly consistent state. (Which is why I wrote further up that the structure above the file system, that is the pool, is probably the culprit for all this misery.) "Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them" I hope you don't mean this. Not in a filesystem much hyped and much advanced. Of course, we expect corruption of all files whose 'write' has been abruptly interrupted. But I, for one, expect the metadata of all other files to be readily available. Kind of, at the next use, telling me: "You idiot removed the plug while files were still in the process of being written. Don't expect them to be available now. Here is the list of all other files: [list of all files not being written then]" Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote: As a practical matter, it seems unreasonable to me that there would be uncommitted data in the pool after some quite short period of time when there's no new IO activity to the pool (not just the filesystem). 5 or 10 seconds, maybe? (Possibly excepting if there was a HUGE spike of IO for a while just before this; there could be considerable stuff in the ZIL not yet committed then, I would think.) I agree. ZFS apparently syncs uncommitted writes every 5 seconds. If there has been no filesystem I/O (including read I/O due to atime) for at least 10 seconds, and there has not been more data burst-written into RAM than can be written to disk in 10 seconds, then there should be nothing remaining to write. Regardless, it seems that the ZFS problems with crummy hardware are primarily due to the crummy hardware writing the data to the disk in a different order than expected. ZFS expects that after a sync all pending writes are committed. The lesson is that unprofessional hardware may prove to be unreliable for professional usage. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
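The 5-second figure Bob quotes is the transaction group sync interval, which on OpenSolaris is an ordinary kernel tunable. A minimal sketch for inspecting it with mdb; the symbol is zfs_txg_timeout on recent builds and txg_time on some older ones, so the exact name is an assumption to verify against your build:

  # print the current txg sync interval, in seconds, as a decimal value
  echo "zfs_txg_timeout/D" | mdb -k
  # force outstanding writes out by hand before touching removable media
  sync

None of this changes the underlying point: the interval only bounds how long dirty data sits in memory, not whether the device actually commits it when asked.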
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote: > On Wed, 11 Feb 2009, Tim wrote: >> >> Right, except the OP stated he unmounted the filesystem in question, and >> it >> was the *ONLY* one on the drive, meaning there is absolutely 0 chance of >> there being pending writes. There's nothing to write to. > > This is an interesting assumption leading to a wrong conclusion. If > the file is updated and the filesystem is "unmounted", it is still > possible for there to be uncommitted data in the pool. If you pay > closer attention you will see that "mounting" the filesystem basically > just adds a logical path mapping since the filesystem is already > available under /poolname/filesystemname regardless. So doing the > mount makes /poolname/filesystemname available as /filesystemname, or > whatever mount path you specify. As a practical matter, it seems unreasonable to me that there would be uncommitted data in the pool after some quite short period of time when there's no new IO activity to the pool (not just the filesystem). 5 or 10 seconds, maybe? (Possibly excepting if there was a HUGE spike of IO for a while just before this; there could be considerable stuff in the ZIL not yet committed then, I would think.) That is, if I plug in a memory stick with ZFS on it, read and write for a while, then when I'm done and IO appears to have quiesced, observe that the IO light on the drive is inactive for several seconds, I'd be kinda disappointed if I got actual corruption if I pulled it. Complaints about not being exported next time I tried to import it, sure. Maybe other complaints. I wouldn't do this deliberately (other than for testing). But it seems wrong to leave things uncommitted significantly longer than necessary (seconds are huge time units to a computer, after all), and if the device is sitting there not doing IO, there's no reason it shouldn't already have written out anything uncommitted. Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them, even if sometimes they'll be luckier than that. I suppose we're dealing with people who didn't work with floppies here, where that lesson got pretty solidly beaten into people :-) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 15:51, Frank Cusack wrote: > On February 11, 2009 3:02:48 PM -0600 Tim wrote: >> It's hardly uncommon for an entire datacenter to go down, redundant >> power >> or not. When it does, if it means I have to restore hundreds of >> terabytes if not petabytes from tape instead of just restoring the files >> that were corrupted or running an fsck, we've got issues. > > Isn't this easily worked around by having UPS power in addition to > whatever the data center supplies? Well, that covers some of the cases (it does take a fairly hefty UPS to deal with 100TB levels of redundant disk). > I've been there with entire data center shutdown (or partial, but entire > as far as my gear is concerned), but for really critical stuff we've had > our own UPS. I knew people once who had pretty careful power support; UPS where needed, then backup generator that would cut in automatically, and cut back when power was restored. Unfortunately, the cut back failed to happen automatically. On a weekend. So things sailed along fine until the generator ran out of fuel, and then shut down MOST uncleanly. Best laid plans of mice and men gang aft agley, or some such (from memory, and the spelling seems unlikely). Sure, human error was a factor. But human error is a MAJOR factor in the real world, and one of the things we're trying to protect our data from. Certainly, if a short power glitch on the normal mains feed (to lapse into Brit for a second) brings down your data server in an uncontrolled fashion, you didn't do a very good job of protecting it. My home NAS is protected to the point of one UPS, anyway. But real-world problems a few steps more severe can produce the same power cut, practically anywhere, just not as often. > I don't know if that really works for 100TB and up though. That's a lot > of disk == a lot of UPS capacity. And again, I'm not trying to take away > from the fact that this is a significant zfs problem. We've got this UPS in our server room that's about, oh, 4 washing machines in size. It's wired into building power, and powers the outlets the servers are plugged into, and the floor outlets out here the development PCs are plugged into also. I never got the tour, but I heard about the battery backup system at the old data center Northwest Airlines had back when they ran their own reservations system. Enough lead-acid batteries to keep an IBM mainframe running for three hours. One can certainly do it if one wants to badly enough, which one should if the data is important. I can't imagine anybody investing in 100TB of enterprise-grade storage if the data WASN'T important! -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, Tim wrote: Right, except the OP stated he unmounted the filesystem in question, and it was the *ONLY* one on the drive, meaning there is absolutely 0 chance of there being pending writes. There's nothing to write to. This is an interesting assumption leading to a wrong conclusion. If the file is updated and the filesystem is "unmounted", it is still possible for there to be uncommitted data in the pool. If you pay closer attention you will see that "mounting" the filesystem basically just adds a logical path mapping since the filesystem is already available under /poolname/filesystemname regardless. So doing the mount makes /poolname/filesystemname available as /filesystemname, or whatever mount path you specify. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
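Bob's distinction between the dataset being mounted and the pool being active can be checked directly from the command line. A small sketch, reusing the usbhdd1 pool name from this thread:

  zfs get mounted,mountpoint usbhdd1   # dataset view: is it in the mount table?
  zpool list -o name,health usbhdd1    # pool view: the pool stays imported after a umount
  zpool status usbhdd1                 # ...and its devices stay open until it is exported

In other words, 'zfs umount' only answers the first question; only 'zpool export' settles the other two.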
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 11, 2009 3:02:48 PM -0600 Tim wrote: On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack wrote: if you have 100TB of data, wouldn't you have a completely redundant storage network -- dual FC switches on different electrical supplies, etc. i've never designed or implemented a storage network before but such designs seem common in the literature and well supported by Solaris. i have done such designs with data networks and such redundancy is quite common. i mean, that's a lot of data to go missing due to a single device failing -- which it will. not to say it's not a problem with zfs, just that in the real world, it should be mitigated since your storage network design would overcome a single failure *anyway* -- regardless of zfs. It's hardly uncommon for an entire datacenter to go down, redundant power or not. When it does, if it means I have to restore hundreds of terabytes if not petabytes from tape instead of just restoring the files that were corrupted or running an fsck, we've got issues. Isn't this easily worked around by having UPS power in addition to whatever the data center supplies? I've been there with entire data center shutdown (or partial, but entire as far as my gear is concerned), but for really critical stuff we've had our own UPS. I don't know if that really works for 100TB and up though. That's a lot of disk == a lot of UPS capacity. And again, I'm not trying to take away from the fact that this is a significant zfs problem. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack wrote: > > if you have 100TB of data, wouldn't you have a completely redundant > storage network -- dual FC switches on different electrical supplies, > etc. i've never designed or implemented a storage network before but > such designs seem common in the literature and well supported by > Solaris. i have done such designs with data networks and such > redundancy is quite common. > > i mean, that's a lot of data to go missing due to a single device > failing -- which it will. > > not to say it's not a problem with zfs, just that in the real world, > it should be mitigated since your storage network design would overcome > a single failure *anyway* -- regardless of zfs. > It's hardly uncommon for an entire datacenter to go down, redundant power or not. When it does, if it means I have to restore hundreds of terabytes if not petabytes from tape instead of just restoring the files that were corrupted or running an fsck, we've got issues. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, Feb 11, 2009 at 11:46 AM, Kyle McDonald wrote: > > Yep. I've never unplugged a USB drive on purpose, but I have left a drive > plugged into the docking station, hibernated Windows XP Professional, > undocked the laptop, and then woken it up later undocked. It routinely would > pop up windows saying that a 'delayed write' was not successful on the now > missing drive. > > I've always counted myself lucky that any new data written to that drive > was written long long before I hibernated, because I have yet to find any > problems with that data (but I don't read it very often if at all.) But it > is luck only! > > -Kyle > Right, except the OP stated he unmounted the filesystem in question, and it was the *ONLY* one on the drive, meaning there is absolutely 0 chance of there being pending writes. There's nothing to write to. I don't know what exactly it is you put on your USB drives, but I'm certainly aware of whether or not things on mine are in use before pulling the drive out. If a picture is open and in an editor, I'm obviously not going to save it then pull the drive mid-save. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 13:45, Ian Collins wrote: > David Dyer-Bennet wrote: >> I've spent $2000 on hardware and, by now, hundreds of hours of my time >> trying to get and keep a ZFS-based home NAS working. > > Hundreds of hours doing what? I just plugged in the drives, built the > pool and left the box in a corner for the past couple of years. It's > been upgraded twice, from build 62 to 72 to get the SATA framework and > then to b101 for CIFS. Well, good for you. It took me a lot of work to get it working in the first place (and even then with only 4 of my 8 hot-swap bays and 4 of my 6 eSATA connections on the motherboard working). Before that, I'd spent quite a lot of time trying to get VMWare to run Solaris, which it wouldn't back then. I did manage to get Parallels, I think it was, to let me create a Solaris system and then a ZFS pool to play with (this was back before OpenSolaris and before any sort of LiveCD I could find). Then I had a series of events starting in December of last year that, in hindsight, I think were mainly or entirely one memory SIMM going bad, which caused me to upgrade to 2008.11 and also have to restore my main pool from backup. Oh, and I converted from using Samba to using CIFS. I'm just now getting close to having things up and working again usably and stably, still working on backup. I do still have some problems with file access permissions, I know, due to the new different handling of ACLs, I guess. And I wasn't a Solaris admin to begin with. I guess SunOS back when was the first Unix I had root on, but since then I've mostly worked with Linux (including my time as news admin for a local ISP, and my years as an engineer with Sun, where I was in the streaming video server group). In some ways a completely UNfamiliar system might have been easier :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> after working for 1 month with ZFS on 2 external USB > drives I have experienced, that the all new zfs > filesystem is the most unreliable FS I have ever > seen. Troll. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
David Dyer-Bennet wrote: I've spent $2000 on hardware and, by now, hundreds of hours of my time trying to get and keep a ZFS-based home NAS working. Hundreds of hours doing what? I just plugged in the drives, built the pool and left the box in a corner for the past couple of years. It's been upgraded twice, from build 62 to 72 to get the SATA framework and then to b101 for CIFS. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 11, 2009 2:07:47 AM -0800 Gino wrote: I agree but I'd like to point out that the MAIN problem with ZFS is that because of a corruption you'll lose ALL your data and there is no way to recover it. Consider an example where you have 100TB of data and a fc switch fails or other hw problem happens during I/O on a single file. With UFS you'll probably get corruption on that single file. With ZFS you'll lose all your data. I totally agree that ZFS is theoretically much much much much much better than UFS but in real-world applications the risk of losing access to an entire pool is not acceptable. if you have 100TB of data, wouldn't you have a completely redundant storage network -- dual FC switches on different electrical supplies, etc. i've never designed or implemented a storage network before but such designs seem common in the literature and well supported by Solaris. i have done such designs with data networks and such redundancy is quite common. i mean, that's a lot of data to go missing due to a single device failing -- which it will. not to say it's not a problem with zfs, just that in the real world, it should be mitigated since your storage network design would overcome a single failure *anyway* -- regardless of zfs. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, Feb 11, 2009 at 11:19 AM, Tim wrote: > On Tue, Feb 10, 2009 at 11:44 PM, Fredrich Maney > wrote: >> Ah... an illiterate AND idiotic bigot. Have you even read the manual >> or *ANY* of the replies to your posts? *YOU* caused the situation that >> resulted in your data being corrupted. Not Sun, not OpenSolaris, not >> ZFS and not anyone on this list. Yet you feel the need to blame ZFS >> and insult the people that have been trying to help you understand >> what happened and why you shouldn't do what you did. > #1 English is clearly not his native tongue. Calling someone idiotic and > illiterate when they're doing as well as he is in a second language is not > only inaccurate, it's "idiotic". I have a great deal of respect for his command of more than one language. What I don't have any respect for is his complete unwillingness to actually read the dozens of responses that have all said the same thing, namely that his problems are self-inflicted, due to his refusal to read the documentation. I refrained from calling him an idiot until after he proved himself one by spewing his blind bigotry against the US. All in all, I'd say he got far better treatment than he gave and infinitely better than he deserved. >> ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like >> SVM or VxVM. It is both a filesystem and a logical volume manager. As >> such, like all LVM solutions, there are two steps that you must >> perform to safely remove a disk: unmount the filesystem and quiesce >> the volume. That means you *MUST*, in the case of ZFS, issue 'umount >> filesystem' *AND* 'zpool export' before you yank the USB stick out of >> the machine. >> >> Effectively what you did was create a one-sided mirrored volume with >> one filesystem on it, then put your very important (but not important >> enough to bother mirroring or backing up) data on it. Then you >> unmounted the filesystem and ripped the active volume out of the >> machine. You got away with it a couple of times because of just how good >> a job the ZFS developers did of idiot-proofing it, but when it >> finally got to the point where you lost your data, you came here to >> bitch and point fingers at everyone but the responsible party (hint, >> it's you). When your ignorance (and fault) was pointed out to you, you >> then resorted to personal attacks and slurs. Nice. Very professional. >> Welcome to the bit-bucket. > > All that and yet the fact remains: I've never "ejected" a USB drive from OS > X or Windows, I simply pull it and go, and I've never once lost data, or had > it become unrecoverable or even corrupted. You've been lucky then. I've lost data and had corrupted filesystems on USB sticks on both of those OSes, as well as several Linux and BSD variants, from doing just that. [...] fpsm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 11, 2009 12:21:03 PM -0600 David Dyer-Bennet wrote: I've spent $2000 on hardware and, by now, hundreds of hours of my time trying to get and keep a ZFS-based home NAS working. Because it's the only affordable modern practice, my backups are on external drives (USB drives because that's "the" standard for consumer external drives, they were much cheaper when I bought them than any that supported Firewire at the 1TB size). So hearing how easy it is to muck up a ZFS pool on USB is leading me, again, to doubt this entire enterprise. Same here, except I have no doubts. As I only use the USB for backup, I'm quite happy with it. I have a 4-disk enclosure that accepts SATA drives. My main storage is a 12-bay SAS/SATA enclosure. After my own experience with USB (I still have the problem that I cannot create new pools while another USB drive is present with a zpool on it, whether or not that zpool is active ... no response on that thread yet and I expect never), I'm not thrilled with it and suspect some of the problem lies in the way that USB is handled differently than other physical connections (can't use 'format', e.g.). Anyway to get back to the point I wouldn't want to use it for primary storage, even if it were only 2 drives. That's unfortunate, but in line with Solaris' hardware support, historically. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 12:23, Bob Friesenhahn wrote: > On Wed, 11 Feb 2009, David Dyer-Bennet wrote: >> >> Then again, I've never lost data during the learning period, nor on the >> rare occasions where I just get it wrong. This is good; not quite >> remembering to eject a USB memory stick is *so* easy. > > With Windows and OS-X, it is up to the *user* to determine if they > have lost data. This is because they are designed to be user-friendly > operating systems. If the disk can be loaded at all, Windows and OS-X > will just go with what is left. If Windows and OS-X started to tell > users that they lost some data, then those users would be in a panic > (just like we see here). I don't carry much on my memory stick -- mostly stuff in transit from one place to another. Two things that live there constantly are my encrypted password database, and some private keys (encrypted under passphrases). So the stuff on the memory stick tends to get looked at, and the stuff that lives there is in a format where corruption is very likely to get noticed. So while I can't absolutely swear that I never lost data I didn't notice losing, I'm fairly confident that no data was lost. And I'm absolutely sure no data THAT I CARED ABOUT was lost, which is all that really matters. > The whole notion of "journaling" is to intentionally lose data by > rolling back to a known good point. More data might be lost than if > the task was left to a tool like 'fsck' but the journaling approach is > much faster. Windows and OS-X are highly unlikely to inform you that > some data was lost due to the filesystem being rolled back. True about journaling. This applies to NTFS disks for Windows, but not to FAT systems (which aren't journaled); and memory sticks for me are always FAT systems. Databases have something of an all-or-nothing problem as well, for that matter, and for something of the same reasons. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote: Then again, I've never lost data during the learning period, nor on the rare occasions where I just get it wrong. This is good; not quite remembering to eject a USB memory stick is *so* easy. With Windows and OS-X, it is up to the *user* to determine if they have lost data. This is because they are designed to be user-friendly operating systems. If the disk can be loaded at all, Windows and OS-X will just go with what is left. If Windows and OS-X started to tell users that they lost some data, then those users would be in a panic (just like we see here). The whole notion of "journaling" is to intentionally lose data by rolling back to a known good point. More data might be lost than if the task was left to a tool like 'fsck' but the journaling approach is much faster. Windows and OS-X are highly unlikely to inform you that some data was lost due to the filesystem being rolled back. Your comments about write caching being a factor seem reasonable. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 10:49, Bob Friesenhahn wrote: > On Wed, 11 Feb 2009, David Dyer-Bennet wrote: >> This all-or-nothing behavior of ZFS pools is kinda scary. Turns out I'd >> rather have 99% of my data than 0% -- who knew? :-) I'd much rather >> have >> 100.00% than either of course, and I'm running ZFS with mirroring, and >> doing regular backups, because of that. > > It seems to me that this level of terror is getting out of hand. I am > glad to see that you made it to work today since statistics show that > you might have gotten into a deadly automobile accident on the way to > the office and would no longer care about your data. In fact, quite a > lot of people get in serious automobile accidents yet we rarely hear > such levels of terror regarding taking a drive in an automobile. > > Most people are far more afraid of taking a plane flight than taking a > drive in their car, even though taking a drive in their car is far > more risky. > > It is best to put risks in perspective. People are notoriously poor > at evaluating risks and paranoia is often the result. All true (and I'm certainly glad I made it to work myself; I did drive, which is one of the most dangerous things most people do). I think you're overstating my terror level, though; I'd say I'm at yellow; not even orange. I've spent $2000 on hardware and, by now, hundreds of hours of my time trying to get and keep a ZFS-based home NAS working. Because it's the only affordable modern practice, my backups are on external drives (USB drives because that's "the" standard for consumer external drives, they were much cheaper when I bought them than any that supported Firewire at the 1TB size). So hearing how easy it is to muck up a ZFS pool on USB is leading me, again, to doubt this entire enterprise. Am I really better off than I would be with an Infrant Ready NAS, or a Drobo? I'm certainly far behind financially and with my time. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 11:35, Toby Thain wrote: > > On 11-Feb-09, at 11:19 AM, Tim wrote: > >> ... >> And yes, I do keep checksums of all the data sitting on them and >> periodically check it. So, for all of your ranting and raving, the >> fact remains even a *crappy* filesystem like fat32 manages to >> handle a hot unplug without any prior notice without going belly up. > > By chance, certainly not design. No, I do think it's by design -- it's because the design isn't aggressively exploiting possible performance. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 11:21, Bob Friesenhahn wrote: > On Wed, 11 Feb 2009, Tim wrote: >> >> All that and yet the fact remains: I've never "ejected" a USB drive from >> OS >> X or Windows, I simply pull it and go, and I've never once lost data, or >> had >> it become unrecoverable or even corrupted. >> >> And yes, I do keep checksums of all the data sitting on them and >> periodically check it. So, for all of your ranting and raving, the fact >> remains even a *crappy* filesystem like fat32 manages to handle a hot >> unplug >> without any prior notice without going belly up. > > This seems like another one of your trolls. Any one of us who have > used USB drives under OS-X or Windows knows that the OS complains > quite a lot if you just unplug the drive so we all learn how to do > things properly. Then again, I've never lost data during the learning period, nor on the rare occasions where I just get it wrong. This is good; not quite remembering to eject a USB memory stick is *so* easy. We do all know why violating protocols here works so much of the time, right? It's because Windows is using very simple, old-fashioned strategies to write to the USB devices. Write caching is nonexistent, or of very short duration, for example. So if IO has quiesced to the device and it's been several seconds since the last IO, it's nearly certainly safe to just pull it. Nearly. ZFS is applying much more modern, much more aggressive, optimizing strategies. This is entirely good; ZFS is intended for a space where that's important a lot of the time. But one tradeoff is that those rules become more important. > You must have very special data if you compute independent checksums > for each one of your files, and it leaves me wondering why you think > that data is correct due to being checksummed. Checksumming incorrect > data does not make that data correct. Can't speak for him, but I have par2 checksums and redundant data for lots of my old photos on disk. I created them before writing archival optical disks of the data, to give me some additional hope of recovering the data in the long run. I don't, in fact, know that most of those photos are actually valid data; only the ones I've viewed after creating the par2 checksums (and I can't rule out weird errors that don't result in corrupting the whole rest of the image even then). Still, once I've got the checksum on file, I can at least determine that I've had a disk error in many cases (not quite identical to determining that the data is still valid; after all, the data and the checksum could have been corrupted in such a way that I get a false positive on the checksum). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
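For the kind of per-file checksumming David and Tim describe, Solaris ships a digest(1) utility that is enough for a simple record-and-compare routine. A sketch, assuming the stick is mounted at /media/stick (a placeholder path):

  # record SHA-256 digests for every file on the stick
  cd /media/stick && find . -type f -exec digest -v -a sha256 {} + | sort > ~/stick.sha256
  # later: recompute and compare; any differing line is a changed or damaged file
  cd /media/stick && find . -type f -exec digest -v -a sha256 {} + | sort | diff ~/stick.sha256 -

As Bob notes further down, this only detects that something changed after the digests were taken; it says nothing about whether the data was correct in the first place.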
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/11/2009 12:35 PM, Toby Thain wrote: On 11-Feb-09, at 11:19 AM, Tim wrote: ... And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up. By chance, certainly not design. Yep. I've never unplugged a USB drive on purpose, but I have left a drive plugged into the docking station, hibernated Windows XP Professional, undocked the laptop, and then woken it up later undocked. It routinely would pop up windows saying that a 'delayed write' was not successful on the now missing drive. I've always counted myself lucky that any new data written to that drive was written long long before I hibernated, because I have yet to find any problems with that data (but I don't read it very often if at all.) But it is luck only! -Kyle --Toby --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 11-Feb-09, at 11:19 AM, Tim wrote: ... And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up. By chance, certainly not design. --Toby --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, Tim wrote: All that and yet the fact remains: I've never "ejected" a USB drive from OS X or Windows, I simply pull it and go, and I've never once lost data, or had it become unrecoverable or even corrupted. And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up. This seems like another one of your trolls. Any one of us who have used USB drives under OS-X or Windows knows that the OS complains quite a lot if you just unplug the drive so we all learn how to do things properly. You must have very special data if you compute independent checksums for each one of your files, and it leaves me wondering why you think that data is correct due to being checksummed. Checksumming incorrect data does not make that data correct. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
(...) Ah... an illiterate AND idiotic bigot. (...) I apologize for my poor English. Yes, it's not my mother tongue, but I have no doubt at all that this discussion could be continued in German as well. But just to make it clear: in the end I did understand very well where I went wrong. It just wasn't something I expected. Because I was using a single zpool with no other filesystems inside, I thought that unmounting it with the command 'zfs umount usbhdd1' and checking that usbhdd1 was no longer shown in the output of 'mount' (it wasn't) meant the pool was cleanly unmounted and there was no risk in yanking the USB cable. Even logically, if 'zpool export usbhdd1' releases the entire pool from the system, then 'zfs umount usbhdd1' should do the same when no other filesystem exists inside that particular pool. If the output of the mount command no longer shows your zfs pool, what else could there be left to unmount? This is just what caused the confusion on my side, and that's human, but I have learned for the future. Regards. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
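The sequence the thread keeps coming back to is short enough to write down. A minimal sketch using the poster's own pool name:

  zfs umount usbhdd1    # removes the mount-table entry only
  zpool export usbhdd1  # flushes and closes the pool itself; now it is safe to pull the stick
  # later, on this or another machine:
  zpool import usbhdd1

The export step is the one that quiesces the devices, which is exactly the distinction that caused the confusion here.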
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote: This all-or-nothing behavior of ZFS pools is kinda scary. Turns out I'd rather have 99% of my data than 0% -- who knew? :-) I'd much rather have 100.00% than either of course, and I'm running ZFS with mirroring, and doing regular backups, because of that. It seems to me that this level of terror is getting out of hand. I am glad to see that you made it to work today since statistics show that you might have gotten into a deadly automobile accident on the way to the office and would no longer care about your data. In fact, quite a lot of people get in serious automobile accidents yet we rarely hear such levels of terror regarding taking a drive in an automobile. Most people are far more afraid of taking a plane flight than taking a drive in their car, even though taking a drive in their car is far more risky. It is best to put risks in perspective. People are notoriously poor at evaluating risks and paranoia is often the result. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss