I think the important point here is that this makes the case for giving ZFS at least one layer of redundancy to work with. If the disk you pulled had been part of a mirror or raidz vdev, there wouldn't have been any data loss when the system was rebooted. In fact, the zpool status commands would likely have kept working, and a reboot probably wouldn't have been necessary at all. It's unreasonable to expect a system running any file system to recover gracefully when its only drive is pulled. Of course, losing work because of the delayed notification is bad, but nonetheless this isn't a realistic test. Basically: always provide redundancy in your zpool config.
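For reference, here's roughly what that looks like. This is only a sketch: the second device name (c2t6d0) is a placeholder for whatever spare disk you have, and I haven't re-run Ross's exact pull-the-disk test against a mirror, so take the recovery steps as expected behaviour rather than a verified result.

  # zpool create test mirror c2t6d0 c2t7d0
  # zpool status test
    pool: test
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          test        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c2t6d0  ONLINE       0     0     0
              c2t7d0  ONLINE       0     0     0

  errors: No known data errors

With that layout, pulling c2t7d0 should leave the pool DEGRADED but still readable and writable, and once the disk is back (cfgadm -c configure sata1/7) a "zpool online test c2t7d0" followed by a "zpool scrub test" should resilver anything that was missed.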
Jon Ross Smith wrote:
> A little more information today. I had a feeling that ZFS would
> continue quite some time before giving an error, and today I've shown
> that you can carry on working with the filesystem for at least half an
> hour with the disk removed.
>
> I suspect on a system with little load you could carry on working for
> several hours without any indication that there is a problem. It
> looks to me like ZFS is caching reads & writes, and that provided
> requests can be fulfilled from the cache, it doesn't care whether the
> disk is present or not.
>
> I would guess that ZFS is attempting to write to the disk in the
> background, and that this is silently failing.
>
> Here's the log of the tests I did today. After removing the drive,
> over a period of 30 minutes I copied folders to the filesystem,
> created an archive, set permissions, and checked properties. I did
> this both on the command line and with the graphical file manager tool
> in Solaris. Neither reported any errors, and all the data could be
> read & written fine -- until the reboot, at which point all the data
> was lost, again without error.
>
> If you're not interested in the detail, please skip to the end, where
> I've got some thoughts on just how many problems there are here.
>
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           c2t7d0    ONLINE       0     0     0
> errors: No known data errors
> # zfs list test
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> test   243M   228G   242M  /test
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G   243M   232G   0%  ONLINE  -
>
> -- drive removed --
>
> # cfgadm | grep sata1/7
> sata1/7    sata-port    empty    unconfigured   ok
>
> -- cfgadm knows the drive is removed. How come ZFS does not? --
>
> # cp -r /rc-pool/copytest /test/copytest
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G   0%  ONLINE  -
> # zfs list test
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> test   142K   228G    18K  /test
>
> -- Yup, still up. Let's start the clock --
>
> # date
> Tue Jul 29 09:31:33 BST 2008
> # du -hs /test/copytest
>  667K   /test/copytest
>
> -- 5 minutes later, still going strong --
>
> # date
> Tue Jul 29 09:36:30 BST 2008
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G   0%  ONLINE  -
> # cp -r /rc-pool/copytest /test/copytest2
> # ls /test
> copytest   copytest2
> # du -h -s /test
>  1.3M   /test
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G   0%  ONLINE  -
> # find /test | wc -l
>     2669
> # find //test/copytest | wc -l
>     1334
> # find /rc-pool/copytest | wc -l
>     1334
> # du -h -s /rc-pool/copytest
>  5.3M   /rc-pool/copytest
>
> -- Not sure why the original pool reports 5.3MB of data when I use du. --
> -- File Manager reports that they both have the same size --
>
> -- 15 minutes later it's still working. I can read data fine --
>
> # date
> Tue Jul 29 09:43:04 BST 2008
> # chmod 777 /test/*
> # mkdir /rc-pool/test2
> # cp -r /test/copytest2 /rc-pool/test2/copytest2
> # find /rc-pool/test2/copytest2 | wc -l
>     1334
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G   0%  ONLINE  -
>
> -- and yup, the drive is still offline --
>
> # cfgadm | grep sata1/7
> sata1/7    sata-port    empty    unconfigured   ok
>
> -- and finally, after 30 minutes the pool is still going strong --
>
> # date
> Tue Jul 29 09:59:56 BST 2008
> # tar -cf /test/copytest.tar /test/copytest/*
> # ls -l
> total 3
> drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
> -rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
> drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
> # zpool list test
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G   0%  ONLINE  -
>
> After a full 30 minutes there's no indication whatsoever of any
> problem. Checking properties of the folder in File Browser reports
> 2665 items, totalling 9.0MB.
>
> At this point I tried "# zfs set sharesmb=on test". I didn't really
> expect it to work, and sure enough, that command hung. zpool status
> also hung, so I had to reboot the server.
>
> -- rebooted server --
>
> Now I found that not only are all the files I've written in the last
> 30 minutes missing, but files that I had deleted several minutes
> prior to removing the drive have re-appeared.
>
> -- /test mount point is still present; I'll probably have to remove
> that manually --
>
> # cd /
> # ls
> bin        export      media      proc     system
> boot       home        mnt        rc-pool  test
> dev        kernel      net        rc-usb   tmp
> devices    lib         opt        root     usr
> etc        lost+found  platform   sbin     var
>
> -- ZFS still has the pool mounted, but at least now it realises it's
> not working --
>
> # zpool list
> NAME      SIZE   USED  AVAIL  CAP  HEALTH    ALTROOT
> rc-pool  2.27T  52.6G  2.21T   2%  DEGRADED  -
> test         -      -      -    -  FAULTED   -
> # zpool status test
>   pool: test
>  state: UNAVAIL
> status: One or more devices could not be opened. There are insufficient
>         replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-3C
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        UNAVAIL      0     0     0  insufficient replicas
>           c2t7d0    UNAVAIL      0     0     0  cannot open
>
> -- At least re-activating the pool is simple, but gotta love the "No
> known data errors" line --
>
> # cfgadm -c configure sata1/7
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           c2t7d0    ONLINE       0     0     0
> errors: No known data errors
>
> -- But of course, although ZFS thinks it's online, it didn't mount
> properly --
>
> # cd /test
> # ls
> # zpool export test
> # rm -r /test
> # zpool import test
> # cd test
> # ls
> var (copy)    var2
>
> -- Now that's unexpected. Those folders should be long gone. Let's
> see how many files ZFS failed to delete --
>
> # du -h -s /test
>  77M   /test
> # find /test | wc -l
>    19033
>
> So in addition to working for a full half hour creating files, it has
> also failed to remove 77MB of data contained in nearly 20,000 files.
> And it has done all that without reporting any error or problem with
> the pool.
>
> In fact, if I didn't know what I was looking for, there would be no
> indication of a problem at all. Before the reboot I can't see what's
> going on because "zpool status" hangs; after the reboot it says there's
> no problem.
> Both ZFS and its troubleshooting tools fail in a big way here.
>
> As others have said, "zpool status" should not hang. ZFS has to know
> the state of all the drives and pools it's currently using, so "zpool
> status" should simply report the current known status from ZFS'
> internal state. It shouldn't need to scan anything. ZFS' internal
> state should also be checking with cfgadm so that it knows when a disk
> isn't there. It should also be updated if the cache can't be flushed
> to disk, and "zfs list" / "zpool list" need to borrow state information
> from the status commands so that they don't say 'online' when the pool
> has problems.
>
> ZFS needs to deal more intelligently with mount points when a pool has
> problems. Leaving the folder lying around in a way that prevents the
> pool mounting properly when the drives are recovered is not good.
> When the pool appears to come back online without errors, it would be
> very easy for somebody to assume the data was lost from the pool
> without realising that it simply hasn't mounted and they're actually
> looking at an empty folder. Firstly, ZFS should remove the mount
> point when problems occur; secondly, "zfs list" or "zpool status"
> should include information to tell you that the pool could not be
> mounted properly.
>
> "zpool status" really should warn of any ZFS errors that occur,
> including things like being unable to mount the pool, CIFS mounts
> failing, etc.
>
> And finally, if ZFS does find problems writing from the cache, it
> really needs to log somewhere the names of all the files affected and
> the action that could not be carried out. ZFS knows the files it was
> meant to delete here; it also knows the files that were written. I
> can accept that with delayed writes files may occasionally be lost
> when a failure happens, but I don't accept that we need to lose all
> knowledge of the affected files when the filesystem has complete
> knowledge of what is affected. If there are any working filesystems
> on the server, ZFS should make an attempt to store a log of the
> problem; failing that, it should e-mail the data out. The admin really
> needs to know which files have been affected so that they can notify
> users of the data loss. I don't know where you would store this
> information, but wherever that is, "zpool status" should be reporting
> the error and directing the admin to the log file.
>
> I would probably say this could be safely stored on the system drive.
> Would it be possible to have a number of possible places to store this
> log? What I'm thinking is that if the system drive is unavailable,
> ZFS could try each pool in turn and attempt to store the log there.
>
> In fact, e-mail alerts or external error logging would be a great
> addition to ZFS. Surely it makes sense for filesystem errors to be
> stored and handled externally?
>
> Ross
>
>
> > Date: Mon, 28 Jul 2008 12:28:34 -0700
> > From: [EMAIL PROTECTED]
> > Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> > To: [EMAIL PROTECTED]
> >
> > I'm trying to reproduce and will let you know what I find.
> > -- richard
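On the alerting idea: until something like that exists in ZFS itself, a cron job wrapping "zpool status -x" can at least mail the admin when a pool admits to being unhealthy. A minimal sketch follows; the admin address and the working mailx setup are assumptions, and as this thread shows it won't catch the case where the pool still claims to be ONLINE, nor the case where zpool status itself hangs.

  #!/bin/sh
  # Rough pool-health watchdog: mail the admin if any pool reports a problem.
  # ADMIN is a placeholder; assumes a working mailx configuration.
  ADMIN=root
  STATUS=`/usr/sbin/zpool status -x`
  if [ "$STATUS" != "all pools are healthy" ]; then
      echo "$STATUS" | mailx -s "zpool problem on `hostname`" $ADMIN
  fi

Run from cron every few minutes it's crude, but it beats finding out at reboot time.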
--
- Jonathan Loran
  IT Manager
  Space Sciences Laboratory, UC Berkeley
  (510) 643-5146   [EMAIL PROTECTED]
  AST:7731^29u18e3