Just a side comment: this discussion shows all the classic symptoms of 
two groups of people with different basic assumptions, each wondering why 
the other said what they did.
  Getting these out in the open would be A Good Thing (;-))

--dave

Jonathan Loran wrote:
> I think the important point here is that this makes the case for ZFS 
> handling at least one layer of redundancy.  If the disk you pulled had been 
> part of a mirror or raidz, there wouldn't have been data loss when your 
> system was rebooted.  In fact, the zpool status command would likely have 
> kept working, and a reboot wouldn't have been necessary at all.  I think it's 
> unreasonable to expect a system with any file system to recover from its 
> only drive being pulled.  Of course, losing extra work because of the 
> delayed notification is bad, but nonetheless, this is not a reasonable 
> test.  Basically, always provide redundancy in your zpool config.
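> 
> As a rough illustration (the c#t#d# names below are only placeholders, 
> substitute whatever devices you actually have spare), a redundant pool is 
> created with something like:
> 
> # zpool create tank mirror c2t7d0 c2t8d0
> 
> -- or --
> 
> # zpool create tank raidz c2t5d0 c2t6d0 c2t7d0
> 
> With either layout, pulling a single disk should leave the pool DEGRADED 
> but still readable and writable, rather than silently dropping writes.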
> 
> Jon
> 
> Ross Smith wrote:
> 
>>A little more information today.  I had a feeling that ZFS would 
>>continue for quite some time before giving an error, and today I've shown 
>>that you can carry on working with the filesystem for at least half an 
>>hour with the disk removed.
>> 
>>I suspect on a system with little load you could carry on working for 
>>several hours without any indication that there is a problem.  It 
>>looks to me like ZFS is caching reads & writes, and that provided 
>>requests can be fulfilled from the cache, it doesn't care whether the 
>>disk is present or not.
>> 
>>I would guess that ZFS is attempting to write to the disk in the 
>>background, and that this is silently failing.
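>>
>>If that is what's happening, it would be interesting to know whether the 
>>fault management framework sees anything while the pool still looks 
>>healthy.  Purely as a suggestion (I haven't checked what, if anything, 
>>actually gets logged in this scenario), the FMA error log and fault list 
>>can be inspected with:
>>
>># fmdump -eV | tail -20
>># fmadm faulty
>>
>>If nothing shows up there either, then the failed background writes really 
>>are vanishing without a trace.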
>> 
>>Here's the log of the tests I did today.  After removing the drive, 
>>over a period of 30 minutes I copied folders to the filesystem, 
>>created an archive, set permissions, and checked properties.  I did 
>>this both in the command line and with the graphical file manager tool 
>>in Solaris.  Neither reported any errors, and all the data could be 
>>read & written fine, right up until the reboot, at which point all the 
>>data was lost, again without error.
>> 
>>If you're not interested in the detail, please skip to the end where 
>>I've got some thoughts on just how many problems there are here.
>> 
>> 
>># zpool status test
>>  pool: test
>> state: ONLINE
>> scrub: none requested
>>config:
>>        NAME        STATE     READ WRITE CKSUM
>>        test        ONLINE       0     0     0
>>          c2t7d0    ONLINE       0     0     0
>>errors: No known data errors
>># zfs list test
>>NAME   USED  AVAIL  REFER  MOUNTPOINT
>>test   243M   228G   242M  /test
>># zpool list test
>>NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test   232G   243M   232G     0%  ONLINE  -
>> 
>>
>>-- drive removed --
>> 
>>
>># cfgadm |grep sata1/7
>>sata1/7                        sata-port    empty        unconfigured ok
>> 
>> 
>>-- cfgadm knows the drive is removed.  How come ZFS does not? --
>> 
>>
>># cp -r /rc-pool/copytest /test/copytest
>># zpool list test
>>NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test      232G  73.4M   232G     0%  ONLINE  -
>># zfs list test
>>NAME   USED  AVAIL  REFER  MOUNTPOINT
>>test   142K   228G    18K  /test
>> 
>> 
>>-- Yup, still up.  Let's start the clock --
>> 
>>
>># date
>>Tue Jul 29 09:31:33 BST 2008
>># du -hs /test/copytest
>> 667K /test/copytest
>> 
>> 
>>-- 5 minutes later, still going strong --
>> 
>>
>># date
>>Tue Jul 29 09:36:30 BST 2008
>># zpool list test
>>NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test      232G  73.4M   232G     0%  ONLINE  -
>># cp -r /rc-pool/copytest /test/copytest2
>># ls /test
>>copytest   copytest2
>># du -h -s /test
>> 1.3M /test
>># zpool list test
>>NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test   232G  73.4M   232G     0%  ONLINE  -
>># find /test | wc -l                        
>>    2669
>># find //test/copytest | wc -l
>>    1334
>># find /rc-pool/copytest | wc -l
>>    1334
>># du -h -s /rc-pool/copytest
>> 5.3M /rc-pool/copytest
>> 
>> 
>>-- Not sure why the original pool has 5.3MB of data when I use du. --
>>-- File Manager reports that they both have the same size --
>> 
>> 
>>-- 15 minutes later it's still working.  I can read data fine --
>>
>># date
>>Tue Jul 29 09:43:04 BST 2008
>># chmod 777 /test/*
>># mkdir /rc-pool/test2
>># cp -r /test/copytest2 /rc-pool/test2/copytest2
>># find /rc-pool/test2/copytest2 | wc -l
>>    1334
>># zpool list test
>>NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test      232G  73.4M   232G     0%  ONLINE  -
>> 
>> 
>>-- and yup, the drive is still offline --
>> 
>>
>># cfgadm | grep sata1/7
>>sata1/7                        sata-port    empty        unconfigured ok
>>
>>
>>-- And finally, after 30 minutes the pool is still going strong --
>> 
>>
>># date
>>Tue Jul 29 09:59:56 BST 2008
>># tar -cf /test/copytest.tar /test/copytest/*
>># ls -l
>>total 3
>>drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
>>-rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
>>drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
>># zpool list test
>>NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>test   232G  73.4M   232G     0%  ONLINE  -
>>
>> 
>>After a full 30 minutes there's no indication whatsoever of any 
>>problem.  Checking properties of the folder in File Browser reports 
>>2665 items, totalling 9.0MB.
>> 
>>At this point I tried "# zfs set sharesmb=on test".  I didn't really 
>>expect it to work, and sure enough, that command hung.  zpool status 
>>also hung, so I had to reboot the server.
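>>
>>One thing I haven't tried: if this build supports the pool 'failmode' 
>>property, that is supposed to control whether I/O and commands block or 
>>return errors when the last device goes away (the default, 'wait', blocks 
>>until the device comes back).  Something along the lines of:
>>
>># zpool get failmode test
>># zpool set failmode=continue test
>>
>>might have made these commands fail with an error instead of hanging, 
>>though I haven't verified that in this scenario.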
>> 
>> 
>>-- Rebooted server --
>> 
>> 
>>Now I found that not only are all the files I've written in the last 
>>30 minutes missing, but in fact files that I had deleted several 
>>minutes prior to removing the drive have re-appeared.
>> 
>> 
>>-- /test mount point is still present, I'll probably have to remove 
>>that manually --
>> 
>> 
>># cd /
>># ls
>>bin         export      media       proc        system
>>boot        home        mnt         rc-pool     test
>>dev         kernel      net         rc-usb      tmp
>>devices     lib         opt         root        usr
>>etc         lost+found  platform    sbin        var
>> 
>> 
>>-- ZFS still has the pool mounted, but at least now it realises it's 
>>not working --
>> 
>> 
>># zpool list
>>NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
>>rc-pool  2.27T  52.6G  2.21T     2%  DEGRADED  -
>>test         -      -      -      -  FAULTED  -
>># zpool status test
>>  pool: test
>> state: UNAVAIL
>>status: One or more devices could not be opened.  There are insufficient
>> replicas for the pool to continue functioning.
>>action: Attach the missing device and online it using 'zpool online'.
>>   see: http://www.sun.com/msg/ZFS-8000-3C
>> scrub: none requested
>>config:
>> NAME        STATE     READ WRITE CKSUM
>> test        UNAVAIL      0     0     0  insufficient replicas
>>   c2t7d0    UNAVAIL      0     0     0  cannot open
>> 
>> 
>>-- At least re-activating the pool is simple, but gotta love the "No 
>>known data errors" line --
>> 
>>
>># cfgadm -c configure sata1/7
>># zpool status test
>>  pool: test
>> state: ONLINE
>> scrub: none requested
>>config:
>> NAME        STATE     READ WRITE CKSUM
>> test        ONLINE       0     0     0
>>   c2t7d0    ONLINE       0     0     0
>>errors: No known data errors
>> 
>> 
>>-- But of course, although ZFS thinks it's online, it didn't mount 
>>properly --
>> 
>>
>># cd /test
>># ls
>># zpool export test
>># rm -r /test
>># zpool import test
>># cd test
>># ls
>>var (copy)  var2
>> 
>> 
>>-- Now that's unexpected.  Those folders should be long gone.  Let's 
>>see how many files ZFS failed to delete --
>> 
>>
>># du -h -s /test
>>  77M /test
>># find /test | wc -l
>>   19033
>> 
>> 
>>So in addition to working for a full half hour creating files, it's 
>>also failed to remove 77MB of data contained in nearly 20,000 files.  
>>And it's done all that without reporting any error or problem with the 
>>pool.
>> 
>>In fact, if I didn't know what I was looking for, there would be no 
>>indication of a problem at all.  Before the reboot I can't find out what's 
>>going on, as "zpool status" hangs.  After the reboot it says there's no 
>>problem.  Both ZFS and its troubleshooting tools fail in a big way 
>>here. 
>> 
>>As others have said, "zpool status" should not hang.  ZFS has to know 
>>the state of all the drives and pools it's currently using, so "zpool 
>>status" should simply report the current known status from ZFS' 
>>internal state.  It shouldn't need to scan anything.  ZFS' internal 
>>state should also be checked against cfgadm so that ZFS knows if a disk 
>>isn't there.  It should also be updated if the cache can't be flushed 
>>to disk, and "zfs list" / "zpool list" need to borrow state information 
>>from the status commands so that they don't say 'online' when the pool 
>>has problems.
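>>
>>In the meantime, the only manual cross-check I can see is to eyeball the 
>>two views side by side, e.g.:
>>
>># cfgadm -al | grep sata
>># zpool status -x
>>
>>bearing in mind that "zpool status -x" presumably hangs in exactly the same 
>>way plain "zpool status" did here, and that you have to know for yourself 
>>that sata1/7 corresponds to c2t7d0.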
>> 
>>ZFS needs to deal more intelligently with mount points when a pool has 
>>problems.  Leaving the folder lying around in a way that prevents the 
>>pool mounting properly when the drives are recovered is not good.  
>>When the pool appears to come back online without errors, it would be 
>>very easy for somebody to assume the data was lost from the pool 
>>without realising that it simply hasn't mounted and they're actually 
>>looking at an empty folder.  Firstly, ZFS should remove the mount 
>>point when problems occur, and secondly, "zfs list" or "zpool status" 
>>should include information telling you that the pool could not be mounted 
>>properly.
>> 
>>"zpool status" really should be warning of any ZFS errors that occur, 
>>including things like being unable to mount the pool, CIFS mounts 
>>failing, etc...
>> 
>>And finally, if ZFS does find problems writing from the cache, it 
>>really needs to log somewhere the names of all the files affected, and 
>>the action that could not be carried out.  ZFS knows the files it was 
>>meant to delete here, and it also knows the files that were written.  I 
>>can accept that with delayed writes files may occasionally be lost 
>>when a failure happens, but I don't accept that we need to lose all 
>>knowledge of the affected files when the filesystem has complete 
>>knowledge of what is affected.  If there are any working filesystems 
>>on the server, ZFS should make an attempt to store a log of the 
>>problem; failing that, it should e-mail the data out.  The admin really 
>>needs to know what files have been affected so that they can notify 
>>users of the data loss.  I don't know where you would store this 
>>information, but wherever that is, "zpool status" should be reporting 
>>the error and directing the admin to the log file.
>> 
>>I would probably say this could be safely stored on the system drive.  
>>Would it be possible to have a number of possible places to store this 
>>log?  What I'm thinking is that if the system drive is unavailable, 
>>ZFS could try each pool in turn and attempt to store the log there.
>> 
>>In fact e-mail alerts or external error logging would be a great 
>>addition to ZFS.  Surely it makes sense that filesystem errors would 
>>be better off being stored and handled externally?
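>>
>>As a stopgap, something like the following run from cron would at least 
>>get a mail out when "zpool status -x" notices a problem.  It's only a 
>>sketch: it assumes mailx is configured, that the healthy-case output on 
>>this build is the usual "all pools are healthy" message, and the address 
>>is obviously a placeholder.  It also does nothing for the case above, 
>>where status itself hangs or reports the pool as fine.
>>
>>#!/bin/sh
>># Mail the admin whenever zpool status -x reports anything unhealthy.
>>STATUS=`zpool status -x`
>>if [ "$STATUS" != "all pools are healthy" ]; then
>>    echo "$STATUS" | mailx -s "ZFS pool problem on `hostname`" admin@example.com
>>fi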
>> 
>>Ross
>> 
>>
>>
>>
>>>Date: Mon, 28 Jul 2008 12:28:34 -0700
>>>From: [EMAIL PROTECTED]
>>>Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
>>>To: [EMAIL PROTECTED]
>>>
>>>I'm trying to reproduce and will let you know what I find.
>>>-- richard
>>>
>>
>>
> 
> 

-- 
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
[EMAIL PROTECTED]                 |                      -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
