>>>> How can I diagnose why a resilver appears to be hanging at a certain
>>>> percentage, seemingly doing nothing for quite a while, even though the
>>>> HDD LED is lit up permanently (no apparent head seeking)?
>>>> 
>>>> The drives in the pool are WD RAID Editions, so they have TLER and
>>>> should time out on errors within seconds. Neither ZFS nor the syslog
>>>> reported any I/O errors, however, so it wasn't the disks.
>>>> 
>>> 
>>> Check the FMA logs:
>>>   fmadm faulty
>>>   fmdump -e[vV]
>> 
>> Nothing noteworthy in there. fmadm shows nothing, and fmdump just shows
>> ereport.io.ddi.fm-capability repeatedly, which comes from oss_cmi8788 (some
>> OpenSound driver).
>
> argv!  Can you try to dd from the misbehaving device and see if
> that kicks off a diagnosis?  It may take some time to time out, though;
> by default it will be several minutes per iop.  (for the geezers, format
> has had a media scanner for decades)
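
For anyone following along, the read test suggested above could be scripted
roughly like this. It is only a sketch: the read_probe name and the default
block count are made up for illustration, and the device path in the closing
comment is a placeholder, not one of the disks in this pool.

```shell
#!/bin/sh
# Sketch: read a bounded amount of data from a device and discard it,
# so that a read error shows up in dd's exit status.
# Usage: read_probe DEVICE [COUNT]   (COUNT = number of 1 MiB blocks)
read_probe() {
    dev=$1
    count=${2:-1024}   # assumed default: read 1 GiB
    # Only POSIX dd operands are used here (if, of, bs, count).
    if dd if="$dev" of=/dev/null bs=1024k count="$count" 2>/dev/null; then
        echo "read ok: $dev"
    else
        echo "read FAILED: $dev"
        return 1
    fi
}

# Hypothetical invocation; substitute the raw device of the suspect disk:
# read_probe /dev/rdsk/c1t0d0s0 4096
```

If the read stalls instead of failing, the function simply will not return
until the driver gives up, which is the behaviour to watch for here.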

Well, I'm not even sure if a device actually misbehaves. During regular 
operation, there don't appear to be any issues.

Both disks of the mirror in the pool appeared to work correctly during
the stuck scrub, though, because according to zpool iostat reads and writes
went through.

I will try to reproduce it this weekend (it's a desktop machine, can't 
hard reset via ssh :), hoping that the ZFS version upgrade fixed this.

FYI, the disks have TLER of 7 seconds. Should it really take several 
minutes per IOP?

Regards,
-mg

>>> 
>>>> Stopping the scrub didn't work, the zfs command didn't return. It took a
>>>> hard reset to make it stop.
>>>> 
>>> 
>>> scrub is not a zfs subcommand, perhaps you meant zpool?
>>> Depending on the failure, zpool commands may hang, fixed in b100.
>> 
>> Yeah, sorry, zpool doesn't return.
>> 
>> Regards,
>> -mg
>
>