Thanks, this is helpful. I was definitely misunderstanding the part that
the ZIL plays in ZFS.

I found Richard Elling's discussion of the FMA response to the failure
very informative. I see how the device driver, the fault analysis
layer and the ZFS layer all work together. Still, the customer's
complaint that the change in state from "working" to "not working"
takes too long seems pretty valid.

Peter

Neil Perrin wrote:
>
>
> Peter Cudhea wrote:
>> Your point is well taken that ZFS should not duplicate functionality
>> that is already, or should be, available at the device driver level.
>> In this case, however, I think it misses the point: there is something
>> ZFS should be doing that it is not.
>>
>> ZFS does its own periodic commits to the disk, and it knows whether
>> those commit points have reached the disk or whether they are getting
>> errors. In this particular case, those commits to disk are presumably
>> failing, because one of the disks they depend on has been removed
>> from the system. (If the writes are not being marked as failures,
>> that would definitely be an error in the device driver, as you say.)
>> In this case, however, the ZIL log has stopped being updated, but ZFS
>> does nothing to announce that this has happened or to indicate that
>> a remedy is required.
>
> I think you have some misconceptions about how the ZIL works.
> It doesn't provide journalling like UFS. The following might help:
>
> http://blogs.sun.com/perrin/entry/the_lumberjack
>
> The ZIL isn't used at all unless there's fsync/O_DSYNC activity.
>
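To make that distinction concrete, here is a minimal POSIX C sketch of the two kinds of application I/O that request synchronous semantics: writes through a descriptor opened with O_DSYNC, and an ordinary write followed by fsync(). Per the explanation above, only this sort of activity would lead ZFS to write ZIL records. The helper names (write_dsync, write_then_fsync) are hypothetical, and nothing here is ZFS-specific; it only shows the system calls an application would issue.

```c
#define _XOPEN_SOURCE 700   /* expose O_DSYNC and fsync() under strict C modes */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: each write(2) on a descriptor opened with O_DSYNC
 * returns only after the data has reached stable storage. */
int write_dsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    int rc = (n >= 0 && (size_t)n == len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: an ordinary (buffered) write, then fsync(2) to force
 * the file's data and metadata to stable storage before returning. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    /* fsync() returns 0 on success, -1 if the data could not be committed */
    int rc = (n >= 0 && (size_t)n == len) ? fsync(fd) : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Either helper returns -1 if the data could not be made durable, which is where a failed commit surfaces to the application.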
>>
>> At the very least, it would be extremely helpful if ZFS reported a
>> status indicating that the ZIL log is out of date, or that there are
>> problems writing to the ZIL log, or something like that.
>
> If the ZIL cannot be written then we force a transaction group (txg)
> commit. That is the only recourse to force data to stable storage before
> returning to the application.
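The fallback described above can be pictured as a toy control flow. This is a sketch under loud assumptions: struct toy_pool, toy_log_write() and toy_force_txg_commit() are invented stand-ins, not the real ZFS code, which is far more involved. It only shows the ordering: a synchronous request does not return until its data is stable, through the log if possible and through a forced txg commit otherwise.

```c
#include <stdbool.h>

/* Hypothetical pool state, for this toy model only. */
struct toy_pool {
    bool log_ok;                /* can an intent-log record be written? */
    unsigned long forced_txgs;  /* how many txg commits were forced */
};

/* Hypothetical stand-in: append a log record; fails if the log cannot be written. */
static int toy_log_write(struct toy_pool *p)
{
    return p->log_ok ? 0 : -1;
}

/* Hypothetical stand-in: force the open txg to commit and wait for it. */
static int toy_force_txg_commit(struct toy_pool *p)
{
    p->forced_txgs++;
    return 0;
}

/* Handle an fsync/O_DSYNC-style request. Returns 0 once the data is on
 * stable storage; *used_log reports which path was taken. */
int toy_sync_request(struct toy_pool *p, bool *used_log)
{
    if (toy_log_write(p) == 0) {
        *used_log = true;
        return 0;
    }
    *used_log = false;
    return toy_force_txg_commit(p);  /* the only remaining recourse */
}
```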
>>
>> An additional feature would be user-selectable behavior when the ZIL
>> log is significantly out of date. For example, if the ZIL log is more
>> than X seconds out of date, then new writes to the system should,
>> depending on the selected behavior, pause, return errors, or continue
>> to succeed silently.
>
> Again this doesn't make sense given how the ZIL works.
>
>>
>> Earlier in my career, when I worked for a database company, I was
>> responsible for a similar bug. It caused a major customer to lose a
>> large amount of data when a system rebooted before all good data had
>> been successfully committed to disk. The resulting stink caused us to
>> add a feature to detect when the writing-to-disk process had fallen
>> too far behind, and to pause new writes to the database until the
>> situation was resolved.
>>
>> Peter
>>
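The database-style safeguard described above (and the "more than X seconds out of date" policy proposed earlier) can be sketched as an admission check on each new write. Everything here is hypothetical: the enum names, admit_write() and the max_lag_s threshold are illustrative, and this is not ZFS behavior; as Neil notes, the proposal does not map directly onto how the ZIL works.

```c
#include <time.h>

/* Hypothetical administrator-selectable policies from the proposal. */
enum lag_policy { LAG_CONTINUE, LAG_PAUSE, LAG_ERROR };

/* What to do with an incoming write. */
enum admission { ADMIT_WRITE, BLOCK_WRITE, FAIL_WRITE };

/* Decide how to treat a new write, given the time of the last commit known
 * to have reached disk and a maximum tolerated lag in seconds. */
enum admission admit_write(time_t now, time_t last_commit, double max_lag_s,
                           enum lag_policy policy)
{
    double lag = difftime(now, last_commit);
    if (lag <= max_lag_s || policy == LAG_CONTINUE)
        return ADMIT_WRITE;   /* within bounds, or configured to ignore lag */
    return policy == LAG_PAUSE ? BLOCK_WRITE : FAIL_WRITE;
}
```

A writer would call it with time(NULL) and the timestamp of its last confirmed commit; BLOCK_WRITE corresponds to pausing new writes until the backlog drains, as in the database fix.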
>> Bob Friesenhahn wrote:
>>> While I do believe that device drivers, or the fault system, should
>>> notify ZFS when a device fails (and ZFS should appropriately react),
>>> I don't think that ZFS should be responsible for fault monitoring.
>>> ZFS is in a rather poor position for device fault monitoring, and if
>>> it attempts to do so then it will be slow and may misbehave in other
>>> ways. The software which communicates with the device (i.e. the
>>> device driver) is in the best position to monitor the device.
>>>
>>> The primary goal of ZFS is to be able to correctly read data which
>>> was successfully committed to disk. There are programming
>>> interfaces (e.g. fsync(), msync()) which may be used to ensure that
>>> data is committed to disk, and which should return an error if there
>>> is a problem. If you were performing your tests over an NFS mount,
>>> then the results should be considerably different, since NFS requests
>>> that its data be committed to disk.
>>>
>>> Bob
>>> ======================================
>>> Bob Friesenhahn
>>> [EMAIL PROTECTED], 
>>> http://www.simplesystems.org/users/bfriesen/
>>> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>>>
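Bob's point that fsync() and msync() report errors can be illustrated with a minimal POSIX C sketch that updates a file through a shared mapping and checks every step, so a failed commit is reported rather than silently ignored. The helper name update_mapped() is hypothetical; mmap(), msync(), fsync() and ftruncate() are standard POSIX calls.

```c
#define _XOPEN_SOURCE 700   /* expose mmap/msync/fsync under strict C modes */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `data` into `path` through a MAP_SHARED mapping
 * and make it durable. Returns 0 on success, -1 if any step fails. */
int update_mapped(const char *path, const char *data, size_t len)
{
    if (len == 0)
        return -1;                      /* a zero-length mapping is not allowed */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, data, len);
            /* msync(MS_SYNC) writes back the dirty pages and returns -1 on
             * failure; fsync() also flushes metadata such as the new size. */
            if (msync(map, len, MS_SYNC) == 0 && fsync(fd) == 0)
                rc = 0;
            if (munmap(map, len) != 0)
                rc = -1;
        }
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A caller that checks this return value learns about a failed commit at the point of the sync call, which is the behavior Bob describes.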
>>> _______________________________________________
>>> zfs-discuss mailing list
>>> zfs-discuss@opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>   