On 28.09.09 22:01, Richard Elling wrote:
On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote:

Richard Elling wrote:
On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:
On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
On Mon, 28 Sep 2009, Richard Elling wrote:

Scrub could be faster, but you can try
   tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.
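
A rough sketch of the same read-everything idea using find instead of tar;
/tank/data is just a placeholder mountpoint:

    # read every regular file and throw the bytes away; ZFS verifies
    # checksums on every read, and any failures show up in 'zpool status -v'
    cd /tank/data && find . -type f -exec cat {} + > /dev/null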

This should work but it does not verify the redundant metadata.  For
example, the duplicate metadata copy might be corrupt but the problem
is not detected since it did not happen to be used.

Too bad we cannot scrub a dataset/object.
Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs for instance. That won't necessarily get you to a
specific file, though.

With ever-increasing disk and pool sizes it takes more and more time for a scrub to complete. Imagine you have a 100TB pool with 90TB of data in it: there's a 10TB dataset that is critical, and another 80TB dataset that is not so critical, where you can afford to lose some blocks or files.

Personally, I have three concerns here.
1. Gratuitous complexity, especially inside a pool -- aka creeping featurism

There's the idea of priority-based resilvering (not implemented yet; see http://blogs.sun.com/bonwick/en_US/entry/smokin_mirrors) that could simply be extended to scrubs as well.

2. Wouldn't a better practice be to use two pools with different protection policies? The only protection-policy difference available inside a pool is the copies property. In other words, I am concerned that people will replace good data protection practices with scrubs, expecting scrub to deliver better data protection (it won't).
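
As a concrete sketch of what different protection policies can look like,
with placeholder pool, dataset, and disk names:

    # two pools with different redundancy: mirrors for critical data,
    # raidz for bulk data
    zpool create critical mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
    zpool create bulk raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

    # or, within a single pool, keep extra copies of the critical dataset
    zfs set copies=2 tank/critical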

It may be better, it may not be... With two pools you split your bandwidth, IOPS, and space, and have more entities to care about...

3. Since the pool contains the set of blocks shared by datasets, it is not clear to me that scrubbing a dataset will detect all of the data corruption failures which can affect that dataset. I'm thinking along the lines of phantom writes, for example.

That is why it may be useful to always scrub pool-wide metadata or have a way to specifically request it.

    4. the time it takes to scrub lots of stuff
...there are four concerns... :-)

For magnetic media, a yearly scrub interval should suffice for most folks. I know
some folks who scrub monthly. More frequent scrubs won't buy much.
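
For those who do want a regular schedule, a cron entry is the usual
approach; the pool name and timing here are arbitrary:

    # crontab entry: scrub pool 'tank' at 02:00 on the 1st of every month
    0 2 1 * * /usr/sbin/zpool scrub tank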

It won't buy you much in terms of discovering magnetic media decay. Unfortunately, there are other sources of corruption as well (including the phantom writes you are thinking about), and being able to discover corruption and recover it from backup as quickly as possible is a good thing.

Scrubs are also useful for detecting broken hardware. However, normal
activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service.
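
Either way, the place the damage becomes visible is the per-vdev error
counters and, if any blocks were unrecoverable, the list of affected
files; 'tank' is a placeholder pool name:

    # show read/write/checksum error counters and any files with
    # unrecoverable errors
    zpool status -v tank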


So being able to scrub an individual dataset would help to run scrubs of critical data faster and more frequently, and to schedule scrubs of less frequently used and/or less important data much less often.

It may be useful to have a way to tell ZFS to scrub only the pool-wide metadata (space maps, etc.), so that you can build your own schedule of scrubs.

Another interesting idea is to be able to scrub only the blocks modified since the last snapshot.
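
None of this exists today; purely as a strawman, the user interface for
these three ideas might look something like:

    # hypothetical syntax, not implemented in any ZFS release
    zpool scrub -d tank/critical       # scrub a single dataset
    zpool scrub -m tank                # scrub pool-wide metadata only
    zpool scrub -n tank                # scrub only blocks newer than the
                                       #   most recent snapshot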

This could be relatively easy to implement. But remember that scrubs are most
useful for finding data which has degraded on the media: in other words, old data. New data is not likely to have degraded yet, and since ZFS is COW, all of
the new data is, well, new.


This is why having the ability to bound the start and end of a scrub by txg
can be easy and perhaps useful.
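
A txg-bounded scrub could be exposed with something like the following,
again purely hypothetical syntax:

    # hypothetical: scrub only blocks born between two transaction groups
    zpool scrub -t 123456:123789 tank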

This requires exporting the concept of transaction group numbers to the user, and I do not see how that is less complex, from the user-interface perspective, than being able to request a scrub of an individual dataset, of pool-wide metadata, or of newly-written data.

regards,
victor