On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote:
Richard Elling wrote:
On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:
On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
On Mon, 28 Sep 2009, Richard Elling wrote:
Scrub could be faster, but you can try
tar cf - . > /dev/null
If you think about it, validating checksums requires reading the data.
So you simply need to read the data.
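For anyone who wants a list of the files that fail to read, here is a minimal sketch of the same idea as the tar command, assuming Python is available. It walks a directory tree, reads every regular file, and records the paths that raise an I/O error, which is how an unrecoverable checksum failure would typically surface to a reader. The function name `read_all` and the chunk size are illustrative choices, not anything from ZFS.

```python
import os


def read_all(root, chunk_size=1 << 20):
    """Read every regular file under root.

    Returns (bytes_read, failed_paths). A path lands in failed_paths
    when opening or reading it raises OSError (for example EIO).
    """
    total = 0
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                failed.append(path)
    return total, failed
```

Calling `read_all(".")` from the top of a mounted dataset reads the same data the tar command would.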
This should work, but it does not verify the redundant metadata. For
example, the duplicate metadata copy might be corrupt, but the problem
is not detected because that copy did not happen to be used.
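A toy model may make the difference concrete. This is not how ZFS lays out or verifies its blocks; the function names are hypothetical and SHA-256 is only a stand-in checksum. A normal read stops at the first copy that verifies, while a scrub-style pass checks every copy.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum for the toy model.
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Return the first copy that verifies; later copies are never examined."""
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("no valid copy found")


def scrub_block(copies, expected):
    """Return the index of every copy that fails verification."""
    return [i for i, data in enumerate(copies) if checksum(data) != expected]


good = b"metadata"
expected = checksum(good)
copies = [good, b"metadatX"]  # second copy is silently corrupt

print(read_block(copies, expected) == good)  # the read succeeds
print(scrub_block(copies, expected))         # the scrub flags copy 1
```

The read never notices the bad second copy because the first one was good; only the pass that touches every copy finds it.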
Too bad we cannot scrub a dataset/object.
Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs, for instance. That won't necessarily get you to
a specific file, though.
With ever-increasing disk and pool sizes, it takes more and more time
for scrub to complete its job. Let's imagine that you have a 100TB
pool with 90TB of data in it, and there's a dataset with 10TB that is
critical and another dataset with 80TB that is not that critical, where
you can afford losing some blocks/files.
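To put rough numbers on that example, here is a back-of-the-envelope sketch. The 500 MB/s sustained scrub rate is purely an assumed figure for illustration; real scrub rates depend on pool layout, fragmentation, and competing load. Sizes use decimal units.

```python
def scrub_hours(data_tb, throughput_mb_s):
    """Hours needed to read data_tb terabytes at throughput_mb_s MB/s."""
    seconds = (data_tb * 1e12) / (throughput_mb_s * 1e6)
    return seconds / 3600.0


ASSUMED_MB_S = 500  # hypothetical sustained scrub rate, not a measurement

for label, tb in (("all 90TB of data", 90), ("critical 10TB dataset", 10)):
    print(f"{label}: {scrub_hours(tb, ASSUMED_MB_S):.1f} hours")
```

Whatever the actual rate is, the ratio holds: covering only the critical dataset reads one ninth of the data.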
Personally, I have three concerns here.
1. Gratuitous complexity, especially inside a pool -- aka creeping
   featurism
2. Wouldn't a better practice be to use two pools with different
   protection policies? The only protection policy differences inside
   a pool are copies. In other words, I am concerned that people will
   replace good data protection practices with scrubs and expect scrub
   to deliver better data protection (it won't).
3. Since the pool contains the set of blocks, shared by datasets, it
   is not clear to me that scrubbing a dataset will detect all of the
   data corruption failures which can affect the dataset. I'm thinking
   along the lines of phantom writes, for example.
4. the time it takes to scrub lots of stuff
...there are four concerns... :-)
For magnetic media, a yearly scrub interval should suffice for most
folks. I know some folks who scrub monthly. More frequent scrubs won't
buy much.
Scrubs are also useful for detecting broken hardware. However, normal
activity will also detect broken hardware, so it is better to think of
scrubs as finding degradation of old data rather than being a hardware
checking service.
So being able to scrub an individual dataset would help to run scrubs
of critical data more frequently and faster, and to schedule scrubs of
less frequently used and/or less important data to happen much less
often.
It may be useful to have a way to tell ZFS to scrub pool-wide
metadata only (space maps, etc.), so that you can build your own
schedule of scrubs.
Another interesting idea is to be able to scrub only blocks modified
since the last snapshot.
This can be relatively easy to implement. But remember that scrubs are
most useful for finding data which has degraded from the media. In
other words, old data. New data is not likely to have degraded yet, and
since ZFS is COW, all of the new data is, well, new. This is why having
the ability to bound the start and end of a scrub by txg can be easy
and perhaps useful.
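A toy model of the txg-bounded idea, not ZFS code: assume every block carries the txg in which it was written, and a bounded scrub visits only blocks whose birth txg falls inside the requested window. The `Block` type, `blocks_to_scrub`, and the txg numbers are all hypothetical. "Only blocks modified since the last snapshot" is then just the window that starts right after the snapshot's txg.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Block:
    name: str
    birth_txg: int  # txg in which this block was written


def blocks_to_scrub(blocks: Iterable[Block],
                    min_txg: int = 0,
                    max_txg: Optional[int] = None) -> List[Block]:
    """Select blocks whose birth txg lies in [min_txg, max_txg]."""
    return [
        b for b in blocks
        if b.birth_txg >= min_txg
        and (max_txg is None or b.birth_txg <= max_txg)
    ]


pool = [Block("old", 10), Block("mid", 50), Block("new", 90)]

# Scrub only the old data, up to and including txg 50.
print([b.name for b in blocks_to_scrub(pool, max_txg=50)])

# Scrub only what was written after a snapshot taken at txg 50.
print([b.name for b in blocks_to_scrub(pool, min_txg=51)])
```

Because COW never rewrites a block in place, the birth txg alone is enough to split old data from new in this model.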
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss