Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-11 Thread Valerie Henson
On Wed, May 09, 2007 at 02:51:41PM -0500, Matt Mackall wrote: We will, unfortunately, need to be able to check an entire directory at once. There's no other efficient way to assure that there are no duplicate names in a directory, for instance. I don't see that being a major problem for the

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-11 Thread Matt Mackall
On Fri, May 11, 2007 at 03:46:41AM -0600, Valerie Henson wrote: On Wed, May 09, 2007 at 02:51:41PM -0500, Matt Mackall wrote: We will, unfortunately, need to be able to check an entire directory at once. There's no other efficient way to assure that there are no duplicate names in a

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Valerie Henson
On Sun, Apr 29, 2007 at 02:21:13PM +0200, Jörn Engel wrote: On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Valerie Henson
On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: This does mean that our time to make progress on a check is bounded at the top by the size of our largest file. If we have a degenerate filesystem filled with a single file, this will in fact take as long as a conventional fsck.

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Jörn Engel
On Tue, 8 May 2007 22:56:09 -0700, Valerie Henson wrote: I like it too, especially the rmap stuff, but I don't think it solves some of the problems chunkfs solves. The really nice thing about chunkfs is that it tries hard to isolate each chunk from all the other chunks. You can think of

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Nikita Danilov
Valerie Henson writes: [...] Hm, I'm not sure that everyone understands a particular subtlety of how the fsck algorithm works in chunkfs. A lot of people seem to think that you need to check *all* cross-chunk links, every time an individual chunk is checked. That's not the case;

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Matt Mackall
On Wed, May 09, 2007 at 12:56:39AM -0700, Valerie Henson wrote: On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: This does mean that our time to make progress on a check is bounded at the top by the size of our largest file. If we have a degenerate filesystem filled with a

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Valerie Henson
On Wed, May 09, 2007 at 03:16:41PM +0400, Nikita Danilov wrote: I guess I miss something. If chunkfs maintains at most one continuation per chunk invariant, then continuation inode might end up with multiple byte ranges, and to check that they do not overlap one has to read indirect blocks

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Valerie Henson
On Wed, May 09, 2007 at 12:06:52PM -0500, Matt Mackall wrote: On Wed, May 09, 2007 at 12:56:39AM -0700, Valerie Henson wrote: On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: This does mean that our time to make progress on a check is bounded at the top by the size of our

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Valerie Henson
On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: On Sun, Apr 29, 2007 at 07:23:49PM -0400, Theodore Tso wrote: There are a number of filesystem corruptions this algorithm won't catch. The most obvious is one where the directory tree isn't really a tree, but a cyclic graph.

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Nikita Danilov
Valerie Henson writes: [...] You're right about needing to read the equivalent data-structure - for other reasons, each continuation inode will need an easily accessible list of byte ranges covered by that inode. (Sounds like, hey, extents!) The important part is that you don't have

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Matt Mackall
On Wed, May 09, 2007 at 11:59:23AM -0700, Valerie Henson wrote: On Wed, May 09, 2007 at 12:06:52PM -0500, Matt Mackall wrote: On Wed, May 09, 2007 at 12:56:39AM -0700, Valerie Henson wrote: On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: This does mean that our time to

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-09 Thread Matt Mackall
On Wed, May 09, 2007 at 12:01:13PM -0700, Valerie Henson wrote: On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: On Sun, Apr 29, 2007 at 07:23:49PM -0400, Theodore Tso wrote: There are a number of filesystem corruptions this algorithm won't catch. The most obvious is one

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-02 Thread Jörn Engel
On Mon, 30 April 2007 12:59:26 -0500, Matt Mackall wrote: We could eliminate the block bitmap, but I don't think there's much reason to. It improves allocator performance with negligible footprint and improves redundancy. LogFS uses that scheme and it costs dearly. Walking the rmap to

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-02 Thread Jörn Engel
On Sun, 29 April 2007 20:40:42 -0500, Matt Mackall wrote: So we should have no trouble checking an exabyte-sized filesystem on a 4MB box. Even if it has one exabyte-sized file! We check the first tile, see that it points to our file, then iterate through that file, checking that the forward

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-02 Thread Matt Mackall
On Wed, May 02, 2007 at 03:32:05PM +0200, Jörn Engel wrote: On Sun, 29 April 2007 20:40:42 -0500, Matt Mackall wrote: So we should have no trouble checking an exabyte-sized filesystem on a 4MB box. Even if it has one exabyte-sized file! We check the first tile, see that it points to our

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-05-02 Thread Jörn Engel
On Wed, 2 May 2007 10:37:38 -0500, Matt Mackall wrote: fpos does allow us to check just a subset of the file efficiently, yes. And that things are more strictly 1:1, because it unambiguously matches a single forward pointer in the file. Ok, I'm warming to the idea. But indirect blocks

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-30 Thread Theodore Tso
On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: chunkfs. The other is reverse maps (aka back pointers) for blocks -> inodes and inodes -> directories that obviate the need to have large amounts of memory to check for collisions. Yes, I missed the fact that you had back pointers for

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-30 Thread Matt Mackall
On Mon, Apr 30, 2007 at 01:26:24PM -0400, Theodore Tso wrote: On Sun, Apr 29, 2007 at 08:40:42PM -0500, Matt Mackall wrote: chunkfs. The other is reverse maps (aka back pointers) for blocks -> inodes and inodes -> directories that obviate the need to have large amounts of memory to check for

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Jörn Engel
On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space and CPU overhead may also be very small, while greatly improving

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Matt Mackall
On Sun, Apr 29, 2007 at 02:21:13PM +0200, Jörn Engel wrote: On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Andi Kleen
Matt Mackall [EMAIL PROTECTED] writes: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space and CPU overhead may also be very small, while greatly improving filesystem

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Jörn Engel
On Sun, 29 April 2007 07:57:18 -0500, Matt Mackall wrote: On Sun, Apr 29, 2007 at 02:21:13PM +0200, Jörn Engel wrote: Thanks. I think this is a bit more direct solution than ChunkFS, but a) I haven't followed ChunkFS closely and b) I haven't been thinking about fsck very long, so this is

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Jörn Engel
On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote: Some things we need to check during fsck: all directories point to in-use inodes; all in-use inodes are referred to by directories; all inodes in use are marked in use; all free inodes are marked free; all inodes point to in-use

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Jörn Engel
On Sun, 29 April 2007 18:34:59 +0200, Andi Kleen wrote: Matt Mackall [EMAIL PROTECTED] writes: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space and CPU overhead may

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Matt Mackall
On Sun, Apr 29, 2007 at 05:58:48PM +0200, Jörn Engel wrote: On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote: Some things we need to check during fsck: all directories point to in-use inodes; all in-use inodes are referred to by directories; all inodes in use are marked in

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Theodore Tso
On Sat, Apr 28, 2007 at 05:05:22PM -0500, Matt Mackall wrote: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk space and CPU overhead may also be very small, while greatly

Re: [RFC] TileFS - a proposal for scalable integrity checking

2007-04-29 Thread Matt Mackall
On Sun, Apr 29, 2007 at 07:23:49PM -0400, Theodore Tso wrote: On Sat, Apr 28, 2007 at 05:05:22PM -0500, Matt Mackall wrote: This is a relatively simple scheme for making a filesystem with incremental online consistency checks of both data and metadata. Overhead can be well under 1% disk