Re: [RFC] fsblock

2007-07-09 Thread Christoph Lameter
On Sun, 24 Jun 2007, Nick Piggin wrote: Firstly, what is the buffer layer? The buffer layer isn't really a buffer layer as in the buffer cache of unix: the block device cache is unified with the pagecache (in terms of the pagecache, a blkdev file is just like any other, but with a 1:1

Re: [RFC] fsblock

2007-07-09 Thread Nick Piggin
On Mon, Jul 09, 2007 at 10:14:06AM -0700, Christoph Lameter wrote: On Sun, 24 Jun 2007, Nick Piggin wrote: Firstly, what is the buffer layer? The buffer layer isn't really a buffer layer as in the buffer cache of unix: the block device cache is unified with the pagecache (in terms of the

Re: [RFC] fsblock

2007-07-09 Thread Christoph Lameter
On Tue, 10 Jul 2007, Nick Piggin wrote: Hmmm I did not notice that yet but then I have not done much work there. Notice what? The bad code for the buffer heads. - A real nobh mode. nobh was created I think mainly to avoid problems with buffer_head memory consumption,

Re: [RFC] fsblock

2007-07-09 Thread Nick Piggin
On Mon, Jul 09, 2007 at 05:59:47PM -0700, Christoph Lameter wrote: On Tue, 10 Jul 2007, Nick Piggin wrote: Hmmm I did not notice that yet but then I have not done much work there. Notice what? The bad code for the buffer heads. Oh. Well my first mail in this thread listed

Re: [RFC] fsblock

2007-07-09 Thread Dave McCracken
On Monday 09 July 2007, Christoph Lameter wrote: On Tue, 10 Jul 2007, Nick Piggin wrote: There are no changes to the filesystem API for large pages (although I am adding a couple of helpers to do page based bitmap ops). And I don't want to rely on contiguous memory. Why do you think

Re: [RFC] fsblock

2007-06-30 Thread Christoph Hellwig
On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote: - In line with the above item, filesystem block allocation is performed before a page is dirtied. In the buffer layer, mmap writes can dirty a page with no backing blocks which is a problem if the filesystem is ENOSPC (patches

Re: [RFC] fsblock

2007-06-30 Thread Christoph Hellwig
On Mon, Jun 25, 2007 at 08:25:21AM -0400, Chris Mason wrote: write_begin/write_end is a step in that direction (and it helps OCFS and GFS quite a bit). I think there is also not much reason for writepage sites to require the page to lock the page and clear the dirty bit themselves (which

Re: [RFC] fsblock

2007-06-30 Thread Christoph Hellwig
Warning ahead: I've only briefly skipped over the pages so the comments in the mail are very high-level. On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: fsblock is a rewrite of the buffer layer (ding dong the witch is dead), which I have been working on, on and off and is now at

Re: [RFC] fsblock

2007-06-30 Thread Jeff Garzik
Christoph Hellwig wrote: On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote: - In line with the above item, filesystem block allocation is performed before a page is dirtied. In the buffer layer, mmap writes can dirty a page with no backing blocks which is a problem if the filesystem

Re: [RFC] fsblock

2007-06-30 Thread Christoph Hellwig
On Sat, Jun 30, 2007 at 07:10:27AM -0400, Jeff Garzik wrote: Not really, the current behaviour is a bug. And it's not actually buffer layer specific - XFS now has a fix for that bug and it's generic enough that everyone could use it. I'm not sure I follow. If you require block allocation

Re: [RFC] fsblock

2007-06-28 Thread Chris Mason
On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote: On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote: On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote: Let's look at a typical example of how IO actually gets done today, starting with sys_write():

Re: [RFC] fsblock

2007-06-28 Thread Nick Piggin
On Thu, Jun 28, 2007 at 08:20:31AM -0400, Chris Mason wrote: On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote: That's true but I don't think an extent data structure means we can become too far divorced from the pagecache or the native block size -- what will end up happening

Re: [RFC] fsblock

2007-06-27 Thread David Chinner
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: I think using fsblock to drive the IO and keep the pagecache flags uptodate and using a btree in the filesystem to manage extents of block allocations wouldn't be a bad idea though. Do any filesystems actually do this? Yes. XFS. But

Re: [RFC] fsblock

2007-06-27 Thread Chris Mason
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping ]

Re: [RFC] fsblock

2007-06-27 Thread Kyle Moffett
On Jun 26, 2007, at 07:14:14, Nick Piggin wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: Can we call it a block mapping layer or something like that? e.g. struct blkmap? I'm not fixed on fsblock, but blkmap doesn't grab me either. It is a map from the pagecache to

Re: [RFC] fsblock

2007-06-27 Thread Anton Altaparmakov
On 27 Jun 2007, at 12:50, Chris Mason wrote: On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:

Re: [RFC] fsblock

2007-06-27 Thread Nick Piggin
On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote: On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote: Let's look at a typical example of how IO actually gets done today, starting with sys_write(): sys_write(file, buffer, 1MB) for each page: prepare_write()

Re: [RFC] fsblock

2007-06-26 Thread David Chinner
On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: David Chinner wrote: On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: I'm announcing fsblock now because it is quite intrusive and so I'd like to get some thoughts about significantly changing this core part of the

Re: [RFC] fsblock

2007-06-26 Thread Nick Piggin
On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: Realistically, this is not about filesystem blocks, this is about file offset to disk blocks. i.e. it's a mapping. Yeah, fsblock ~= the layer between the fs and

Re: [RFC] fsblock

2007-06-26 Thread Chris Mason
On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping ] iomaps can double as range locks simply because iomaps are expressions of ranges within the file. Seeing as you can only

Re: [RFC] fsblock

2007-06-26 Thread Nick Piggin
On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping ] iomaps can double as range locks simply because iomaps are

Re: [RFC] fsblock

2007-06-25 Thread Nick Piggin
Andi Kleen wrote: Nick Piggin writes: - Structure packing. A page gets a number of buffer heads that are allocated in a linked list. fsblocks are allocated contiguously, so cacheline footprint is smaller in the above situation. It would be interesting to test if that

Re: [RFC] fsblock

2007-06-25 Thread Nick Piggin
Chris Mason wrote: On Sun, Jun 24, 2007 at 05:47:55AM +0200, Nick Piggin wrote: My gut feeling is that there are several problem areas you haven't hit yet, with the new code. I would agree with your gut :) Without having read the code yet (light reading for monday morning ;), ext3 and

Re: [RFC] fsblock

2007-06-25 Thread Chris Mason
On Mon, Jun 25, 2007 at 04:58:48PM +1000, Nick Piggin wrote: Using buffer heads instead allows the FS to send file data down inside the transaction code, without taking the page lock. So, locking wrt data=ordered is definitely going to be tricky. The best long term option may be making

Re: [RFC] fsblock

2007-06-25 Thread Nick Piggin
David Chinner wrote: On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: I'm announcing fsblock now because it is quite intrusive and so I'd like to get some thoughts about significantly changing this core part of the kernel. Can you rename it to something other than shorthand for

Re: [RFC] fsblock

2007-06-24 Thread Andi Kleen
Nick Piggin writes: - Structure packing. A page gets a number of buffer heads that are allocated in a linked list. fsblocks are allocated contiguously, so cacheline footprint is smaller in the above situation. It would be interesting to test if that makes a difference

[RFC] fsblock

2007-06-23 Thread Nick Piggin
I'm announcing fsblock now because it is quite intrusive and so I'd like to get some thoughts about significantly changing this core part of the kernel. fsblock is a rewrite of the buffer layer (ding dong the witch is dead), which I have been working on, on and off and is now at the stage where

Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
Just to clarify a few things. Don't you hate rereading a long work you wrote? (oh, you're supposed to do that *before* you press send?). On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: I'm announcing fsblock now because it is quite intrusive and so I'd like to get some thoughts

Re: [RFC] fsblock

2007-06-23 Thread Jeff Garzik
Nick Piggin wrote: - No deadlocks (hopefully). The buffer layer is technically deadlocky by design, because it can require memory allocations at page writeout-time. It also has one path that cannot tolerate memory allocation failures. No such problems for fsblock, which keeps fsblock

Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote: Nick Piggin wrote: - No deadlocks (hopefully). The buffer layer is technically deadlocky by design, because it can require memory allocations at page writeout-time. It also has one path that cannot tolerate memory allocation

Re: [RFC] fsblock

2007-06-23 Thread William Lee Irwin III
On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: fsblock is a rewrite of the buffer layer (ding dong the witch is dead), which I have been working on, on and off and is now at the stage where some of the basics are working-ish. This email is going to be long... Long overdue. Thank