RE: [00/17] Large Blocksize Support V3

2007-05-09 Thread Weigert, Daniel
David Chinner <[EMAIL PROTECTED]> writes: > Both. Too many things can happen asynchronously to a page that it > makes it just about impossible to predict all the potential race > conditions that are involved. Complexity arose from trying to fix

Re: [00/17] Large Blocksize Support V3

2007-05-08 Thread William Lee Irwin III
On Mon, May 07, 2007 at 12:06:38AM -0700, William Lee Irwin III wrote: > +int alloc_page_array(struct pagearray *, const int, const size_t); > +void free_page_array(struct pagearray *); > +void zero_page_array(struct pagearray *); > +struct page *nopage_page_array(const struct vm_area_struct *,

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread William Lee Irwin III
On Mon, 7 May 2007, Eric W. Biederman wrote: >> Yes, instead of having to redesign the interface between the >> fs and the page cache for those filesystems that handle large >> blocks we instead need to redesign significant parts of the VM interface. >> Shift the redesign work to another group of

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread Christoph Lameter
On Mon, 7 May 2007, Eric W. Biederman wrote: > Yes, instead of having to redesign the interface between the > fs and the page cache for those filesystems that handle large > blocks we instead need to redesign significant parts of the VM interface. > Shift the redesign work to another group of

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread William Lee Irwin III
David Chinner <[EMAIL PROTECTED]> writes: >>> Right - so how do we efficiently manipulate data inside a large >>> block that spans multiple discontiguous pages if we don't vmap >>> it? On Mon, May 07, 2007 at 12:43:19AM -0600, Eric W. Biederman wrote: >> You don't manipulate data except for

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread Eric W. Biederman
David Chinner <[EMAIL PROTECTED]> writes: > Both. Too many things can happen asynchronously to a page that it > makes it just about impossible to predict all the potential race > conditions that are involved. Complexity arose from trying to fix > the races that were uncovered without breaking

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread William Lee Irwin III
David Chinner <[EMAIL PROTECTED]> writes: >> Right - so how do we efficiently manipulate data inside a large >> block that spans multiple discontiguous pages if we don't vmap >> it? On Mon, May 07, 2007 at 12:43:19AM -0600, Eric W. Biederman wrote: > You don't manipulate data except for

Re: [00/17] Large Blocksize Support V3

2007-05-07 Thread Eric W. Biederman
David Chinner <[EMAIL PROTECTED]> writes: > On Sun, May 06, 2007 at 10:48:23PM -0600, Eric W. Biederman wrote: >> David Chinner <[EMAIL PROTECTED]> writes: >> >> > On Fri, May 04, 2007 at 07:33:54AM -0600, Eric W. Biederman wrote: >> >> > >> >> > So while the jury is out about how many other

Re: [00/17] Large Blocksize Support V3

2007-05-06 Thread David Chinner
On Sun, May 06, 2007 at 10:48:23PM -0600, Eric W. Biederman wrote: > David Chinner <[EMAIL PROTECTED]> writes: > > > On Fri, May 04, 2007 at 07:33:54AM -0600, Eric W. Biederman wrote: > >> > > >> > So while the jury is out about how many other filesystems might use > >> > it, I suspect it's more

Re: [00/17] Large Blocksize Support V3

2007-05-06 Thread David Chinner
On Fri, May 04, 2007 at 07:31:37AM -0600, Eric W. Biederman wrote: > David Chinner <[EMAIL PROTECTED]> writes: > > > On Fri, Apr 27, 2007 at 12:04:03AM -0700, Andrew Morton wrote: > > I've got several year-old Irix bugs assigned that are hit every so > > often where one page in the aggregated

Re: [00/17] Large Blocksize Support V3

2007-05-06 Thread Eric W. Biederman
David Chinner <[EMAIL PROTECTED]> writes: > On Fri, May 04, 2007 at 07:33:54AM -0600, Eric W. Biederman wrote: >> > >> > So while the jury is out about how many other filesystems might use >> > it, I suspect it's more than you might think. At the very least, >> > there may be some IA64 users who

Re: [00/17] Large Blocksize Support V3

2007-05-06 Thread David Chinner
On Fri, May 04, 2007 at 07:33:54AM -0600, Eric W. Biederman wrote: > > > > So while the jury is out about how many other filesystems might use > > it, I suspect it's more than you might think. At the very least, > > there may be some IA64 users who might be trying to transition their > > way to

Re: [00/17] Large Blocksize Support V3

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Eric W. Biederman wrote: > Given that small block sizes give us better storage efficiency, > which means less disk bandwidth used, which means less time > to get the data off of a slow disk (especially if you can > put multiple files you want simultaneously in that same

Re: [00/17] Large Blocksize Support V3

2007-05-04 Thread Eric W. Biederman
Theodore Tso <[EMAIL PROTECTED]> writes: > On Fri, Apr 27, 2007 at 01:48:49AM -0700, Andrew Morton wrote: >> And other filesystems (ie: ext4) _might_ use it. But ext4 is extent-based, >> so perhaps it's not worth churning the on-disk format to get a bit of a >> boost in the block allocator. > >

Re: [00/17] Large Blocksize Support V3

2007-05-04 Thread Eric W. Biederman
David Chinner <[EMAIL PROTECTED]> writes: > On Fri, Apr 27, 2007 at 12:04:03AM -0700, Andrew Morton wrote: > > I've looked at all this but I'm trying to work out if anyone > else has looked at the impact of doing this. I have direct experience > with this form of block aggregation - this is

Re: [00/17] Large Blocksize Support V3

2007-05-04 Thread Eric W. Biederman
Andrew Morton <[EMAIL PROTECTED]> writes: > On Fri, 27 Apr 2007 18:03:21 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > >> > > > > You basically have to >> > > > > jump through nasty, nasty hoops, to handle corner cases that are introduced >> > > > > because the generic code can no longer

Re: [00/17] Large Blocksize Support V3

2007-04-29 Thread Christoph Lameter
On Fri, 27 Apr 2007, Andrew Morton wrote: > By misunderstanding any suggestions, misrepresenting them, making incorrect > statements about them, by not suggesting any alternatives yourself, all of > it buttressed by a stolid refusal to recognise that this patch has any > costs. That was even

Re: [00/17] Large Blocksize Support V3

2007-04-29 Thread Christoph Lameter
On Sat, 28 Apr 2007, Maxim Levitsky wrote: > 1) Is it possible for block device to assume that it will always get big > requests (and aligned by big blocksize) ? That is one of the key problems. We hope that Mel Gorman's antifrag work will get us there. > 2) Does metadata reading/writing

Re: [00/17] Large Blocksize Support V3

2007-04-29 Thread Matt Mackall
On Thu, Apr 26, 2007 at 02:28:46PM +0100, Alan Cox wrote: > > > Oh we have scores of these hacks around. Look at the dvd/cd layer. The > > > point is to get rid of those. > > > > Perhaps this is just a matter of cleaning them up so they are no > > longer hacks? > > CD and DVD media support

Re: [00/17] Large Blocksize Support V3

2007-04-29 Thread Peter Zijlstra
On Sat, 2007-04-28 at 01:55 -0700, Andrew Morton wrote: > On Sat, 28 Apr 2007 10:32:56 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > On Sat, 2007-04-28 at 01:22 -0700, Andrew Morton wrote: > > > On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> > > > wrote: > > > > >

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 12:19:56 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: > I'm skeptical, however, that the contiguity gains will compensate for > the CPU required to do such with the pcp lists. It wouldn't surprise me if approximate contiguity is a pretty common case in the pcp

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread William Lee Irwin III
On Sat, 28 Apr 2007 07:09:07 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: >> The gang allocation affair may also want to make the calls into >> the page allocator batched. For instance, grab enough compound pages to >> build the gang under the lock, since we're going to blow the

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 07:09:07 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: > On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > >> only 4.4 times faster, and more scalable, since we don't bounce the > >> upper level locks around. > > On Sat, Apr 28, 2007 at

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Maxim Levitsky
On Wednesday 25 April 2007 01:21, [EMAIL PROTECTED] wrote: > Rationales: > > 1. We have problems supporting devices with a higher blocksize than > page size. This is for example important to support CD and DVDs that > can only read and write 32k or 64k blocks. We currently have a shim >

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Eric W. Biederman
Pierre Ossman <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> >> I have a hard time believing that device hardware limits don't allow them >> to have enough space to handle larger requests. If so it was a poor >> design by the hardware manufacturers. >> > > In the MMC layer, the block

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread William Lee Irwin III
On Sat, Apr 28, 2007 at 12:29:08PM +0100, Alan Cox wrote: > Not necessarily. If you use 16K contiguous pages you have to do > more work to get memory contiguously and you have less cache efficiency > both of which will do serious damage to performance with poor I/O > subsystems for all the extra

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread William Lee Irwin III
On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: >> only 4.4 times faster, and more scalable, since we don't bounce the >> upper level locks around. On Sat, Apr 28, 2007 at 01:22:51AM -0700, Andrew Morton wrote: > I'm not sure what we're looking at here. radix-tree

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Alan Cox
> But all (both) the proposals we're (ahem) discussing do involve 4x > physically contiguous pages going into those four contiguous pagecache > slots. > > So we're improving things for the half-assed controllers, aren't we? Not necessarily. If you use 16K contiguous pages you have to do more

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Pierre Ossman
Eric W. Biederman wrote: > > I have a hard time believing that device hardware limits don't allow them > to have enough space to handle larger requests. If so it was a poor > design by the hardware manufacturers. > In the MMC layer, the block size is a major bottleneck. None of the currently

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 11:21:17 +0100 Alan Cox <[EMAIL PROTECTED]> wrote: > > > Also remember that even if you do larger pages by using virtual pairs or > > > quads of real pages because it helps on some systems you end up needing > > > the same sized sglist as before so you don't make anything

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Alan Cox
> > Also remember that even if you do larger pages by using virtual pairs or > > quads of real pages because it helps on some systems you end up needing > > the same sized sglist as before so you don't make anything worse for > > half-assed controllers as you get the same I/O size providing they

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 10:43:28 +0100 Alan Cox <[EMAIL PROTECTED]> wrote: > On Fri, 27 Apr 2007 21:56:34 -0700 > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > On Sat, 28 Apr 2007 13:17:40 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > > > > > Fix up your lameo HBA for reads. > > > > > >

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Alan Cox
On Fri, 27 Apr 2007 21:56:34 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Sat, 28 Apr 2007 13:17:40 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > > > Fix up your lameo HBA for reads. > > > > Where did that come from? You spend 20 lines describing the inefficiencies > > of the

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 10:32:56 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > On Sat, 2007-04-28 at 01:22 -0700, Andrew Morton wrote: > > On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > > > > > > > > The other thing is that we can batch up pagecache page

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Peter Zijlstra
On Sat, 2007-04-28 at 01:22 -0700, Andrew Morton wrote: > On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > > > > > The other thing is that we can batch up pagecache page insertions for bulk > > > writes as well (that is, write(2) with buffer size > page size). I

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Sat, 28 Apr 2007 10:04:08 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > > The other thing is that we can batch up pagecache page insertions for bulk > > writes as well (that is, write(2) with buffer size > page size). I should > > have a patch somewhere for that as well if anyone

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Christoph Hellwig
On Sat, Apr 28, 2007 at 12:27:45PM +1000, Nick Piggin wrote: > And that wasn't due to the 128 sg limit? No, that was due to aacraid really liking sg lists as small as possible where every entry covers areas as big as possible. The driver really liked physical merging once wli changed the page

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Peter Zijlstra
On Sat, 2007-04-28 at 11:43 +1000, Nick Piggin wrote: > Andrew Morton wrote: > > For example, see __do_page_cache_readahead(). It does a read_lock() and a > > page allocation and a radix-tree lookup for each page. We can vastly > > improve that. > > > > Step 1: > > > > - do a read-lock > > >

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Andrew Morton
On Fri, 27 Apr 2007 23:24:05 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > > Fact is, this change has *costs*. And you're completely ignoring them, > > trying to spin them away. It ain't working and it never will. I'm seeing > > no serious attempt to think about how we can reduce

Re: [00/17] Large Blocksize Support V3

2007-04-28 Thread Christoph Lameter
On Fri, 27 Apr 2007, Andrew Morton wrote: > Your patch *is* a workaround. It's a workaround for small CPU pagesize. > It's a workaround for suboptimal VFS and filesystem implementations. It's > a workaround for a disk adapter which has suboptimal readahead and > writeback caching

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Andrew Morton
On Fri, 27 Apr 2007 22:08:17 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 27 Apr 2007, Andrew Morton wrote: > > > My (repeated) point is that if we populate pagecache with > > physically-contiguous 4k > > pages in this manner then bio+block will be able to create much

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Christoph Lameter
On Fri, 27 Apr 2007, Andrew Morton wrote: > My (repeated) point is that if we populate pagecache with > physically-contiguous 4k > pages in this manner then bio+block will be able to create much larger SG > lists. True but the "if" becomes exceedingly rare the longer the system was in

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Andrew Morton
On Sat, 28 Apr 2007 13:17:40 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > Fix up your lameo HBA for reads. > > Where did that come from? You spend 20 lines describing the inefficiencies > of the readahead in the page cache and it should be fixed but then you > turn around and say fix the

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Christoph Lameter
On Sat, 28 Apr 2007, David Chinner wrote: > > 1-disk and 2-disk read throughput fell by an improbable amount, which makes > > me cautious about the other numbers. > > For read, yes, and it's because something is going wrong with the > I/O size - it looks like readahead thrashing of some kind

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread David Chinner
On Fri, Apr 27, 2007 at 12:11:08PM -0700, Andrew Morton wrote: > On Sat, 28 Apr 2007 03:34:32 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > > Some more information - stripe unit on the dm raid0 is 512k. > > I have not attempted to increase I/O sizes at all yet - these tests are > > just

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread William Lee Irwin III
William Lee Irwin III wrote: >> What sort of strategy do you intend to use to speculatively populate >> the pagecache with contiguous pages? On Sat, Apr 28, 2007 at 12:50:26PM +1000, Nick Piggin wrote: > Andrew outlined it. I'd like to suggest a few straightforward additions to the proposal:

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nick Piggin
William Lee Irwin III wrote: On Sat, Apr 28, 2007 at 12:27:45PM +1000, Nick Piggin wrote: I guess 10% isn't a small amount. Though it would be nice to have before/after numbers for Linux. And, like Andrew was saying, we could just _attempt_ to put contiguous pages in pagecache rather than

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread William Lee Irwin III
On Sat, Apr 28, 2007 at 12:27:45PM +1000, Nick Piggin wrote: > I guess 10% isn't a small amount. Though it would be nice to have > before/after numbers for Linux. And, like Andrew was saying, we could > just _attempt_ to put contiguous pages in pagecache rather than > _require_ it. Which is still

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nick Piggin
Christoph Hellwig wrote: On Fri, Apr 27, 2007 at 10:25:44PM +1000, Nick Piggin wrote: Linus's favourite jokes about powerpc mmu being crippled forever, aside ;) Different mmu. The desktop 32bit mmu Linus referred to has almost nothing in common with the mmu on 64bit systems. Well I

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread William Lee Irwin III
On Thu, Apr 26, 2007 at 11:55:42PM -0700, Andrew Morton wrote: >>> Please address my point: if in five years time x86 has larger or variable >>> pagesize, this code will be a permanent millstone around our necks which we >>> *should not have merged*. >>> And if in five years time x86 does not have

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nick Piggin
Andrew Morton wrote: On Sat, 28 Apr 2007 03:34:32 +1000 David Chinner <[EMAIL PROTECTED]> wrote: Some more information - stripe unit on the dm raid0 is 512k. I have not attempted to increase I/O sizes at all yet - these tests are just demonstrating efficiency improvements in the filesystem.

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Andrew Morton
On Fri, 27 Apr 2007 06:44:51 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: > On Thu, Apr 26, 2007 at 11:55:42PM -0700, Andrew Morton wrote: > > Please address my point: if in five years time x86 has larger or variable > > pagesize, this code will be a permanent millstone around our necks

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Andrew Morton
On Sat, 28 Apr 2007 03:34:32 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > Some more information - stripe unit on the dm raid0 is 512k. > I have not attempted to increase I/O sizes at all yet - these tests are > just demonstrating efficiency improvements in the filesystem. > > These numbers

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread William Lee Irwin III
On Fri, 2007-04-27 at 12:55 -0400, Theodore Tso wrote: >> Unfortunately, this isn't a problem with hardware getting better, but >> a willingness to break backwards compatibility. >> x86_64 uses a 4k page size to avoid breaking 32-bit applications. And >> unfortunately, iirc, even 64-bit

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread David Chinner
On Sat, Apr 28, 2007 at 02:36:20AM +1000, David Chinner wrote: > The test was writing a single 50GB file to a fresh filesystem, and > then reading it back. Run on two different dm stripes - a 4-disk > RAID0 and an 8-disk RAID0 stripe, with a stripe unit of 512k. Disks > are 10krpm SAS, external

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nicholas Miell
On Fri, 2007-04-27 at 12:55 -0400, Theodore Tso wrote: > On Thu, Apr 26, 2007 at 10:15:28PM -0700, Andrew Morton wrote: > > And hardware gets better. If Intel & AMD come out with a 16k pagesize > > option in a couple of years we'll look pretty dumb. If the problems which > > you're presently

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Theodore Tso
On Thu, Apr 26, 2007 at 10:15:28PM -0700, Andrew Morton wrote: > And hardware gets better. If Intel & AMD come out with a 16k pagesize > option in a couple of years we'll look pretty dumb. If the problems which > you're presently having with that controller get sorted out in the next >

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Christoph Lameter
On Fri, 27 Apr 2007, Nick Piggin wrote: > Linus's favourite jokes about powerpc mmu being crippled forever, aside ;) > > This seems like just speculation. I would not be against something which, > without, would "cripple" some relevant hardware, but you are just handwaving > at this point. And

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Theodore Tso
On Fri, Apr 27, 2007 at 01:48:49AM -0700, Andrew Morton wrote: > And other filesystems (ie: ext4) _might_ use it. But ext4 is extent-based, > so perhaps it's not worth churning the on-disk format to get a bit of a > boost in the block allocator. Well, ext3 could definitely use it; there are

Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread David Chinner
On Fri, Apr 27, 2007 at 12:26:40AM -0700, Andrew Morton wrote: > On Fri, 27 Apr 2007 00:19:49 -0700 (PDT) Christoph Lameter <[EMAIL > PROTECTED]> wrote: > > > The page cache handling in the various layers is significantly > > simplified which reduces maintenance cost. > > How on earth can the
