Re: [03/17] is_vmalloc_addr(): Check if an address is within the vmalloc boundaries

2007-09-18 Thread David Rientjes
On Tue, 18 Sep 2007, Christoph Lameter wrote: > Index: linux-2.6/include/linux/mm.h > === > --- linux-2.6.orig/include/linux/mm.h 2007-09-17 21:46:06.0 -0700 > +++ linux-2.6/include/linux/mm.h 2007-09-17 23:56:54.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread David Chinner
On Tue, Sep 18, 2007 at 06:06:52PM -0700, Linus Torvalds wrote: > > especially as the Linux > > kernel limitations in this area are well known. There's no "16K mess" > > that SGI is trying to clean up here (and SGI have offered both IA64 and > > x86_64

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/19/2007 06:33 AM, Linus Torvalds wrote: On Wed, 19 Sep 2007, Rene Herman wrote: I do feel larger blocksizes continue to make sense in general though. Packet writing on CD/DVD is a problem already today since the hardware needs 32K or 64K blocks and I'd expect to see more of these and si

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Rene Herman wrote: > > I do feel larger blocksizes continue to make sense in general though. Packet > writing on CD/DVD is a problem already today since the hardware needs 32K or > 64K blocks and I'd expect to see more of these and similar situations when > flash gets (even

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/19/2007 05:50 AM, Linus Torvalds wrote: On Wed, 19 Sep 2007, Rene Herman wrote: Well, not so sure about that. What if one of your expected uses for example is video data storage -- lots of data, especially for multiple streams, and needs still relatively fast machinery. Why would you ca

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/18/2007 09:44 PM, Linus Torvalds wrote: Nobody sane would *ever* argue for 16kB+ blocksizes in general. Well, not so sure about that. What if one of your expected uses for example is video data storage -- lots of data, especially for multiple streams, and needs still relatively fast ma

Re: [14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area

2007-09-18 Thread Gabriel C
Christoph Lameter wrote:
>
> +     if (is_vmalloc_addr(word))
> +             page = vmalloc_to_page(word)
                                            ^^ Missing ' ; '
> +     else
> +             page = virt_to_page(word);
> +
> +     zone = page_zone(page);
>       return &zone->wait_table[ha

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Rene Herman wrote: > > Well, not so sure about that. What if one of your expected uses for example is > video data storage -- lots of data, especially for multiple streams, and needs > still relatively fast machinery. Why would you care for the overhead of > _small_ blocks?

[07/17] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings

2007-09-18 Thread Christoph Lameter
This adds a new gfp flag __GFP_VFALLBACK If specified during a higher order allocation then the system will fall back to vmap and attempt to create a virtually contiguous area instead of a physically contiguous area. In many cases the virtually contiguous area can stand in for the physically cont
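The control flow described above can be sketched in userspace C. This is only an illustration under loud assumptions: every sketch_* name is hypothetical, malloc() stands in for both allocators, a 4096-byte page is assumed, and "physically contiguous allocation fails" is simulated by a max_contig_order threshold. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Which path satisfied the request. */
enum sketch_backing { SKETCH_NONE, SKETCH_PHYSICAL, SKETCH_VIRTUAL };

struct sketch_alloc {
    void *mem;
    enum sketch_backing backing;
};

/* Stand-in for a physically contiguous allocator: pretend fragmentation
 * makes every request above max_contig_order fail. */
static void *sketch_alloc_contig(unsigned int order, unsigned int max_contig_order)
{
    if (order > max_contig_order)
        return NULL;
    return malloc((size_t)4096 << order);
}

/* Stand-in for building a virtually contiguous area out of single pages. */
static void *sketch_alloc_virtual(unsigned int order)
{
    return malloc((size_t)4096 << order);
}

/* Try the contiguous path first; fall back to the virtual path only if
 * the caller allowed it (the role the new flag plays in the description). */
static struct sketch_alloc sketch_alloc_pages(unsigned int order,
                                              unsigned int max_contig_order,
                                              bool allow_vfallback)
{
    struct sketch_alloc a = { NULL, SKETCH_NONE };

    assert(order < 16); /* keep the shift well within size_t */
    a.mem = sketch_alloc_contig(order, max_contig_order);
    if (a.mem) {
        a.backing = SKETCH_PHYSICAL;
    } else if (allow_vfallback) {
        a.mem = sketch_alloc_virtual(order);
        if (a.mem)
            a.backing = SKETCH_VIRTUAL;
    }
    return a;
}
```

A caller that does not permit the fallback simply sees the allocation fail, which corresponds to the behaviour without the flag.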

[06/17] vmalloc_address(): Determine vmalloc address from page struct

2007-09-18 Thread Christoph Lameter
Sometimes we need to figure out which vmalloc address is in use for a certain page struct. There is no easy way to figure out the vmalloc address from the page struct. So simply search through the kernel page table to find the address. This is a fairly expensive process. Use sparingly (or provide a

[05/17] vunmap: return page array

2007-09-18 Thread Christoph Lameter
Make vunmap return the page array that was used at vmap. This is useful if one has no structures to track the page array but simply stores the virtual address somewhere. The disposition of the page array can be decided upon after vunmap. vfree() may now also be used instead of vunmap which will rel

[09/17] VFALLBACK: Debugging aid

2007-09-18 Thread Christoph Lameter
Virtual fallbacks are rare and thus subtle bugs may creep in if we do not test the fallbacks. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK allocations fall back to virtual mapping. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- lib/Kconfig.debug | 11 +++ mm/page_alloc.c

[10/17] Use GFP_VFALLBACK for sparsemem.

2007-09-18 Thread Christoph Lameter
Sparsemem currently attempts first to do a physically contiguous mapping and then falls back to vmalloc. The same thing can now be accomplished using GFP_VFALLBACK. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/sparse.c | 23 +++ 1 file changed, 3 insertions(+

[16/17] Allow virtual fallback for buffer_heads

2007-09-18 Thread Christoph Lameter
This is in particular useful for large I/Os because it will allow > 100 allocs from the SLUB fast path without having to go to the page allocator. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/buffer.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-2.6/fs/

[15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-18 Thread Christoph Lameter
SLAB_VFALLBACK can be specified for selected slab caches. If fallback is available then the conservative settings for higher order allocations are overridden. We then request an order that can accommodate at minimum 100 objects. The size of an individual slab allocation is allowed to reach up to 256
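The "order that holds at least 100 objects" rule is simple arithmetic, sketched below. Assumptions: a 4096-byte page, no per-slab metadata, and no upper limit on the order (the limit is cut off in the snippet above, so it is deliberately left out). The name order_for_min_objects is hypothetical and is not taken from SLUB.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u  /* assumed page size */
#define SKETCH_MIN_OBJECTS 100u   /* minimum objects per slab, from the description */

/* Hypothetical helper: smallest order such that a slab of
 * (SKETCH_PAGE_SIZE << order) bytes holds at least SKETCH_MIN_OBJECTS
 * objects of object_size bytes. Per-slab metadata is ignored and no
 * upper limit is applied, so very large objects are out of scope. */
static unsigned int order_for_min_objects(size_t object_size)
{
    unsigned int order = 0;

    assert(object_size > 0);
    while (((size_t)SKETCH_PAGE_SIZE << order) / object_size < SKETCH_MIN_OBJECTS)
        order++;
    return order;
}
```

Under these assumptions a 256-byte object needs order 3: a 32 KiB slab holding 128 objects, since order 2 would hold only 64.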

[13/17] Virtual compound page freeing in interrupt context

2007-09-18 Thread Christoph Lameter
If we are in an interrupt context then simply defer the free via a workqueue. In an interrupt context it is not possible to use vmalloc_addr() to determine the vmalloc address. So add a variant that does that too. Removing a virtual mapping *must* be done with interrupts enabled since tlb_xx fun

[14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area

2007-09-18 Thread Christoph Lameter
If bit waitqueue is passed a virtual address then it must use vmalloc_to_page instead of virt_to_page to get to the page struct. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- kernel/wait.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) Index: linux-2.6/kernel/wait.

[17/17] Allow virtual fallback for dentries

2007-09-18 Thread Christoph Lameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/dcache.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-2.6/fs/dcache.c === --- linux-2.6.orig/fs/dcache.c 2007-09-18 18:42:19.0 -0700 +++

[12/17] Virtual Compound page allocation from interrupt context.

2007-09-18 Thread Christoph Lameter
In an interrupt context we cannot wait for the vmlist_lock in __get_vm_area_node(). So use a trylock instead. If the trylock fails then the atomic allocation will fail and subsequently be retried. This only works because the flush_cache_vunmap in use for allocation is never performing any IPIs in
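The trylock-or-fail pattern described above can be sketched in userspace C11. Assumptions: an atomic_flag stands in for vmlist_lock, all sketch_* names are hypothetical, and "reserving an area" is reduced to bumping a counter. The real patch is truncated here and may differ in every detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the lock protecting the list of virtual areas. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool sketch_trylock(void)
{
    return !atomic_flag_test_and_set(&sketch_lock);
}

static void sketch_unlock(void)
{
    atomic_flag_clear(&sketch_lock);
}

/* In atomic (interrupt-like) context we must not wait: one trylock, and
 * on contention report failure so the caller can retry later. Otherwise
 * keep trying until the lock is acquired. */
static bool sketch_reserve_area(bool atomic_context, int *reserved_count)
{
    assert(reserved_count != NULL);

    if (atomic_context) {
        if (!sketch_trylock())
            return false; /* contended: fail the atomic allocation */
    } else {
        while (!sketch_trylock())
            ; /* non-atomic context may wait */
    }

    (*reserved_count)++; /* the work done under the lock */
    sketch_unlock();
    return true;
}
```

The point of the sketch is only that the atomic caller never blocks; whether a failed attempt is retried, and by whom, is up to the surrounding allocator.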

[08/17] Pass vmalloc address in page->private

2007-09-18 Thread Christoph Lameter
Avoid expensive lookups of virtual addresses from page structs by storing the vmalloc address in page->private. We can then avoid the vmalloc_address() in the get__page() functions and simply return page->private. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/page_alloc.c |
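The trade-off between searching for the address and storing it can be illustrated with a toy model. Assumptions: struct sketch_page and struct sketch_mapping are simplified stand-ins (not the kernel's struct page or page tables), the "page table" is a flat array, and all sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct page; only the field needed here. */
struct sketch_page {
    unsigned long private; /* holds the virtual address once mapped */
};

/* One entry of a toy "page table": virtual address -> page. */
struct sketch_mapping {
    unsigned long vaddr;
    struct sketch_page *page;
};

/* Slow path: scan every mapping to find the address of a page. */
static unsigned long sketch_search_address(const struct sketch_mapping *map,
                                           size_t n,
                                           const struct sketch_page *page)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].page == page)
            return map[i].vaddr;
    return 0; /* not mapped */
}

/* Record the address at mapping time so later lookups need no search. */
static void sketch_map_page(struct sketch_mapping *slot,
                            struct sketch_page *page, unsigned long vaddr)
{
    assert(vaddr != 0);
    slot->vaddr = vaddr;
    slot->page = page;
    page->private = vaddr;
}

/* Fast path: read the stored address back. */
static unsigned long sketch_cached_address(const struct sketch_page *page)
{
    return page->private;
}
```

Both paths return the same address for a mapped page; the cached variant just avoids the scan, which is the saving the patch description is after.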

[11/17] GFP_VFALLBACK for zone wait table.

2007-09-18 Thread Christoph Lameter
Currently we have to use vmalloc for the zone wait table possibly generating the need to create lots of TLBs to access the tables. We can now use GFP_VFALLBACK to attempt the use of a physically contiguous page that can then use the large kernel TLBs. Signed-off-by: Christoph Lameter <[EMAIL PROTE

[01/17] Vmalloc: Move vmalloc_to_page to mm/vmalloc.

2007-09-18 Thread Christoph Lameter
We already have page table manipulation for vmalloc in vmalloc.c. Move the vmalloc_to_page() function there as well. Also move the related definitions from include/linux/mm.h. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/mm.h |2 -- include/linux/vmalloc.h |

[03/17] is_vmalloc_addr(): Check if an address is within the vmalloc boundaries

2007-09-18 Thread Christoph Lameter
This test is used in a couple of places. Add a version to vmalloc.h and replace the other checks. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/net/cxgb3/cxgb3_offload.c |4 +--- fs/ntfs/malloc.h |3 +-- fs/proc/kcore.c |2 +- fs/
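The test being consolidated is a range comparison against the vmalloc boundaries. A minimal userspace sketch follows, assuming made-up bounds (the kernel takes VMALLOC_START and VMALLOC_END from architecture headers) and a hypothetical function name; the helper actually added by the patch is not visible in this snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bounds for illustration only; the kernel takes the real
 * VMALLOC_START and VMALLOC_END from architecture headers. */
#define SKETCH_VMALLOC_START ((uintptr_t)0xC0000000u)
#define SKETCH_VMALLOC_END   ((uintptr_t)0xE0000000u)

static_assert(SKETCH_VMALLOC_START < SKETCH_VMALLOC_END,
              "sketch range must be non-empty");

/* Hypothetical helper: true if x lies in [START, END). */
static inline bool is_vmalloc_addr_sketch(const void *x)
{
    uintptr_t addr = (uintptr_t)x;

    return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}
```

Taking a const void * keeps the helper usable on any pointer without a cast at the call site.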

[04/17] vmalloc: clean up page array indexing

2007-09-18 Thread Christoph Lameter
The page array is repeatedly indexed both in vunmap and vmalloc_area_node(). Add a temporary variable to make it easier to read (and easier to patch later). Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/vmalloc.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletion

[00/17] [RFC] Virtual Compound Page Support

2007-09-18 Thread Christoph Lameter
Currently there is a strong tendency to avoid larger page allocations in the kernel because of past fragmentation issues and the current defragmentation methods are still evolving. It is not clear to what extent they can provide reliable allocations for higher order pages (plus the definition of "r

[02/17] Vmalloc: add const

2007-09-18 Thread Christoph Lameter
Make vmalloc functions work the same way as kfree() and friends that take a const void * argument. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/vmalloc.h | 10 +- mm/vmalloc.c| 16 2 files changed, 13 insertions(+), 13 deletions(

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 18:06 -0700, Linus Torvalds wrote: > There is *no* valid reason for 16kB blocksizes unless you have legacy > issues. That's not correct. > The performance issues have nothing to do with the block-size, and We must be thinking of different performance issues. > should be

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote: > JBD: Replace slab allocations with page cache allocations > > JBD allocate memory for committed_data and frozen_data from slab. However > JBD should not pass slab pages down to the block layer. Use page allocator > page

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 12:44 -0700, Linus Torvalds wrote: > This is not about performance. Never has been. It's about SGI wanting a > way out of their current 16kB mess. Pass the crack pipe, Linus? > The way to fix performance is to move to x86-64, and use 4kB pages and be > happy. However, the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Nathan Scott wrote: > > FWIW (and I hate to let reality get in the way of a good conspiracy) - > all SGI systems have always defaulted to using 4K blocksize filesystems; Yes. And I've been told that: > there's very few customers who would use larger .. who apparently woul

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote: > On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote: > > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote: > > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote: > > > > Here is the incremental small cleanup patch.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Christoph Lameter
On Tue, 18 Sep 2007, Nick Piggin wrote: > > We can avoid all doubt in this patchset as well by adding support for > > fallback to a vmalloced compound page. > > How would you do a vmapped fallback in your patchset? How would > you keep track of pages 2..N if they don't exist in the radix tree? T

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Christoph Lameter
On Tue, 18 Sep 2007, Nick Piggin wrote: > On Tuesday 18 September 2007 08:00, Christoph Lameter wrote: > > On Sun, 16 Sep 2007, Nick Piggin wrote: > > > I don't know how it would prevent fragmentation from building up > > > anyway. It's commonly the case that potentially unmovable objects > > > ar

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Tue, 18 Sep 2007, Andrea Arcangeli wrote: > > Many? I can't recall anything besides PF_MEMALLOC and the decision > that the VM is oom. *All* of the buddy bitmaps, *all* of the GFP_ATOMIC, *all* of the zone watermarks, everything that we depend on every single day, is in the end just about

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Andrea Arcangeli
On Mon, Sep 17, 2007 at 12:56:07AM +0200, Goswin von Brederlow wrote: > When has free ever given any useful "free" number? I can perfectly > fine allocate another gigabyte of memory despite free saying 25MB. But > that is because I know that the buffer/cached are not locked in. Well, as you said y

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Andrea Arcangeli
On Tue, Sep 18, 2007 at 11:30:17AM -0700, Linus Torvalds wrote: > The fact is, *none* of those things are true. The VM doesn't guarantee > anything, and is already very much about statistics in many places. You Many? I can't recall anything besides PF_MEMALLOC and the decision that the VM is oom

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Tue, 18 Sep 2007, Nick Piggin wrote: > > ROFL! Yeah of course, how could I have forgotten about our trusty OOM killer > as the solution to the fragmentation problem? It would only have been funnier > if you had said to reboot every so often when memory gets fragmented :) Can we please stop t

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Dave Kleikamp
On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote: > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote: > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote: > > > Here is the incremental small cleanup patch. > > > > > > Remove kmalloc usages in jbd/jbd2 and consistently

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:05, Christoph Lameter wrote: > On Sun, 16 Sep 2007, Nick Piggin wrote: > > > > fsblock doesn't need any of those hacks, of course. > > > > > > Nor does mine for the low orders that we are considering. For order > > > > MAX_ORDER this is unavoidable since the page all

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:21, Christoph Lameter wrote: > On Sun, 16 Sep 2007, Nick Piggin wrote: > > > > So if you argue that vmap is a downside, then please tell me how you > > > > consider the -ENOMEM of your approach to be better? > > > > > > That is again pretty undifferentiated. Are we t

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:00, Christoph Lameter wrote: > On Sun, 16 Sep 2007, Nick Piggin wrote: > > I don't know how it would prevent fragmentation from building up > > anyway. It's commonly the case that potentially unmovable objects > > are allowed to fill up all of ram (dentries, inodes,

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote: > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote: > > Here is the incremental small cleanup patch. > > > > Remove kmalloc usages in jbd/jbd2 and consistently use > > jbd_kmalloc/jbd2_malloc. > > Shouldn't we kill jbd_kmal

Re: [2/3] 2.6.23-rc6: known regressions v2

2007-09-18 Thread Oliver Neukum
Am Dienstag 18 September 2007 schrieb Jan Kara: > > Subject         : umount triggers a warning in jfs and takes almost a minute > > References      : http://lkml.org/lkml/2007/9/4/73 > > Last known good : ? > > Submitter       : Oliver Neukum <[EMAIL PROTECTED]> > > Caused-By       : ? > > Handled

Re: [2/3] 2.6.23-rc6: known regressions v2

2007-09-18 Thread Dave Kleikamp
On Tue, 2007-09-18 at 16:24 +0200, Jan Kara wrote: > > Subject : umount triggers a warning in jfs and takes almost a minute > > References : http://lkml.org/lkml/2007/9/4/73 > > Last known good : ? > > Submitter : Oliver Neukum <[EMAIL PROTECTED]> > > Caused-By : ? > > Han

Re: [2/3] 2.6.23-rc6: known regressions v2

2007-09-18 Thread Jan Kara
> FS > > Subject : hanging ext3 dbench tests > References : http://lkml.org/lkml/2007/9/11/176 > Last known good : ? > Submitter : Andy Whitcroft <[EMAIL PROTECTED]> > Caused-By : ? > Handled-By : ? > Status : under test -- unreproducible at present Yep...

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread David Chinner
On Tue, Sep 18, 2007 at 11:00:40AM +0100, Mel Gorman wrote: > We still lack data on what sort of workloads really benefit from large > blocks (assuming there are any that cannot also be solved by improving > order-0). No we don't. All workloads benefit from larger block sizes when you've got a btr

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Jörn Engel
On Tue, 18 September 2007 11:00:40 +0100, Mel Gorman wrote: > > We still lack data on what sort of workloads really benefit from large > blocks Compressing filesystems like jffs2 and logfs gain better compression ratio with larger blocks. Going from 4KiB to 64KiB gave somewhere around 10% benefi

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Mel Gorman
On (17/09/07 15:00), Christoph Lameter didst pronounce: > On Sun, 16 Sep 2007, Nick Piggin wrote: > > > I don't know how it would prevent fragmentation from building up > > anyway. It's commonly the case that potentially unmovable objects > > are allowed to fill up all of ram (dentries, inodes, et

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Christoph Hellwig
On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote: > Here is the incremental small cleanup patch. > > Remove kmalloc usages in jbd/jbd2 and consistently use > jbd_kmalloc/jbd2_malloc. Shouldn't we kill jbd_kmalloc instead?