On Tue, 18 Sep 2007, Christoph Lameter wrote:
> Index: linux-2.6/include/linux/mm.h
> ===
> --- linux-2.6.orig/include/linux/mm.h 2007-09-17 21:46:06.0 -0700
> +++ linux-2.6/include/linux/mm.h 2007-09-17 23:56:54.
On Tue, Sep 18, 2007 at 06:06:52PM -0700, Linus Torvalds wrote:
> > especially as the Linux
> > kernel limitations in this area are well known. There's no "16K mess"
> > that SGI is trying to clean up here (and SGI have offered both IA64 and
> > x86_64
On 09/19/2007 06:33 AM, Linus Torvalds wrote:
On Wed, 19 Sep 2007, Rene Herman wrote:
I do feel larger blocksizes continue to make sense in general though. Packet
writing on CD/DVD is a problem already today since the hardware needs 32K or
64K blocks and I'd expect to see more of these and si
On Wed, 19 Sep 2007, Rene Herman wrote:
>
> I do feel larger blocksizes continue to make sense in general though. Packet
> writing on CD/DVD is a problem already today since the hardware needs 32K or
> 64K blocks and I'd expect to see more of these and similar situations when
> flash gets (even
On 09/19/2007 05:50 AM, Linus Torvalds wrote:
On Wed, 19 Sep 2007, Rene Herman wrote:
Well, not so sure about that. What if one of your expected uses for example is
video data storage -- lots of data, especially for multiple streams, and needs
still relatively fast machinery. Why would you ca
On 09/18/2007 09:44 PM, Linus Torvalds wrote:
Nobody sane would *ever* argue for 16kB+ blocksizes in general.
Well, not so sure about that. What if one of your expected uses for example
is video data storage -- lots of data, especially for multiple streams, and
needs still relatively fast ma
Christoph Lameter wrote:
>
> + if (is_vmalloc_addr(word))
> + page = vmalloc_to_page(word)
^^
Missing ' ; '
> + else
> + page = virt_to_page(word);
> +
> + zone = page_zone(page);
> return &zone->wait_table[ha
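For reference, a corrected version of the hunk quoted above, with the missing semicolon added (a sketch against the same bit_waitqueue() context; the truncated return statement is left out):

	if (is_vmalloc_addr(word))
		page = vmalloc_to_page(word);	/* semicolon added per the review */
	else
		page = virt_to_page(word);

	zone = page_zone(page);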
On Wed, 19 Sep 2007, Rene Herman wrote:
>
> Well, not so sure about that. What if one of your expected uses for example is
> video data storage -- lots of data, especially for multiple streams, and needs
> still relatively fast machinery. Why would you care for the overhead of
> _small_ blocks?
This adds a new gfp flag
__GFP_VFALLBACK
If specified during a higher order allocation then the system will fall
back to vmap and attempt to create a virtually contiguous area instead of
a physically contiguous area. In many cases the virtually contiguous area
can stand in for the physically cont
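A minimal sketch of how a caller might use the proposed flag (the helper name is illustrative, not part of the posted series):

	static void *alloc_io_buffer(void)	/* illustrative helper */
	{
		/* 64KiB request: physically contiguous if the buddy allocator
		 * can manage it, otherwise __GFP_VFALLBACK lets it come back
		 * as a vmapped, virtually contiguous area instead. */
		return (void *)__get_free_pages(GFP_KERNEL | __GFP_VFALLBACK,
						get_order(64 * 1024));
	}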
Sometimes we need to figure out which vmalloc address is in use
for a certain page struct. There is no easy way to figure out
the vmalloc address from the page struct. So simply search through
the kernel page table to find the address. This is a fairly expensive
process. Use sparingly (or provide a
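To make the cost concrete, here is a sketch of the kind of reverse lookup being described. It walks the vmlist rather than the kernel page tables (the posted patch searches the page tables), so it conveys the idea, not the implementation:

	/* Sketch only: find the vmalloc address mapping a given page by
	 * scanning the tracked vmalloc areas (locking omitted for brevity). */
	void *vmalloc_address_sketch(struct page *page)
	{
		struct vm_struct *area;
		unsigned int i;

		for (area = vmlist; area; area = area->next)
			for (i = 0; i < area->nr_pages; i++)
				if (area->pages[i] == page)
					return area->addr + (i << PAGE_SHIFT);
		return NULL;
	}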
Make vunmap return the page array that was used at vmap. This is useful
if one has no structures to track the page array but simply stores the
virtual address somewhere. The disposition of the page array can be
decided upon after vunmap. vfree() may now also be used instead of
vunmap which will rel
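A sketch of the usage this change enables, assuming vunmap() now hands back the array originally passed to vmap() and that the array was kmalloc'ed at map time:

	/* Sketch: release a mapping for which only the virtual address and
	 * the page count were stored. */
	static void release_vmapped_buffer(void *addr, int nr_pages)
	{
		struct page **pages = vunmap(addr);	/* proposed return value */
		int i;

		for (i = 0; i < nr_pages; i++)
			__free_page(pages[i]);
		kfree(pages);
	}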
Virtual fallbacks are rare and thus subtle bugs may creep in if we do not
test the fallbacks. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK
allocations fall back to virtual mapping.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
lib/Kconfig.debug | 11 +++
mm/page_alloc.c
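One way such a debug switch could be wired into the allocation path; purely a sketch, with the logic assumed rather than taken from the posted patch:

	/* With CONFIG_VFALLBACK_ALWAYS, take the vmap fallback whenever the
	 * caller permitted it, even if the physically contiguous allocation
	 * would have succeeded, so the rare path gets regular test coverage. */
	static inline int force_vfallback(gfp_t gfp_mask)
	{
	#ifdef CONFIG_VFALLBACK_ALWAYS
		return gfp_mask & __GFP_VFALLBACK;
	#else
		return 0;
	#endif
	}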
Sparsemem currently attempts first to do a physically contiguous mapping
and then falls back to vmalloc. The same thing can now be accomplished
using GFP_VFALLBACK.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/sparse.c | 23 +++
1 file changed, 3 insertions(+
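A sketch of what the simplification amounts to, under the series' assumptions (helper name illustrative):

	/* Before: try alloc_pages() for the section memmap, then fall back
	 * to vmalloc() by hand.  With __GFP_VFALLBACK the fallback is
	 * implicit in the allocation itself: */
	static struct page *section_memmap_alloc(unsigned long nr_pages)
	{
		return (struct page *)__get_free_pages(GFP_KERNEL | __GFP_VFALLBACK,
				get_order(nr_pages * sizeof(struct page)));
	}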
This is in particular useful for large I/Os because it will allow > 100
allocs from the SLUB fast path without having to go to the page allocator.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/buffer.c |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/
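Given the one-line diffstat, the change is presumably along these lines, i.e. tagging the buffer_head cache with SLAB_VFALLBACK so SLUB can use larger slabs for it; this is a guess, not the posted hunk:

	bh_cachep = KMEM_CACHE(buffer_head,
			SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_VFALLBACK);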
SLAB_VFALLBACK can be specified for selected slab caches. If fallback is
available then the conservative settings for higher order allocations are
overridden. We then request an order that can accommodate at minimum
100 objects. The size of an individual slab allocation is allowed to reach
up to 256
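In code terms, the policy described above is roughly the following sketch; the exact upper bound is cut off in the archive, so it is left as a parameter:

	/* Sketch of the order policy: with SLAB_VFALLBACK, size slabs so at
	 * least ~100 objects fit, subject to an upper bound on the slab size. */
	static int vfallback_slab_order(size_t object_size, size_t max_slab_size)
	{
		size_t slab_size = min(object_size * 100, max_slab_size);

		return get_order(slab_size);
	}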
If we are in an interrupt context then simply defer the free via a workqueue.
In an interrupt context it is not possible to use vmalloc_addr() to determine
the vmalloc address. So add a variant that does that too.
Removing a virtual mapping *must* be done with interrupts enabled
since tlb_xx fun
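A minimal sketch of deferring the vmap teardown out of interrupt context via a workqueue; the helper names are illustrative, not the posted ones:

	struct deferred_vfree {
		struct work_struct work;
		void *addr;
	};

	static void vfree_work(struct work_struct *work)
	{
		struct deferred_vfree *d =
			container_of(work, struct deferred_vfree, work);

		vfree(d->addr);	/* safe: workqueues run with interrupts enabled */
		kfree(d);
	}

	static void vfree_deferred(void *addr)
	{
		struct deferred_vfree *d = kmalloc(sizeof(*d), GFP_ATOMIC);

		if (!d)
			return;	/* a real implementation needs a fallback here */
		d->addr = addr;
		INIT_WORK(&d->work, vfree_work);
		schedule_work(&d->work);
	}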
If bit waitqueue is passed a virtual address then it must use
vmalloc_to_page instead of virt_to_page to get to the page struct.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
kernel/wait.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
Index: linux-2.6/kernel/wait.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/dcache.c |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/dcache.c
===
--- linux-2.6.orig/fs/dcache.c 2007-09-18 18:42:19.0 -0700
+++
In an interrupt context we cannot wait for the vmlist_lock in
__get_vm_area_node(). So use a trylock instead. If the trylock fails
then the atomic allocation will fail and subsequently be retried.
This only works because the flush_cache_vunmap in use for
allocation is never performing any IPIs in
Avoid expensive lookups of virtual addresses from page structs by
storing the vmalloc address in page->private. We can then avoid
the vmalloc_address() in the get__page() functions and
simply return page->private.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/page_alloc.c |
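A sketch of the caching described above: record the vmap address in page->private when the virtual fallback is set up, then read it back instead of doing the expensive reverse lookup. The use of page->private for this is an assumption of the series, not generic kernel behaviour:

	static void vcompound_cache_addr(struct page **pages, int nr, void *addr)
	{
		int i;

		for (i = 0; i < nr; i++)
			set_page_private(pages[i],
					 (unsigned long)addr + (i << PAGE_SHIFT));
	}

	static inline void *vcompound_addr(struct page *page)
	{
		return (void *)page_private(page);
	}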
Currently we have to use vmalloc for the zone wait table, which may require
many TLB entries to access the tables. We can now use
GFP_VFALLBACK to attempt the use of a physically contiguous page that can then
use the large kernel TLBs.
Signed-off-by: Christoph Lameter <[EMAIL PROTE
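The shape of the change being described, as a sketch rather than the actual mm/page_alloc.c hunk:

	/* Inside zone_wait_table_init(), with alloc_size the table size in
	 * bytes: try a physically contiguous table first so it can be covered
	 * by large kernel TLB entries; __GFP_VFALLBACK supplies the vmap
	 * fallback that used to be open-coded with vmalloc(). */
	zone->wait_table = (wait_queue_head_t *)
		__get_free_pages(GFP_KERNEL | __GFP_VFALLBACK,
				 get_order(alloc_size));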
We already have page table manipulation for vmalloc in vmalloc.c. Move the
vmalloc_to_page() function there as well. Also move the related definitions
from include/linux/mm.h.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
include/linux/mm.h |2 --
include/linux/vmalloc.h |
This test is used in a couple of places. Add a version to vmalloc.h
and replace the other checks.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
drivers/net/cxgb3/cxgb3_offload.c |4 +---
fs/ntfs/malloc.h |3 +--
fs/proc/kcore.c |2 +-
fs/
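For reference, the helper is essentially a range check on the vmalloc arena; a minimal sketch, and the version added to vmalloc.h may differ in detail:

	static inline int is_vmalloc_addr(const void *x)
	{
		unsigned long addr = (unsigned long)x;

		return addr >= VMALLOC_START && addr < VMALLOC_END;
	}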
The page array is repeatedly indexed both in vunmap and vmalloc_area_node().
Add a temporary variable to make it easier to read (and easier to patch
later).
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/vmalloc.c | 16 +++-
1 file changed, 11 insertions(+), 5 deletion
Currently there is a strong tendency to avoid larger page allocations in
the kernel because of past fragmentation issues and the current
defragmentation methods are still evolving. It is not clear to what extent
they can provide reliable allocations for higher order pages (plus the
definition of "r
Make vmalloc functions work the same way as kfree() and friends that
take a const void * argument.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
include/linux/vmalloc.h | 10 +-
mm/vmalloc.c| 16
2 files changed, 13 insertions(+), 13 deletions(
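In other words, the exported prototypes become const-taking, matching kfree(); a sketch of the header change:

	extern void vfree(const void *addr);
	extern void vunmap(const void *addr);
	extern struct page *vmalloc_to_page(const void *addr);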
On Tue, 2007-09-18 at 18:06 -0700, Linus Torvalds wrote:
> There is *no* valid reason for 16kB blocksizes unless you have legacy
> issues.
That's not correct.
> The performance issues have nothing to do with the block-size, and
We must be thinking of different performance issues.
> should be
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:
> JBD: Replace slab allocations with page cache allocations
>
> JBD allocate memory for committed_data and frozen_data from slab. However
> JBD should not pass slab pages down to the block layer. Use page allocator
> page
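A hedged sketch of the direction Mingming's patch takes: route these buffers through the page allocator so slab pages never reach the block layer. The helper names are illustrative, not necessarily the posted ones:

	static inline void *jbd_alloc(size_t size, gfp_t flags)
	{
		/* committed_data/frozen_data cover a full block, so grab whole
		 * pages instead of kmalloc'ed slab memory. */
		return (void *)__get_free_pages(flags, get_order(size));
	}

	static inline void jbd_free(void *ptr, size_t size)
	{
		free_pages((unsigned long)ptr, get_order(size));
	}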
On Tue, 2007-09-18 at 12:44 -0700, Linus Torvalds wrote:
> This is not about performance. Never has been. It's about SGI wanting a
> way out of their current 16kB mess.
Pass the crack pipe, Linus?
> The way to fix performance is to move to x86-64, and use 4kB pages and be
> happy. However, the
On Wed, 19 Sep 2007, Nathan Scott wrote:
>
> FWIW (and I hate to let reality get in the way of a good conspiracy) -
> all SGI systems have always defaulted to using 4K blocksize filesystems;
Yes. And I've been told that:
> there's very few customers who would use larger
.. who apparently woul
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch.
On Tue, 18 Sep 2007, Nick Piggin wrote:
> > We can avoid all doubt in this patchset as well by adding support for
> > fallback to a vmalloced compound page.
>
> How would you do a vmapped fallback in your patchset? How would
> you keep track of pages 2..N if they don't exist in the radix tree?
T
On Tue, 18 Sep 2007, Nick Piggin wrote:
> On Tuesday 18 September 2007 08:00, Christoph Lameter wrote:
> > On Sun, 16 Sep 2007, Nick Piggin wrote:
> > > I don't know how it would prevent fragmentation from building up
> > > anyway. It's commonly the case that potentially unmovable objects
> > > ar
On Tue, 18 Sep 2007, Andrea Arcangeli wrote:
>
> Many? I can't recall anything besides PF_MEMALLOC and the decision
> that the VM is oom.
*All* of the buddy bitmaps, *all* of the GFP_ATOMIC, *all* of the zone
watermarks, everything that we depend on every single day, is in the end
just about
On Mon, Sep 17, 2007 at 12:56:07AM +0200, Goswin von Brederlow wrote:
> When has free ever given any useful "free" number? I can perfectly
> fine allocate another gigabyte of memory despite free saying 25MB. But
> that is because I know that the buffer/cached are not locked in.
Well, as you said y
On Tue, Sep 18, 2007 at 11:30:17AM -0700, Linus Torvalds wrote:
> The fact is, *none* of those things are true. The VM doesn't guarantee
> anything, and is already very much about statistics in many places. You
Many? I can't recall anything besides PF_MEMALLOC and the decision
that the VM is oom
On Tue, 18 Sep 2007, Nick Piggin wrote:
>
> ROFL! Yeah of course, how could I have forgotten about our trusty OOM killer
> as the solution to the fragmentation problem? It would only have been funnier
> if you had said to reboot every so often when memory gets fragmented :)
Can we please stop t
On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > Here is the incremental small cleanup patch.
> > >
> > > Remove kmalloc usages in jbd/jbd2 and consistently
On Tuesday 18 September 2007 08:05, Christoph Lameter wrote:
> On Sun, 16 Sep 2007, Nick Piggin wrote:
> > > > fsblock doesn't need any of those hacks, of course.
> > >
> > > Nor does mine for the low orders that we are considering. For order >
> > > MAX_ORDER this is unavoidable since the page all
On Tuesday 18 September 2007 08:21, Christoph Lameter wrote:
> On Sun, 16 Sep 2007, Nick Piggin wrote:
> > > > So if you argue that vmap is a downside, then please tell me how you
> > > > consider the -ENOMEM of your approach to be better?
> > >
> > > That is again pretty undifferentiated. Are we t
On Tuesday 18 September 2007 08:00, Christoph Lameter wrote:
> On Sun, 16 Sep 2007, Nick Piggin wrote:
> > I don't know how it would prevent fragmentation from building up
> > anyway. It's commonly the case that potentially unmovable objects
> > are allowed to fill up all of ram (dentries, inodes,
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > Here is the incremental small cleanup patch.
> >
> > Remove kmalloc usages in jbd/jbd2 and consistently use
> > jbd_kmalloc/jbd2_malloc.
>
> Shouldn't we kill jbd_kmal
On Tuesday, 18 September 2007, Jan Kara wrote:
> > Subject : umount triggers a warning in jfs and takes almost a minute
> > References : http://lkml.org/lkml/2007/9/4/73
> > Last known good : ?
> > Submitter : Oliver Neukum <[EMAIL PROTECTED]>
> > Caused-By : ?
> > Handled
On Tue, 2007-09-18 at 16:24 +0200, Jan Kara wrote:
> > Subject : umount triggers a warning in jfs and takes almost a minute
> > References : http://lkml.org/lkml/2007/9/4/73
> > Last known good : ?
> > Submitter : Oliver Neukum <[EMAIL PROTECTED]>
> > Caused-By : ?
> > Han
> FS
>
> Subject : hanging ext3 dbench tests
> References : http://lkml.org/lkml/2007/9/11/176
> Last known good : ?
> Submitter : Andy Whitcroft <[EMAIL PROTECTED]>
> Caused-By : ?
> Handled-By : ?
> Status : under test -- unreproducible at present
Yep...
On Tue, Sep 18, 2007 at 11:00:40AM +0100, Mel Gorman wrote:
> We still lack data on what sort of workloads really benefit from large
> blocks (assuming there are any that cannot also be solved by improving
> order-0).
No we don't. All workloads benefit from larger block sizes when
you've got a btr
On Tue, 18 September 2007 11:00:40 +0100, Mel Gorman wrote:
>
> We still lack data on what sort of workloads really benefit from large
> blocks
Compressing filesystems like jffs2 and logfs gain better compression
ratio with larger blocks. Going from 4KiB to 64KiB gave somewhere
around 10% benefi
On (17/09/07 15:00), Christoph Lameter didst pronounce:
> On Sun, 16 Sep 2007, Nick Piggin wrote:
>
> > I don't know how it would prevent fragmentation from building up
> > anyway. It's commonly the case that potentially unmovable objects
> > are allowed to fill up all of ram (dentries, inodes, et
On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> Here is the incremental small cleanup patch.
>
> Remove kmalloc usages in jbd/jbd2 and consistently use
> jbd_kmalloc/jbd2_malloc.
Shouldn't we kill jbd_kmalloc instead?