On Fri, 31 Aug 2007 18:41:21 -0700
Christoph Lameter <[EMAIL PROTECTED]> wrote:
> +#ifndef CONFIG_HIGHMEM
> + if (s->kick || s->flags & SLAB_TEMPORARY)
> + flags |= __GFP_MOVABLE;
> +#endif
> +
Should I do this as
#if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)
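Spelled out, the hunk with that change would presumably read:

	#if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)
		if (s->kick || s->flags & SLAB_TEMPORARY)
			flags |= __GFP_MOVABLE;
	#endif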
On Sat, 1 Sep 2007, KAMEZAWA Hiroyuki wrote:
> On Fri, 31 Aug 2007 18:41:21 -0700
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > +#ifndef CONFIG_HIGHMEM
> > + if (s->kick || s->flags & SLAB_TEMPORARY)
> > + flags |= __GFP_MOVABLE;
> > +#endif
> > +
>
> Should I do this as
Slab defragmentation: Support reiserfs inode defragmentation
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/reiserfs/super.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 5b68dd3..0344be9 100644
--- a/f
Add some debugging printks for slab defragmentation
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/slub.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
Index: linux-2.6/mm/slub.c
===
--- linux-2.
Support defragmentation for extX filesystem inodes
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/ext2/super.c | 9 +++++++++
fs/ext3/super.c | 8 ++++++++
fs/ext4/super.c | 8 ++++++++
3 files changed, 25 insertions(+)
Index: linux-2.6/fs/ext2/super.c
=
Support inode defragmentation for xfs
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/xfs/linux-2.6/xfs_super.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 4528f9a..e60c90e 100644
--- a
This implements the ability to remove inodes in a particular slab
from the inode cache. In order to remove an inode we may have to write out
the pages of the inode and the inode itself, and remove the dentries
referring to the inode.
Provide generic functionality that can be used by filesystems that have
th
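A minimal sketch of the generic get() side, assuming helpers of roughly
this shape (names and locking details are illustrative, not the exact
patch):

	static void *get_inodes(struct kmem_cache *s, int nr, void **v)
	{
		int i;

		spin_lock(&inode_lock);
		for (i = 0; i < nr; i++) {
			struct inode *inode = v[i];

			if (inode->i_state & (I_FREEING | I_CLEAR | I_WILL_FREE))
				v[i] = NULL;	/* already on its way out */
			else
				__iget(inode);	/* pin across the slab unlock */
		}
		spin_unlock(&inode_lock);
		return NULL;
	}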
Slabs that are reclaimable fit the definition of the objects in
ZONE_MOVABLE. So set __GFP_MOVABLE on them (this only works on
platforms without HIGHMEM; hopefully that restriction will vanish
at some point).
Also add the SLAB_TEMPORARY flag for slab caches that allocate objects with
a s
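As an illustration of how a cache would opt in (the cache chosen here is
just an example, and the kmem_cache_create() prototype is the 2.6.23-era
one):

	skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
				sizeof(struct sk_buff), 0,
				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_TEMPORARY,
				NULL);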
Extract the common code to remove a dentry from the lru into a new function
dentry_lru_remove().
Two call sites used list_del() instead of list_del_init(). AFAIK the
performance of both is the same. dentry_lru_remove() does a list_del_init().
As a result dentry->d_lru is now always empty when a d
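The extracted helper is presumably along these lines (sketch):

	static void dentry_lru_remove(struct dentry *dentry)
	{
		if (!list_empty(&dentry->d_lru)) {
			list_del_init(&dentry->d_lru);
			dentry_stat.nr_unused--;
		}
	}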
In order to support defragmentation on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.
So provide a constructor.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/dcache.c | 26 +
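A sketch of such a constructor (illustrative; the ctor prototype changed
between releases, and the exact fields initialized are an assumption):

	static void dcache_ctor(void *p, struct kmem_cache *cache,
						unsigned long flags)
	{
		struct dentry *dentry = p;

		spin_lock_init(&dentry->d_lock);
		dentry->d_inode = NULL;
		/* defrag can then safely test for lru membership */
		INIT_LIST_HEAD(&dentry->d_lru);
	}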
Defragmentation support for buffer heads. We convert the references to
buffers to struct page references and try to remove the buffers from
those pages. If the pages are dirty then trigger writeout so that the
buffer heads can be removed later.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
kick() is called after get() has been used and after the slab allocator
has dropped all of its locks. The dentry pruning for unused entries works
in a straightforward way.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/dcache.c | 100 +++
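A minimal sketch of the kick() side (illustrative only; real pruning also
has to deal with the lru list and parent dentries):

	static void kick_dentries(struct kmem_cache *s, int nr,
						void **v, void *private)
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct dentry *dentry = v[i];

			if (!dentry)
				continue;
			/*
			 * Drop the reference taken in get(). For dentries
			 * that are otherwise unused this frees them and
			 * thereby empties the slab.
			 */
			dput(dentry);
		}
	}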
Add a flag SlabReclaimable() that is set on slabs with a method
that allows defrag/reclaim. Clear the flag if a reclaim action is not
successful in reducing the number of objects in a slab. The reclaim
flag is set again if all objects have been allocated from it.
Signed-off-by: Christoph Lameter <
Support inode defragmentation for sockets
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
net/socket.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index ec07703..89fc7a5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -264,6
Create two special functions, kmem_cache_isolate_slab() and
kmem_cache_reclaim(), to support lumpy reclaim.
In order to isolate pages we will have to handle slab page allocations in
such a way that we can determine if a slab is valid whenever we access it,
regardless of where it is in its lifetime.
A valid slab
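Judging from the description, the entry points have roughly this shape
(signatures are my guess, not the verbatim patch):

	/* Pin a slab page so its objects can be reclaimed out of band. */
	int kmem_cache_isolate_slab(struct page *page);

	/* Try to free the objects of the isolated slabs on the list. */
	int kmem_cache_reclaim(struct list_head *slabs);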
Support procfs inode defragmentation
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/proc/inode.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a5b0dfd..83a66d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
The constructor for buffer_head slabs was removed recently. We need
the constructor in order to ensure that slab objects always have a definite
state even before they are allocated.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
fs/buffer.c | 19 +++
1 files changed, 1
SLUB uses compound pages for larger slabs. We need to increment
the page count of these pages in order to make sure that they are not
freed under us by lumpy reclaim.
(The patch is also part of the large blocksize patchset)
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
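The allocation-side change is presumably on the order of this (sketch,
not the exact hunk):

	page = alloc_pages(flags, order);
	if (!page)
		return NULL;
	/*
	 * Take an extra reference on higher order slab pages so that
	 * lumpy reclaim cannot free the compound page under us.
	 */
	if (order)
		get_page(page);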
When defragmenting slabs it is advantageous to have all
defragmentable slabs together at the beginning of the list so that we do not
have to scan the complete list. When adding a slab cache, put defragmentable
caches first and others last.
Determine the maximum number of objects in defragmen
This patch triggers slab defragmentation from memory reclaim.
The logical point for this is after slab shrinking has been performed in
vmscan.c. At that point the fragmentation of a slab has been increased
by objects being freed. So we call kmem_cache_defrag() from there.
slab_shrink() from vmscan.c is
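The placement would presumably look like this (sketch; the
kmem_cache_defrag() entry point and its argument convention are my
assumption based on the description above):

	nr_slab = shrink_slab(sc->nr_scanned, gfp_mask, lru_pages);
	/*
	 * Objects were freed, so slabs may now be sparsely populated;
	 * try to compact them. -1 is assumed here to mean "all nodes".
	 */
	kmem_cache_defrag(-1);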
Add a parameter to add_partial() instead of having separate functions.
That allows detailed control from multiple places when putting
slabs back to the partial list. If we put slabs back to the front
then they are likely used immediately for allocations. If they are
put at the end then we can max
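Presumably along these lines (sketch of the combined function):

	/*
	 * tail == 1 queues the slab at the end of the partial list so it
	 * is drained last; tail == 0 makes it the next allocation target.
	 */
	static void add_partial(struct kmem_cache_node *n,
					struct page *page, int tail)
	{
		spin_lock(&n->list_lock);
		n->nr_partial++;
		if (tail)
			list_add_tail(&page->lru, &n->partial);
		else
			list_add(&page->lru, &n->partial);
		spin_unlock(&n->list_lock);
	}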
Slab defragmentation (aside from Lumpy Reclaim) may occur:
1. Unconditionally when kmem_cache_shrink() is called on a slab cache by the
kernel.
2. Use of the slabinfo command line tool to trigger slab shrinking.
3. Per node defrag conditionally when kmem_cache_defrag() is c
The defrag_ratio is used to set the threshold at which a slab cache should be
defragmented.
The allocation ratio is measured as a percentage of the available slots.
The percentage will be lower for slabs that are more fragmented.
Add a defrag ratio field and set it to 30% by default. A limit of 30%
th
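For illustration (numbers mine, not from the patch): a slab that holds 30
objects with only 8 still allocated has an allocation ratio of 8/30, about
27%; that is below the 30% default, so the slab becomes a candidate for
defragmentation.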
We need the defrag ratio for the non-NUMA situation now. The NUMA defrag works
by allocating objects from partial slabs on remote nodes. Rename it to
remote_node_defrag_ratio to be clear about this.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
include/linux/slub_def.h | 5
Add the two methods needed for defragmentation and add the display of the
methods via the proc interface.
Add documentation explaining the use of these methods.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
include/linux/slab.h | 3 +++
include/linux/slub_def.h | 32 +
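The two methods are, in rough outline (a sketch based on the description,
not the verbatim patch):

	struct kmem_cache_ops {
		/*
		 * Pin the nr objects in v[] so they cannot be freed while
		 * slab locks are dropped; may return a private pointer
		 * that is later passed to kick().
		 */
		void *(*get)(struct kmem_cache *s, int nr, void **v);
		/* Free or relocate the pinned objects. */
		void (*kick)(struct kmem_cache *s, int nr, void **v,
							void *private);
	};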
-D lists caches that support defragmentation
-C lists caches that use a ctor.
Change field names for defrag_ratio and remote_node_defrag_ratio.
Add determination of the allocation ratio for a slab. The allocation ratio
is the percentage of a slab's available slots that are in use.
Signed-off-by: Christ
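Usage would then be, for example:

	slabinfo -D	# list caches that support defragmentation
	slabinfo -C	# list caches that use a ctor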
Create an ops field in /sys/slab/*/ops to contain all the operations defined
on a slab. This will be used to display the additional operations that we
will define soon.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/slub.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7
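The file would presumably show something like this (hypothetical output
shape, using the dentry methods as an example):

	# cat /sys/slab/dentry/ops
	ctor : dcache_ctor
	get : get_dentries
	kick : kick_dentries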
Slab defragmentation is mainly an issue if Linux is used as a fileserver
and large numbers of dentries, inodes and buffer heads accumulate. In some
load situations the slabs become very sparsely populated so that a lot of
memory is wasted by slabs that only contain one or a few objects. In
extreme
Move the counting function for objects in partial slabs so that it is placed
before kmem_cache_shrink. We will need to use it to establish the
fragmentation ratio of per node slab lists.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/slub.c | 26 +-
1 files
Thanks to some help from Mingming Cao we now have support for extX with up to
64k blocksize. There were several issues in the jbd layer (the ext2
patch that Christoph complained about was dropped).
The patchset can be tested (assuming one has a current git tree)
git checkout -b largeblock
git pu
The number of blocks per page could be less than or equal to 1 with large
block support in the VM.
The patch fixes the way the number of blocks to reserve in the journal is
calculated in the case blocksize > pagesize.
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Index: my2.6/fs/jbd/journal.c
=
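The problem spot is presumably journal_blocks_per_page(), whose shift goes
negative once blocksize > pagesize; a sketch of the fix:

	int journal_blocks_per_page(struct inode *inode)
	{
		if (PAGE_CACHE_SHIFT > inode->i_blkbits)
			return 1 << (PAGE_CACHE_SHIFT - inode->i_blkbits);
		/* blocksize > pagesize: reserve at least one block */
		return 1;
	}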
From clameter:
Teach jbd/jbd2 slab management to support >8k block size. Without this, it
refused to mount on >8k ext3.
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Index: my2.6/fs/jbd/journal.c
===
--- my2.6.orig/fs/jbd/journal
On Wed, 2007-08-29 at 17:47 -0700, Mingming Cao wrote:
> Just rebase to 2.6.23-rc4 and against the ext4 patch queue. Compile tested
> only.
>
> Next steps:
> Need e2fsprogs changes to be able to test this feature, as mkfs needs to be
> educated not to assume rec_len to be blocksize all the time.
> W
On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips wrote:
> Resubmitting a bio or submitting a dependent bio from
> inside a block driver does not need to be throttled because all
> resources required to guarantee completion must have been obtained
> _before_ the bio was allowed to procee
On Fri, Aug 31 2007, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > > Ok. So another solution maybe to limit the blocksizes that can be used
> > > with a device?
> >
> > That'd work for creation, but not for moving things around.
>
> What do you mean by moving things ar
On Wed, 2007-08-29 at 23:17 -0500, Shaun Zinck wrote:
> This replaces some macros and code that do the same thing as DIV_ROUND_UP
> from kernel.h with uses of the DIV_ROUND_UP macro.
>
> Signed-off-by: Shaun Zinck <[EMAIL PROTECTED]>
Thanks. I've added this to the jfs git tree. It's queued f
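For reference, DIV_ROUND_UP(n, d) in kernel.h expands to
((n) + (d) - 1) / (d), so the conversions are of this form:

	/* before: open-coded round-up division */
	nblocks = (size + blocksize - 1) / blocksize;

	/* after */
	nblocks = DIV_ROUND_UP(size, blocksize);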
Hi Daniel.
On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips ([EMAIL PROTECTED])
wrote:
> On Wednesday 29 August 2007 01:53, Evgeniy Polyakov wrote:
> > Then, if of course you will want, which I doubt, you can reread
> > previous mails and find that it was pointed to that race and
> > pos
On Fri, 31 August 2007 08:22:45 -0700, Christoph Lameter wrote:
>
> What do you mean by moving things around? Creation binds a filesystem to a
> device.
Create the filesystem on a usb key, then move it to the next machine,
i suppose.
Or on any other movable medium, including disks, nbd, iSCSI, ...
On Fri, 31 Aug 2007, Dmitry Monakhov wrote:
> > Ok. So another solution maybe to limit the blocksizes that can be used
> > with a device?
> IMHO it is not good because after an fs was created with a big blksize, its
> image can't be used on other devices.
Ok so a raw copy of the partition would do th
On Fri, 31 Aug 2007, Jens Axboe wrote:
> > Ok. So another solution maybe to limit the blocksizes that can be used
> > with a device?
>
> That'd work for creation, but not for moving things around.
What do you mean by moving things around? Creation binds a filesystem to a
device.
> > H.. W
On 00:52 Fri 31 Aug, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > They have nothing to do with each other, you are mixing things up. It
> > has nothing to do with the device being able to dma into that memory or
> > not, we have fine existing infrastructure to handl
On Fri, Aug 31 2007, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > They have nothing to do with each other, you are mixing things up. It
> > has nothing to do with the device being able to dma into that memory or
> > not, we have fine existing infrastructure to handle tha
On Fri, 31 Aug 2007, Jens Axboe wrote:
> They have nothing to do with each other, you are mixing things up. It
> has nothing to do with the device being able to dma into that memory or
> not, we have fine existing infrastructure to handle that. But different
> hardware have different characteristi
On Fri, Aug 31 2007, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > > A DMA boundary cannot be crossed AFAIK. The compound pages are aligned to
> > > the power of two boundaries and the page allocator will not create pages
> > > that cross the zone boundaries.
> >
> > W
On Fri, 31 Aug 2007, Jens Axboe wrote:
> > A DMA boundary cannot be crossed AFAIK. The compound pages are aligned to
> > the power of two boundaries and the page allocator will not create pages
> > that cross the zone boundaries.
>
> With a 64k page and a dma boundary of 0x7fff, that's two segm
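(Worked through, with my arithmetic: a boundary mask of 0x7fff means no
segment may cross a 32 KiB boundary. A 64 KiB page necessarily spans such
a boundary, so the bio has to be split into two 32 KiB segments.)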
On Fri, Aug 31 2007, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > > Could you be more specific?
> >
> > Size of a single segment, for instance. Or if the bio crosses a dma
> > boundary. If your block is 64kb and the maximum segment size is 32kb,
> > then you would need
On Fri, 31 Aug 2007, Jens Axboe wrote:
> > Could you be more specific?
>
> Size of a single segment, for instance. Or if the bio crosses a dma
> boundary. If your block is 64kb and the maximum segment size is 32kb,
> then you would need to clone the bio and split it into two.
A DMA boundary cann
On Fri, Aug 31 2007, Christoph Lameter wrote:
> On Fri, 31 Aug 2007, Jens Axboe wrote:
>
> > > So if we try to push a too large buffer down with submit_bh() we get a
> > > failure.
> >
> > Only partly, you may be violating a number of other restrictions (size
> > is many things, not just length
On Fri, 31 Aug 2007, Jens Axboe wrote:
> > So if we try to push a too large buffer down with submit_bh() we get a
> > failure.
>
> Only partly, you may be violating a number of other restrictions (size
> is many things, not just length of the data).
Could you be more specific?