Re: [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support

2007-08-31 Thread KAMEZAWA Hiroyuki
On Fri, 31 Aug 2007 18:41:21 -0700 Christoph Lameter <[EMAIL PROTECTED]> wrote:
> +#ifndef CONFIG_HIGHMEM
> +	if (s->kick || s->flags & SLAB_TEMPORARY)
> +		flags |= __GFP_MOVABLE;
> +#endif
> +
Should I do this as #if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)
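A minimal userspace sketch of the guard being discussed, using the combined condition from the question above. The names `demo_cache`, `demo_slab_gfp`, `DEMO_GFP_MOVABLE` and `DEMO_SLAB_TEMPORARY` are hypothetical stand-ins and the flag values are made up; none of this is the kernel's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in values; the real kernel constants differ. */
#define DEMO_GFP_MOVABLE    0x1u
#define DEMO_SLAB_TEMPORARY 0x2u

/* Hypothetical, reduced stand-in for struct kmem_cache. */
struct demo_cache {
	void (*kick)(void);   /* non-NULL if the cache can evict objects */
	unsigned int flags;   /* cache creation flags */
};

/*
 * Return the allocation flags for a new slab of cache s. The movable
 * bit is only added when neither config option is defined.
 */
static unsigned int demo_slab_gfp(const struct demo_cache *s,
				  unsigned int flags)
{
	assert(s != NULL);
#if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)
	/* '&' binds tighter than '||': this tests kick OR the flag bit. */
	if (s->kick || (s->flags & DEMO_SLAB_TEMPORARY))
		flags |= DEMO_GFP_MOVABLE;
#endif
	return flags;
}
```

Building with `-DCONFIG_MEMORY_HOTREMOVE` (or `-DCONFIG_HIGHMEM`) compiles the branch out, so the function returns `flags` unchanged.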

Re: [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support

2007-08-31 Thread Christoph Lameter
On Sat, 1 Sep 2007, KAMEZAWA Hiroyuki wrote:
> On Fri, 31 Aug 2007 18:41:21 -0700
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > +#ifndef CONFIG_HIGHMEM
> > +	if (s->kick || s->flags & SLAB_TEMPORARY)
> > +		flags |= __GFP_MOVABLE;
> > +#endif
> > +
>
> Should I do this as
>
>

[RFC 21/26] FS: Slab defrag: Reiserfs support

2007-08-31 Thread Christoph Lameter
Slab defragmentation: Support reiserfs inode defragmentation Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/reiserfs/super.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 5b68dd3..0344be9 100644 --- a/f

[RFC 26/26] SLUB: Add debugging for slab defrag

2007-08-31 Thread Christoph Lameter
Add some debugging printks for slab defragmentation Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) Index: linux-2.6/mm/slub.c === --- linux-2.

[RFC 18/26] FS: ExtX filesystem defrag

2007-08-31 Thread Christoph Lameter
Support defragmentation for extX filesystem inodes Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/ext2/super.c |9 + fs/ext3/super.c |8 fs/ext4/super.c |8 3 files changed, 25 insertions(+) Index: linux-2.6/fs/ext2/super.c =

[RFC 19/26] FS: XFS slab defragmentation

2007-08-31 Thread Christoph Lameter
Support inode defragmentation for xfs Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/xfs/linux-2.6/xfs_super.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c index 4528f9a..e60c90e 100644 --- a

[RFC 17/26] inodes: Support generic defragmentation

2007-08-31 Thread Christoph Lameter
This implements the ability to remove inodes in a particular slab from the inode cache. In order to remove an inode we may have to write out the pages of an inode, the inode itself and remove the dentries referring to the inode. Provide generic functionality that can be used by filesystems that have th

[RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support

2007-08-31 Thread Christoph Lameter
Slabs that are reclaimable fit the definition of the objects in ZONE_MOVABLE. So set __GFP_MOVABLE on them (this only works on platforms where there is no HIGHMEM. Hopefully that restriction will vanish at some point). Also add the SLAB_TEMPORARY flag for slab caches that allocate objects with a s

[RFC 23/26] dentries: Extract common code to remove dentry from lru

2007-08-31 Thread Christoph Lameter
Extract the common code to remove a dentry from the lru into a new function dentry_lru_remove(). Two call sites used list_del() instead of list_del_init(). AFAIK the performance of both is the same. dentry_lru_remove() does a list_del_init(). As a result dentry->d_lru is now always empty when a d
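The difference between the two removal calls can be sketched with a small userspace model of a circular doubly linked list. All `demo_` names are hypothetical; the kernel's own `list_del()` additionally poisons the removed entry's pointers, which this model does not reproduce (it simply leaves them stale).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace model of a circular doubly linked list node. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *n)
{
	n->next = n;
	n->prev = n;
}

static bool demo_list_empty(const struct demo_list *n)
{
	return n->next == n;
}

/* Insert entry right after head. */
static void demo_list_add(struct demo_list *entry, struct demo_list *head)
{
	assert(entry != NULL && head != NULL);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink only: entry keeps stale pointers to its old neighbours. */
static void demo_list_del(struct demo_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink and reinitialise: entry reads as an empty list afterwards. */
static void demo_list_del_init(struct demo_list *entry)
{
	demo_list_del(entry);
	demo_list_init(entry);
}
```

With the reinitialising variant, a later emptiness check on the removed entry is a reliable "not on any list" test, which is the property the patch description relies on for `dentry->d_lru`.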

[RFC 24/26] dentries: Add constructor

2007-08-31 Thread Christoph Lameter
In order to support defragmentation on the dentry cache we need to have a determined object state at all times. Without a constructor the object would have a random state after allocation. So provide a constructor. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/dcache.c | 26 +

[RFC 16/26] Buffer heads: Support slab defrag

2007-08-31 Thread Christoph Lameter
Defragmentation support for buffer heads. We convert the references to buffers to struct page references and try to remove the buffers from those pages. If the pages are dirty then trigger writeout so that the buffer heads can be removed later. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

[RFC 25/26] dentries: dentry defragmentation

2007-08-31 Thread Christoph Lameter
kick() is called after get() has been used and after the slab has dropped all of its own locks. The dentry pruning for unused entries works in a straightforward way. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/dcache.c | 100 +++

[RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts

2007-08-31 Thread Christoph Lameter
Add a flag SlabReclaimable() that is set on slabs with a method that allows defrag/reclaim. Clear the flag if a reclaim action is not successful in reducing the number of objects in a slab. The reclaim flag is set again if all objects have been allocated from it. Signed-off-by: Christoph Lameter <

[RFC 22/26] FS: Socket inode defragmentation

2007-08-31 Thread Christoph Lameter
Support inode defragmentation for sockets Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- net/socket.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/net/socket.c b/net/socket.c index ec07703..89fc7a5 100644 --- a/net/socket.c +++ b/net/socket.c @@ -264,6

[RFC 12/26] SLUB: Slab reclaim through Lumpy reclaim

2007-08-31 Thread Christoph Lameter
Create the special functions kmem_cache_isolate_slab() and kmem_cache_reclaim() to support lumpy reclaim. In order to isolate pages we will have to handle slab page allocations in such a way that we can determine if a slab is valid whenever we access it regardless of its time in life. A valid slab

[RFC 20/26] FS: Proc filesystem support for slab defrag

2007-08-31 Thread Christoph Lameter
Support procfs inode defragmentation Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/proc/inode.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index a5b0dfd..83a66d7 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c

[RFC 15/26] bufferhead: Revert constructor removal

2007-08-31 Thread Christoph Lameter
The constructor for buffer_head slabs was removed recently. We need the constructor in order to ensure that slab objects always have a definite state even before we allocate them. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/buffer.c | 19 +++ 1 files changed, 1

[RFC 11/26] VM: Allow get_page_unless_zero on compound pages

2007-08-31 Thread Christoph Lameter
SLUB uses compound pages for larger slabs. We need to increment the page count of these pages in order to make sure that they are not freed under us for reclaim from within lumpy reclaim. (The patch is also part of the large blocksize patchset) Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

[RFC 07/26] SLUB: Sort slab cache list and establish maximum objects for defrag slabs

2007-08-31 Thread Christoph Lameter
When defragmenting slabs it is advantageous to have all defragmentable slabs together at the beginning of the list so that we do not have to scan the complete list. When adding a slab cache, put defragmentable caches first and others last. Determine the maximum number of objects in defragmen

[RFC 10/26] SLUB: Trigger defragmentation from memory reclaim

2007-08-31 Thread Christoph Lameter
This patch triggers slab defragmentation from memory reclaim. The logical point for this is after slab shrinking was performed in vmscan.c. At that point the fragmentation ratio of a slab was increased by objects being freed. So we call kmem_cache_defrag from there. slab_shrink() from vmscan.c is

[RFC 08/26] SLUB: Consolidate add_partial and add_partial_tail to one function

2007-08-31 Thread Christoph Lameter
Add a parameter to add_partial instead of having separate functions. That allows the detailed control from multiple places when putting slabs back to the partial list. If we put slabs back to the front then they are likely used immediately for allocations. If they are put at the end then we can max

[RFC 09/26] SLUB: Slab defrag core

2007-08-31 Thread Christoph Lameter
Slab defragmentation (aside from Lumpy Reclaim) may occur:
1. Unconditionally when the kernel calls kmem_cache_shrink() on a slab cache.
2. Use of the slabinfo command line to trigger slab shrinking.
3. Per node defrag conditionally when kmem_cache_defrag() is c

[RFC 04/26] SLUB: Add defrag_ratio field and sysfs support.

2007-08-31 Thread Christoph Lameter
The defrag_ratio is used to set the threshold when a slabcache should be defragmented. The allocation ratio is measured in a percentage of the available slots. The percentage will be lower for slabs that are more fragmented. Add a defrag ratio field and set it to 30% by default. A limit of 30% th
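The allocation ratio described here can be sketched as a small calculation. The helper names are hypothetical, and the direction of the comparison (defragment when the ratio falls below the threshold) is an assumption drawn from the description, not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Default threshold named in the patch description (percent). */
#define DEMO_DEFRAG_RATIO 30

/*
 * Allocation ratio: objects in use as a percentage of the available
 * object slots. Sparse (fragmented) slabs yield a low value.
 */
static unsigned int demo_alloc_ratio(unsigned long in_use,
				     unsigned long slots)
{
	assert(slots > 0 && in_use <= slots);
	return (unsigned int)(in_use * 100 / slots);
}

/* Assumption: defragment when the ratio falls below the threshold. */
static bool demo_needs_defrag(unsigned long in_use, unsigned long slots,
			      unsigned int threshold)
{
	return demo_alloc_ratio(in_use, slots) < threshold;
}
```

For example, a cache with 1 of 10 slots in use has a ratio of 10% and would qualify under a 30% threshold, while one with 3 of 10 slots in use sits exactly at the threshold and would not.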

[RFC 03/26] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

2007-08-31 Thread Christoph Lameter
We need the defrag ratio for the non NUMA situation now. The NUMA defrag works by allocating objects from partial slabs on remote nodes. Rename it to remote_node_defrag_ratio to be clear about this. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h |5

[RFC 06/26] SLUB: Add get() and kick() methods

2007-08-31 Thread Christoph Lameter
Add the two methods needed for defragmentation and add the display of the methods via the proc interface. Add documentation explaining the use of these methods. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slab.h |3 +++ include/linux/slub_def.h | 32 +

[RFC 01/26] SLUB: Extend slabinfo to support -D and -C options

2007-08-31 Thread Christoph Lameter
-D lists caches that support defragmentation. -C lists caches that use a ctor. Change field names for defrag_ratio and remote_node_defrag_ratio. Add determination of the allocation ratio for slab. The allocation ratio is the percentage of available slots for objects in use. Signed-off-by: Christ

[RFC 05/26] SLUB: Replace ctor field with ops field in /sys/slab/*

2007-08-31 Thread Christoph Lameter
Create an ops field in /sys/slab/*/ops to contain all the operations defined on a slab. This will be used to display the additional operations that we will define soon. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 16 +--- 1 files changed, 9 insertions(+), 7

[RFC 00/26] Slab defragmentation V5

2007-08-31 Thread Christoph Lameter
Slab defragmentation is mainly an issue if Linux is used as a fileserver and large amounts of dentries, inodes and buffer heads accumulate. In some load situations the slabs become very sparsely populated so that a lot of memory is wasted by slabs that only contain one or a few objects. In extreme

[RFC 02/26] SLUB: Move count_partial()

2007-08-31 Thread Christoph Lameter
Move the counting function for objects in partial slabs so that it is placed before kmem_cache_shrink. We will need to use it to establish the fragmentation ratio of per node slab lists. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 26 +- 1 files

Re: [00/36] Large Blocksize Support V6

2007-08-31 Thread Christoph Lameter
Thanks to some help from Mingming Cao we now have support for extX with up to 64k blocksize. There were several issues in the jbd layer (The ext2 patch that Christoph complained about was dropped). The patchset can be tested (assuming one has a current git tree) git checkout -b largeblock git pu

[RFC 2/2] JBD: blocks reservation fix for large block support

2007-08-31 Thread Mingming Cao
The blocks per page could be less than or equal to 1 with the large block support in VM. The patch fixes the way to calculate the number of blocks to reserve in the journal in the case blocksize > pagesize. Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Index: my2.6/fs/jbd/journal.c =

[RFC 1/2] JBD: slab management support for large block(>8k)

2007-08-31 Thread Mingming Cao
From clameter: Teach jbd/jbd2 slab management to support >8k block size. Without this, it refused to mount on >8k ext3. Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Index: my2.6/fs/jbd/journal.c === --- my2.6.orig/fs/jbd/journal

Re: [RFC 1/4] Large Blocksize support for Ext2/3/4

2007-08-31 Thread Mingming Cao
On Wed, 2007-08-29 at 17:47 -0700, Mingming Cao wrote: > Just rebase to 2.6.23-rc4 and against the ext4 patch queue. Compile tested > only. > > Next steps: > Need e2fsprogs changes to be able to test this feature. As mkfs needs to be > educated not to assume rec_len to be blocksize all the time. > W

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-31 Thread Alasdair G Kergon
On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips wrote: > Resubmitting a bio or submitting a dependent bio from > inside a block driver does not need to be throttled because all > resources required to guarantee completion must have been obtained > _before_ the bio was allowed to procee

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jens Axboe
On Fri, Aug 31 2007, Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > > Ok. So another solution maybe to limit the blocksizes that can be used > > > with a device? > > > > That'd work for creation, but not for moving things around. > > What do you mean by moving things ar

Re: [PATCH] fs/jfs: use DIV_ROUND_UP where appropriate

2007-08-31 Thread Dave Kleikamp
On Wed, 2007-08-29 at 23:17 -0500, Shaun Zinck wrote: > This replaces some macros and code, which do the same thing as DIV_ROUND_UP > defined in kernel.h, to use the DIV_ROUND_UP macro. > > Signed-off-by: Shaun Zinck <[EMAIL PROTECTED]> Thanks. I've added this to the jfs git tree. It's queued f
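For readers unfamiliar with the macro, a minimal sketch of what DIV_ROUND_UP computes. The macro body matches the kernel's definition; the helper `demo_blocks_needed` is a hypothetical example and is not taken from the jfs patch.

```c
#include <assert.h>

/* Same definition as the kernel's macro; d must be non-zero. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of fixed-size blocks covering nbytes. */
static unsigned long demo_blocks_needed(unsigned long nbytes,
					unsigned long blksize)
{
	assert(blksize > 0);
	return DIV_ROUND_UP(nbytes, blksize);
}
```

Open-coded variants such as `(n + d - 1) / d` are exactly what the macro replaces, so the conversion should not change behaviour.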

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-31 Thread Evgeniy Polyakov
Hi Daniel. On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Wednesday 29 August 2007 01:53, Evgeniy Polyakov wrote: > > Then, if of course you will want, which I doubt, you can reread > > previous mails and find that it was pointed to that race and > > pos

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jörn Engel
On Fri, 31 August 2007 08:22:45 -0700, Christoph Lameter wrote: > > What do you mean by moving things around? Creation binds a filesystem to a > device. Create the filesystem on a usb key, then move it to the next machine, I suppose. Or on any other movable medium, including disks, nbd, iSCSI,.

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Dmitry Monakhov wrote: > > Ok. So another solution maybe to limit the blocksizes that can be used > > with a device? > IMHO It is not good because after fs was created with big blksize its image > can't be used on other devices. Ok so a raw copy of the partition would do th

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Jens Axboe wrote: > > Ok. So another solution maybe to limit the blocksizes that can be used > > with a device? > > That'd work for creation, but not for moving things around. What do you mean by moving things around? Creation binds a filesystem to a device. > > H.. W

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Dmitry Monakhov
On 00:52 Fri 31 Aug , Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > They have nothing to do with each other, you are mixing things up. It > > has nothing to do with the device being able to dma into that memory or > > not, we have fine existing infrastructure to handl

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jens Axboe
On Fri, Aug 31 2007, Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > They have nothing to do with each other, you are mixing things up. It > > has nothing to do with the device being able to dma into that memory or > > not, we have fine existing infrastructure to handle tha

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Jens Axboe wrote: > They have nothing to do with each other, you are mixing things up. It > has nothing to do with the device being able to dma into that memory or > not, we have fine existing infrastructure to handle that. But different > hardware have different characteristi

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jens Axboe
On Fri, Aug 31 2007, Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > > A DMA boundary cannot be crossed AFAIK. The compound pages are aligned to > > > the power of two boundaries and the page allocator will not create pages > > > that cross the zone boundaries. > > > > W

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Jens Axboe wrote: > > A DMA boundary cannot be crossed AFAIK. The compound pages are aligned to > > the power of two boundaries and the page allocator will not create pages > > that cross the zone boundaries. > > With a 64k page and a dma boundary of 0x7fff, that's two segm

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jens Axboe
On Fri, Aug 31 2007, Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > > Could you be more specific? > > > > Size of a single segment, for instance. Or if the bio crosses a dma > > boundary. If your block is 64kb and the maximum segment size is 32kb, > > then you would need

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Jens Axboe wrote: > > Could you be more specific? > > Size of a single segment, for instance. Or if the bio crosses a dma > boundary. If your block is 64kb and the maximum segment size is 32kb, > then you would need to clone the bio and split it into two. A DMA boundary cann
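The arithmetic behind the two restrictions in this exchange can be sketched as follows. Both helpers are hypothetical illustrations, not block layer functions; the boundary helper assumes the mask is the window size minus one (for example 0x7fff for 32k windows, the value that comes up elsewhere in this thread).

```c
#include <assert.h>

/* Segments needed when no segment may exceed max_seg bytes. */
static unsigned long demo_segs_by_size(unsigned long len,
				       unsigned long max_seg)
{
	assert(max_seg > 0);
	return (len + max_seg - 1) / max_seg;
}

/*
 * Segments needed when no segment may cross a DMA boundary. mask is
 * the boundary mask (window size minus one, e.g. 0x7fff for 32k).
 */
static unsigned long demo_segs_by_boundary(unsigned long start,
					   unsigned long len,
					   unsigned long mask)
{
	unsigned long window = mask + 1;   /* bytes between boundaries */

	assert(len > 0 && window > 0);
	return (start + len - 1) / window - start / window + 1;
}
```

A 64k block with a 32k maximum segment size needs two segments, and a 64k-aligned 64k buffer checked against a 0x7fff mask also spans two windows, which matches the "split it into two" case described above.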

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Jens Axboe
On Fri, Aug 31 2007, Christoph Lameter wrote: > On Fri, 31 Aug 2007, Jens Axboe wrote: > > > > So if we try to push a too large buffer down with submit_bh() we get a > > > failure. > > > > Only partly, you may be violating a number of other restrictions (size > > is many things, not just length

Re: [11/36] Use page_cache_xxx in fs/buffer.c

2007-08-31 Thread Christoph Lameter
On Fri, 31 Aug 2007, Jens Axboe wrote: > > So if we try to push a too large buffer down with submit_bh() we get a > > failure. > > Only partly, you may be violating a number of other restrictions (size > is many things, not just length of the data). Could you be more specific?