[rfc 44/45] Remove local_t support

2007-11-19 Thread clameter
There is no user of local_t remaining after the cpu ops patchset. local_t always suffered from the problem that the operations it generated were not able to perform the relocation of a pointer to the target processor and the atomic update at the same time. There was a need to disable preemption

[rfc 45/45] Modules: Hack to handle symbols that have a zero value

2007-11-19 Thread clameter
The module subsystem cannot handle symbols that are zero. It prints out a message that these symbols are unresolved. Define a constant UNRESOLVED that is used to hold the value used for unresolved symbols. Set it to 1 (it's hopefully unlikely that a symbol will have the value 1). This is
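The sentinel idea described above can be sketched in user-space C. This is only an illustration: the struct, the table, the symbol names and lookup_symbol() are hypothetical and are not the kernel's module loader code. Only the constant UNRESOLVED and its value 1 come from the patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sentinel meaning "symbol not found". Using 1 instead of 0 lets a
 * symbol legitimately have the value 0. */
#define UNRESOLVED 1UL

/* Hypothetical symbol table entry, not the kernel's own structure. */
struct sym {
    const char *name;
    unsigned long value;
};

/* Hypothetical lookup: returns the symbol's value, or UNRESOLVED if
 * no entry with that name exists. */
static unsigned long lookup_symbol(const struct sym *tab, size_t n,
                                   const char *name)
{
    assert(tab != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].value;
    return UNRESOLVED;
}

/* Illustrative table: the first symbol sits at offset zero. */
static const struct sym example_table[] = {
    { "zero_sym",  0 },   /* zero-valued, but resolved */
    { "other_sym", 64 },
};
```

With this convention a caller compares the result against UNRESOLVED rather than against 0, so a symbol at offset zero is no longer reported as missing.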

[rfc 32/45] Module handling: Use CPU_xx ops to dynamically allocate counters

2007-11-19 Thread clameter
Use the CPU_xx operations to deal with the per cpu data. Avoid a loop to NR_CPUS here. Use the possible map instead. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/module.h | 13 + kernel/module.c| 17 +++-- 2 files changed, 12

[rfc 41/45] VM statistics: Use CPU ops

2007-11-19 Thread clameter
The use of CPU ops here avoids the offset calculations that we used to have to do with per cpu ops. The result of this patch is that event counters are coded with a single instruction the following way: incq %gs:offset(%rip) Without these patches this was: mov %gs:0x8,%rdx mov

[rfc 43/45] x86_64: Add a CPU_OR to support or_pda()

2007-11-19 Thread clameter
Get rid of one of the leftover pda accessors and cut out some more of pda.h. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/asm-x86/pda.h | 35 +-- include/asm-x86/percpu_64.h | 30 ++ 2 files changed, 31

[rfc 39/45] x86_64: Remove the data_offset field from the pda.

2007-11-19 Thread clameter
It is useless now since gs can always stand in for data_offset. Move active_mm into the available slot in order to not upset the established offsets. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/kernel/asm-offsets_64.c |1 - arch/x86/kernel/entry_64.S |7

[rfc 40/45] x86_64: Provide per_cpu_var definition

2007-11-19 Thread clameter
There needs to be a way to determine the offset for the CPU ops of per cpu variables. The offset is simply the address of the variable. But we do not want to code ugly things like CPU_READ(per_cpu__statistics) in the core. So define a new helper per_cpu_var(var) that simply adds the

[rfc 36/45] X86_64: Place pda first in cpu area.

2007-11-19 Thread clameter
If we move the pda to the beginning of the cpu area then the gs segment will also point to the beginning of the cpu area. After this patch we can use gs on any percpu variable or cpu_alloc pointer from cpu 0 to get to the active processor's variable. There is no need anymore to add a per cpu offset

[rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread clameter
Support fast cpu ops in x86_64 by providing a series of functions that generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that core code can exploit the availability of fast per cpu operations. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/Kconfig|

[rfc 38/45] x86_64: Remove obsolete per_cpu offset calculations

2007-11-19 Thread clameter
Replace all uses of __per_cpu_offset with CPU_PTR. This will avoid a lot of lookups for per cpu offset calculations. Keep per_cpu_offset() itself because lockdep uses it. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/kernel/smpboot_64.c |8 +++-

[rfc 34/45] x86_64: Fold percpu area into the cpu area.

2007-11-19 Thread clameter
Use boot_cpu_alloc to allocate a cpu area chunk that is needed to store the statically declared per cpu data and then point the per_cpu_offset pointers to the cpu area. The per cpu area is moved to a ZERO offset using some linker scripting. All per cpu variable addresses become true offsets into

[rfc 35/45] X86_64: Declare pda as per cpu data thereby moving it into the cpu area

2007-11-19 Thread clameter
Declare the pda as a per cpu variable. This will have the effect of moving the pda data into the cpu area managed by cpu alloc. The boot_pdas are only needed in head64.c so move the declaration over there and make it static. Remove the code that allocates special pda data structures.

[rfc 33/45] x86_64: Use CPU ops for nmi alert counter

2007-11-19 Thread clameter
These are critical fast paths. Using a segment override instead of an address calculation reduces overhead. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/kernel/nmi_64.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index:

[rfc 30/45] cpu alloc: Use in the crypto subsystem.

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- crypto/async_tx/async_tx.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) Index: linux-2.6/crypto/async_tx/async_tx.c === ---

[rfc 31/45] cpu alloc: Remove the allocpercpu functionality

2007-11-19 Thread clameter
There is no user of allocpercpu left after all the earlier patches were applied. Remove the code that realizes allocpercpu. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/percpu.h | 80 -- mm/Makefile|1 mm/allocpercpu.c

[rfc 29/45] cpu alloc: Use for infiniband

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/infiniband/hw/ehca/ehca_irq.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) Index: linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c

[rfc 27/45] cpu alloc: convert mib handling to cpu alloc

2007-11-19 Thread clameter
Use the cpu alloc functions for the mib handling functions in the net layer. The API for snmp_mib_free() is changed to add a size parameter since cpu_free requires that. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/net/ip.h|2 +- include/net/snmp.h | 15

[rfc 28/45] cpu_alloc: convert network sockets

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- net/core/sock.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/net/core/sock.c === --- linux-2.6.orig/net/core/sock.c 2007-11-18

[rfc 24/45] cpu alloc: convert loopback statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/net/loopback.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) Index: linux-2.6/drivers/net/loopback.c === ---

[rfc 25/45] cpu alloc: veth conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/net/veth.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/drivers/net/veth.c === --- linux-2.6.orig/drivers/net/veth.c

[rfc 26/45] cpu alloc: Chelsio statistics conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/net/chelsio/sge.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) Index: linux-2.6/drivers/net/chelsio/sge.c === ---

[rfc 22/45] cpu alloc: convert scratches

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- net/ipv4/ipcomp.c | 26 +- net/ipv6/ipcomp6.c | 26 +- 2 files changed, 26 insertions(+), 26 deletions(-) Index: linux-2.6/net/ipv4/ipcomp.c

[rfc 23/45] cpu alloc: dmaengine conversion

2007-11-19 Thread clameter
Convert DMA engine to use CPU_xx operations. This also removes the use of local_t from the dmaengine. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 38 ++ include/linux/dmaengine.h | 16 ++-- 2 files

[rfc 21/45] cpu alloc: tcp statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- net/ipv4/tcp.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/net/ipv4/tcp.c === --- linux-2.6.orig/net/ipv4/tcp.c 2007-11-15

[rfc 20/45] cpu alloc: neighbour statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/net/neighbour.h |6 +- net/core/neighbour.c| 11 ++- 2 files changed, 7 insertions(+), 10 deletions(-) Index: linux-2.6/include/net/neighbour.h

[rfc 16/45] cpu alloc: blktrace conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- block/blktrace.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/block/blktrace.c === --- linux-2.6.orig/block/blktrace.c 2007-11-15

[rfc 19/45] cpu alloc: NFS statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/nfs/iostat.h |8 fs/nfs/super.c |2 +- 2 files changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/fs/nfs/iostat.h === ---

[rfc 17/45] cpu alloc: SRCU

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- kernel/rcutorture.c |4 ++-- kernel/srcu.c | 20 2 files changed, 10 insertions(+), 14 deletions(-) Index: linux-2.6/kernel/rcutorture.c ===

[rfc 18/45] cpu alloc: XFS counters

2007-11-19 Thread clameter
Also remove the useless zeroing after allocation. Allocpercpu already zeroed the objects. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/xfs/xfs_mount.c | 24 1 file changed, 8 insertions(+), 16 deletions(-) Index: linux-2.6/fs/xfs/xfs_mount.c

[rfc 14/45] cpu alloc: ACPI cstate handling conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/kernel/acpi/cstate.c |9 + arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |7 --- drivers/acpi/processor_perflib.c |4 ++-- 3 files changed, 11 insertions(+), 9 deletions(-) Index:

[rfc 15/45] cpu alloc: genhd statistics conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/genhd.h | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) Index: linux-2.6/include/linux/genhd.h === ---

[rfc 12/45] cpu alloc: crash_notes conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/ia64/kernel/crash.c |2 +- drivers/base/cpu.c |2 +- kernel/kexec.c |4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/arch/ia64/kernel/crash.c

[rfc 13/45] cpu alloc: workqueue conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- kernel/workqueue.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) Index: linux-2.6/kernel/workqueue.c === ---

[rfc 09/45] cpu alloc: IA64 support

2007-11-19 Thread clameter
Typical use of per cpu memory for a small system of 8G 8p 4node is less than 64k per cpu memory. This is increasing rapidly for larger systems where we can get up to 512k or 1M of memory used for cpu storage. The maximum size allowed of the cpu area is 128MB of memory. The cpu area is placed in

[rfc 06/45] cpu alloc: page allocator conversion

2007-11-19 Thread clameter
Use the new cpu_alloc functionality to avoid per cpu arrays in struct zone. This drastically reduces the size of struct zone for systems with a large number of processors and allows placement of critical variables of struct zone in one cacheline even on very large systems. Another effect is that

[rfc 10/45] cpu_alloc: Sparc64 support

2007-11-19 Thread clameter
Enable a simple virtual configuration with 32MB available per cpu so that we do not use a static area on sparc64. [Not tested. I have no sparc64] Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/sparc64/Kconfig | 15 +++ arch/sparc64/kernel/vmlinux.lds.S

[rfc 11/45] cpu alloc: percpu_counter conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- lib/percpu_counter.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) Index: linux-2.6/lib/percpu_counter.c === --- linux-2.6.orig/lib/percpu_counter.c

[rfc 07/45] cpu_alloc: Implement dynamically extendable cpu areas

2007-11-19 Thread clameter
Virtually map the cpu areas. This allows bigger maximum sizes and populating the virtual mappings only on demand. In order to use the virtual mapping capability the arch must set up some configuration variables in arch/xxx/Kconfig: CONFIG_CPU_AREA_VIRTUAL to y CONFIG_CPU_AREA_ORDER to

[rfc 05/45] cpu alloc: Remove SLUB fields

2007-11-19 Thread clameter
Remove the fields in kmem_cache_cpu that were used to cache data from kmem_cache when they were in different cachelines. The cacheline that holds the per cpu array pointer now also holds these values. We can cut down the kmem_cache_cpu size to almost half. The get_freepointer() and

[rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread clameter
64 bit: Set up a cpu area that allows the use of up to 16MB for each processor. Cpu memory use can grow a bit. F.e. if we assume that a pageset occupies 64 bytes of memory and we have 3 zones in each of 1024 nodes then we need 3 * 1k * 16k = 50 million pagesets or 3072 pagesets per processor. This
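The sizing estimate above can be checked with a few lines of C. This is a sketch under stated assumptions: "1k" and "16k" are read as 1024 nodes and 16384 processors, and the constant and function names are made up for the illustration. With 3 zones on each of 1024 nodes, every processor needs 3 * 1024 = 3072 pagesets.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the patch description; reading "1k" as 1024 and "16k"
 * as 16384 is an assumption of this sketch. */
enum {
    ZONES_PER_NODE = 3,
    NODES          = 1024,
    PROCESSORS     = 16384,
    PAGESET_BYTES  = 64,
};

/* Every processor holds one pageset for each zone of each node. */
static_assert(ZONES_PER_NODE * NODES == 3072,
              "3 zones on 1024 nodes give 3072 pagesets per processor");

static uint64_t pagesets_per_processor(void)
{
    return (uint64_t)ZONES_PER_NODE * NODES;
}

/* Pagesets across the whole machine, roughly 50 million. */
static uint64_t pagesets_total(void)
{
    return pagesets_per_processor() * PROCESSORS;
}

/* Bytes of cpu area one processor spends on pagesets. */
static uint64_t pageset_bytes_per_processor(void)
{
    return pagesets_per_processor() * PAGESET_BYTES;
}
```

Under these assumptions the pagesets take 196608 bytes (192 KB) of each processor's cpu area, well inside the 16MB limit mentioned above.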

[rfc 02/45] cpu alloc: Simple version of the allocator (static allocations)

2007-11-19 Thread clameter
The core portion of the cpu allocator. The per cpu allocator allows dynamic allocation of memory on all processors simultaneously. A bitmap is used to track used areas. The allocator implements tight packing to reduce the cache footprint and increase speed since cacheline contention is typically
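A minimal user-space sketch of bitmap-tracked allocation, assuming a first-fit search over fixed-size units. The names UNITS, units_alloc() and units_free() are hypothetical and this is not the patch's implementation. A bool array stands in for the bitmap to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNITS 64            /* hypothetical number of allocation units */

static bool used[UNITS];    /* stand-in for the bitmap: one flag per unit */

/* Reserve `count` contiguous units using first fit.
 * Returns the index of the first unit, or -1 if no free run exists.
 * The same index would be valid in every processor's copy of the area. */
static int units_alloc(size_t count)
{
    if (count == 0 || count > UNITS)
        return -1;
    for (size_t start = 0; start + count <= UNITS; start++) {
        size_t run = 0;
        while (run < count && !used[start + run])
            run++;
        if (run == count) {
            for (size_t i = 0; i < count; i++)
                used[start + i] = true;
            return (int)start;
        }
        start += run;       /* skip past the used unit that ended the run */
    }
    return -1;
}

/* Release a run; the caller supplies the size because only the
 * used/free state of each unit is tracked. */
static void units_free(int start, size_t count)
{
    assert(start >= 0 && (size_t)start + count <= UNITS);
    for (size_t i = 0; i < count; i++)
        used[(size_t)start + i] = false;
}
```

Because allocations are packed next to each other with no per-object header, the caller has to pass the size again when freeing.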

[rfc 04/45] cpu alloc: Use in SLUB

2007-11-19 Thread clameter
Using cpu alloc removes the need for the per cpu arrays in the kmem_cache struct. These could get quite big if we have to support systems with up to thousands of cpus. The use of alloc_percpu means that: 1. The size of kmem_cache for SMP configuration shrinks since we will only need 1 pointer

[rfc 01/45] ACPI: Avoid references to impossible processors.

2007-11-19 Thread clameter
ACPI uses NR_CPUS in various loops and in some it accesses per cpu data of processors that are not present(!) and that will never be present. The pointers to per cpu data are typically not initialized for processors that are not present. So we seem to be reading something here from offset 0 in

[rfc 03/45] Generic CPU operations: Core piece

2007-11-19 Thread clameter
Currently the per cpu subsystem is not able to use the atomic capabilities of the processors we have. This adds new functionality that allows the optimizing of per cpu variable handling. In particular, it provides a simple way to exploit atomic operations to avoid having to disable interrupts or

[rfc 00/45] [RFC] CPU ops and a rework of per cpu data handling on x86_64

2007-11-19 Thread clameter
This is a pretty early draft of the patch. It works on x86_64 only. It's a bit massive, so I'd like to have some feedback (and maybe some help) before proceeding. Support for other arches has not been tested yet. The patch establishes a new set of cpu operations that make it possible to exploit single

[rfc 00/45] [RFC] CPU ops and a rework of per cpu data handling on x86_64

2007-11-19 Thread clameter
This is a pretty early draft of the patch. It works on x86_64 only. It's a bit massive, so I'd like to have some feedback (and maybe some help) before proceeding. Support for other arches has not been tested yet. The patch establishes a new set of cpu operations that make it possible to exploit single

[rfc 01/45] ACPI: Avoid references to impossible processors.

2007-11-19 Thread clameter
ACPI uses NR_CPUS in various loops and in some it accesses per cpu data of processors that are not present(!) and that will never be present. The pointers to per cpu data are typically not initialized for processors that are not present. So we seem to be reading something here from offset 0 in

[rfc 03/45] Generic CPU operations: Core piece

2007-11-19 Thread clameter
Currently the per cpu subsystem is not able to use the atomic capabilities of the processors we have. This adds new functionality that allows the optimizing of per cpu variable handling. In particular, it provides a simple way to exploit atomic operations to avoid having to disable interrupts or

[rfc 02/45] cpu alloc: Simple version of the allocator (static allocations)

2007-11-19 Thread clameter
The core portion of the cpu allocator. The per cpu allocator allows dynamic allocation of memory on all processors simultaneously. A bitmap is used to track used areas. The allocator implements tight packing to reduce the cache footprint and increase speed since cacheline contention is typically

[rfc 04/45] cpu alloc: Use in SLUB

2007-11-19 Thread clameter
Using cpu alloc removes the need for the per cpu arrays in the kmem_cache struct. These could get quite big if we have to support systems with up to thousands of cpus. The use of alloc_percpu means that: 1. The size of kmem_cache for SMP configuration shrinks since we will only need 1 pointer

[rfc 05/45] cpu alloc: Remove SLUB fields

2007-11-19 Thread clameter
Remove the fields in kmem_cache_cpu that were used to cache data from kmem_cache when they were in different cachelines. The cacheline that holds the per cpu array pointer now also holds these values. We can cut down the kmem_cache_cpu size to almost half. The get_freepointer() and

[rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread clameter
64 bit: Set up a cpu area that allows the use of up to 16MB for each processor. Cpu memory use can grow a bit. F.e. if we assume that a pageset occupies 64 bytes of memory and we have 3 zones in each of 1024 nodes then we need 3 * 1k * 16k = 50 million pagesets or 3072 pagesets per processor. This

[rfc 07/45] cpu_alloc: Implement dynamically extendable cpu areas

2007-11-19 Thread clameter
Virtually map the cpu areas. This allows bigger maximum sizes and populating the virtual mappings only on demand. In order to use the virtual mapping capability the arch must set up some configuration variables in arch/xxx/Kconfig: CONFIG_CPU_AREA_VIRTUAL to y CONFIG_CPU_AREA_ORDER to

[rfc 10/45] cpu_alloc: Sparc64 support

2007-11-19 Thread clameter
Enable a simple virtual configuration with 32MB available per cpu so that we do not use a static area on sparc64. [Not tested. I have no sparc64] Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/sparc64/Kconfig | 15 +++ arch/sparc64/kernel/vmlinux.lds.S |

[rfc 11/45] cpu alloc: percpu_counter conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- lib/percpu_counter.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) Index: linux-2.6/lib/percpu_counter.c === --- linux-2.6.orig/lib/percpu_counter.c

[rfc 09/45] cpu alloc: IA64 support

2007-11-19 Thread clameter
Typical use of per cpu memory for a small system of 8G 8p 4node is less than 64k per cpu memory. This is increasing rapidly for larger systems where we can get up to 512k or 1M of memory used for cpu storage. The maximum size allowed of the cpu area is 128MB of memory. The cpu area is placed in

[rfc 06/45] cpu alloc: page allocator conversion

2007-11-19 Thread clameter
Use the new cpu_alloc functionality to avoid per cpu arrays in struct zone. This drastically reduces the size of struct zone for systems with a large number of processors and allows placement of critical variables of struct zone in one cacheline even on very large systems. Another effect is that

[rfc 12/45] cpu alloc: crash_notes conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/ia64/kernel/crash.c |2 +- drivers/base/cpu.c |2 +- kernel/kexec.c |4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/arch/ia64/kernel/crash.c

[rfc 13/45] cpu alloc: workqueue conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- kernel/workqueue.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) Index: linux-2.6/kernel/workqueue.c === ---

[rfc 14/45] cpu alloc: ACPI cstate handling conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/x86/kernel/acpi/cstate.c |9 + arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |7 --- drivers/acpi/processor_perflib.c |4 ++-- 3 files changed, 11 insertions(+), 9 deletions(-) Index:

[rfc 25/45] cpu alloc: veth conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- drivers/net/veth.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/drivers/net/veth.c === --- linux-2.6.orig/drivers/net/veth.c 2007-11-15

[rfc 26/45] cpu alloc: Chelsio statistics conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- drivers/net/chelsio/sge.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) Index: linux-2.6/drivers/net/chelsio/sge.c === ---

[rfc 24/45] cpu alloc: convert loopback statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- drivers/net/loopback.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) Index: linux-2.6/drivers/net/loopback.c === ---

[rfc 27/45] cpu alloc: convert mib handling to cpu alloc

2007-11-19 Thread clameter
Use the cpu alloc functions for the mib handling functions in the net layer. The API for snmp_mib_free() is changed to add a size parameter since cpu_free requires that. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/net/ip.h|2 +- include/net/snmp.h | 15

[rfc 28/45] cpu_alloc: convert network sockets

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- net/core/sock.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/net/core/sock.c === --- linux-2.6.orig/net/core/sock.c 2007-11-18

[rfc 29/45] cpu alloc: Use for infiniband

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- drivers/infiniband/hw/ehca/ehca_irq.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) Index: linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c ===

[rfc 30/45] cpu alloc: Use in the crypto subsystem.

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- crypto/async_tx/async_tx.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) Index: linux-2.6/crypto/async_tx/async_tx.c === ---

[rfc 31/45] cpu alloc: Remove the allocpercpu functionality

2007-11-19 Thread clameter
There is no user of allocpercpu left after all the earlier patches were applied. Remove the code that realizes allocpercpu. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/percpu.h | 80 -- mm/Makefile|1 mm/allocpercpu.c

[rfc 33/45] x86_64: Use CPU ops for nmi alert counter

2007-11-19 Thread clameter
These are critical fast paths. Using a segment override instead of an address calculation reduces overhead. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/x86/kernel/nmi_64.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index:

[rfc 34/45] x86_64: Fold percpu area into the cpu area.

2007-11-19 Thread clameter
Use boot_cpu_alloc to allocate a cpu area chunk that is needed to store the statically declared per cpu data and then point the per_cpu_offset pointers to the cpu area. The per cpu area is moved to a ZERO offset using some linker scripting. All per cpu variable addresses become true offsets into

[rfc 35/45] X86_64: Declare pda as per cpu data thereby moving it into the cpu area

2007-11-19 Thread clameter
Declare the pda as a per cpu variable. This will have the effect of moving the pda data into the cpu area managed by cpu alloc. The boot_pdas are only needed in head64.c so move the declaration over there and make it static. Remove the code that allocates special pda data structures.

[rfc 36/45] X86_64: Place pda first in cpu area.

2007-11-19 Thread clameter
If we move the pda to the beginning of the cpu area then the gs segment will also point to the beginning of the cpu area. After this patch we can use gs on any percpu variable or cpu_alloc pointer from cpu 0 to get to the active processor's variable. There is no need anymore to add a per cpu offset

[rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread clameter
Support fast cpu ops in x86_64 by providing a series of functions that generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that core code can exploit the availability of fast per cpu operations. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/x86/Kconfig|

[rfc 38/45] x86_64: Remove obsolete per_cpu offset calculations

2007-11-19 Thread clameter
Replace all uses of __per_cpu_offset with CPU_PTR. This will avoid a lot of lookups for per cpu offset calculations. Keep per_cpu_offset() itself because lockdep uses it. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/x86/kernel/smpboot_64.c |8 +++-

[rfc 39/45] x86_64: Remove the data_offset field from the pda.

2007-11-19 Thread clameter
It is useless now since gs can always stand in for data_offset. Move active_mm into the available slot in order to not upset the established offsets. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- arch/x86/kernel/asm-offsets_64.c |1 - arch/x86/kernel/entry_64.S |7

[rfc 40/45] x86_64: Provide per_cpu_var definition

2007-11-19 Thread clameter
There needs to be a way to determine the offset for the CPU ops of per cpu variables. The offset is simply the address of the variable. But we do not want to code ugly things like CPU_READ(per_cpu__statistics) in the core. So define a new helper per_cpu_var(var) that simply adds the

[rfc 41/45] VM statistics: Use CPU ops

2007-11-19 Thread clameter
The use of CPU ops here avoids the offset calculations that we used to have to do with per cpu ops. The result of this patch is that event counters are coded with a single instruction the following way: incq %gs:offset(%rip) Without these patches this was: mov %gs:0x8,%rdx mov

[rfc 43/45] x86_64: Add a CPU_OR to support or_pda()

2007-11-19 Thread clameter
Get rid of one of the leftover pda accessors and cut out some more of pda.h. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/asm-x86/pda.h | 35 +-- include/asm-x86/percpu_64.h | 30 ++ 2 files changed, 31

[rfc 32/45] Module handling: Use CPU_xx ops to dynamically allocate counters

2007-11-19 Thread clameter
Use the CPU_xx operations to deal with the per cpu data. Avoid a loop to NR_CPUS here. Use the possible map instead. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/module.h | 13 + kernel/module.c| 17 +++-- 2 files changed, 12

[rfc 44/45] Remove local_t support

2007-11-19 Thread clameter
There is no user of local_t remaining after the cpu ops patchset. local_t always suffered from the problem that the operations it generated were not able to perform the relocation of a pointer to the target processor and the atomic update at the same time. There was a need to disable preemption

[rfc 45/45] Modules: Hack to handle symbols that have a zero value

2007-11-19 Thread clameter
The module subsystem cannot handle symbols that are zero. It prints out a message that these symbols are unresolved. Define a constant UNRESOLVED that is used to hold the value used for unresolved symbols. Set it to 1 (it's hopefully unlikely that a symbol will have the value 1). This is

[rfc 21/45] cpu alloc: tcp statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- net/ipv4/tcp.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/net/ipv4/tcp.c === --- linux-2.6.orig/net/ipv4/tcp.c 2007-11-15

[rfc 16/45] cpu alloc: blktrace conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- block/blktrace.c |8 1 file changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/block/blktrace.c === --- linux-2.6.orig/block/blktrace.c 2007-11-15

[rfc 19/45] cpu alloc: NFS statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- fs/nfs/iostat.h |8 fs/nfs/super.c |2 +- 2 files changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6/fs/nfs/iostat.h === ---

[rfc 15/45] cpu alloc: genhd statistics conversion

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/genhd.h | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) Index: linux-2.6/include/linux/genhd.h === ---

[rfc 17/45] cpu alloc: SRCU

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- kernel/rcutorture.c |4 ++-- kernel/srcu.c | 20 2 files changed, 10 insertions(+), 14 deletions(-) Index: linux-2.6/kernel/rcutorture.c === ---

[rfc 18/45] cpu alloc: XFS counters

2007-11-19 Thread clameter
Also remove the useless zeroing after allocation. Allocpercpu already zeroed the objects. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- fs/xfs/xfs_mount.c | 24 1 file changed, 8 insertions(+), 16 deletions(-) Index: linux-2.6/fs/xfs/xfs_mount.c

[rfc 20/45] cpu alloc: neighbour statistics

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/net/neighbour.h |6 +- net/core/neighbour.c| 11 ++- 2 files changed, 7 insertions(+), 10 deletions(-) Index: linux-2.6/include/net/neighbour.h ===

[rfc 22/45] cpu alloc: convert scratches

2007-11-19 Thread clameter
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- net/ipv4/ipcomp.c | 26 +- net/ipv6/ipcomp6.c | 26 +- 2 files changed, 26 insertions(+), 26 deletions(-) Index: linux-2.6/net/ipv4/ipcomp.c

[rfc 23/45] cpu alloc: dmaengine conversion

2007-11-19 Thread clameter
Convert DMA engine to use CPU_xx operations. This also removes the use of local_t from the dmaengine. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- drivers/dma/dmaengine.c | 38 ++ include/linux/dmaengine.h | 16 ++-- 2 files

[01/36] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user

2007-08-28 Thread clameter
Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations. zero_user_segment(page, start,
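The two-segment interface described above can be illustrated with a user-space sketch. It assumes that each end position is exclusive and that a page is 4096 bytes. The function zero_segments_sketch() and the constant SKETCH_PAGE_SIZE are hypothetical stand-ins that operate on a plain buffer rather than a struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

/* User-space stand-in for zero_user_segments(): `page` is a plain
 * buffer here. Each segment is [start, end), so the caller passes
 * positions and never has to compute a length. */
static void zero_segments_sketch(unsigned char *page,
                                 size_t start1, size_t end1,
                                 size_t start2, size_t end2)
{
    assert(start1 <= end1 && end1 <= SKETCH_PAGE_SIZE);
    assert(start2 <= end2 && end2 <= SKETCH_PAGE_SIZE);
    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}
```

Passing equal start and end positions for a segment makes that segment empty, which is how a caller could zero only one range with the same function.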

[06/36] Use page_cache_xxx in mm/rmap.c

2007-08-28 Thread clameter
Use page_cache_xxx in mm/rmap.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/rmap.c | 13 + 1 files changed, 9 insertions(+), 4 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 41ac397..d6a1771 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -188,9 +188,14 @@ static

[05/36] Use page_cache_xxx in mm/truncate.c

2007-08-28 Thread clameter
Use page_cache_xxx in mm/truncate.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/truncate.c | 35 ++- 1 files changed, 18 insertions(+), 17 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index bf8068d..8c3d32e 100644 --- a/mm/truncate.c

[10/36] Use page_cache_xxx in fs/sync

2007-08-28 Thread clameter
Use page_cache_xxx in fs/sync. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/sync.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/sync.c b/fs/sync.c index 7cd005e..f30d7eb 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -260,8 +260,8 @@ int

[12/36] Use page_cache_xxx in fs/mpage.c

2007-08-28 Thread clameter
Use page_cache_xxx in fs/mpage.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/mpage.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/fs/mpage.c b/fs/mpage.c index a5e1385..2843ed7 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@

[09/36] Use page_cache_xxx in fs/libfs.c

2007-08-28 Thread clameter
Use page_cache_xxx in fs/libfs.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/libfs.c | 12 +++- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index 53b3dc5..e90f894 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -16,7 +16,8 @@ int

[07/36] Use page_cache_xxx in mm/filemap_xip.c

2007-08-28 Thread clameter
Use page_cache_xxx in mm/filemap_xip.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/filemap_xip.c | 28 ++-- 1 files changed, 14 insertions(+), 14 deletions(-) diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c index ba6892d..5237e53 100644 ---

[15/36] Use page_cache_xxx functions in fs/ext2

2007-08-28 Thread clameter
Use page_cache_xxx functions in fs/ext2 Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/ext2/dir.c | 40 +++- 1 files changed, 23 insertions(+), 17 deletions(-) diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c index 2bf49d7..d72926f 100644 ---

[19/36] Use page_cache_xxx for fs/xfs

2007-08-28 Thread clameter
Use page_cache_xxx for fs/xfs Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- fs/xfs/linux-2.6/xfs_aops.c | 55 ++ fs/xfs/linux-2.6/xfs_lrw.c |6 ++-- 2 files changed, 32 insertions(+), 29 deletions(-) diff --git

[20/36] Use page_cache_xxx in drivers/block/rd.c

2007-08-28 Thread clameter
Use page_cache_xxx in drivers/block/rd.c Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/block/rd.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/rd.c b/drivers/block/rd.c index 65150b5..e148b3b 100644 --- a/drivers/block/rd.c +++
