Re: [rfc 08/45] cpu alloc: x86 support

2007-11-26 Thread John Richard Moser
Andi Kleen wrote: On Tuesday 20 November 2007 04:50, Christoph Lameter wrote: On Tue, 20 Nov 2007, Andi Kleen wrote: You could in theory move the modules, but then you would need to implement a full PIC dynamic linker for them first and also increase runtime overhead for them because they

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-21 Thread Christoph Lameter
On Wed, 21 Nov 2007, Andi Kleen wrote: > The whole mapping for all CPUs cannot fit into 2GB of course, but the reference linker-managed range can. Ok so you favor the solution where we subtract smp_processor_id() << shift? > > The offset relative to %gs cannot be used if you have a loop

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-21 Thread Andi Kleen
> > All you need is a 2MB area (16MB is too large if you really > > want 16k CPUs someday) somewhere in the -2GB or probably better > > in +2GB. Then the linker puts stuff in there and you use > > the offsets for referencing relative to %gs. > > 2MB * 16k = 32GB. Even with 4k cpus we will have
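A throwaway calculation (not kernel code) makes the sizing trade-off quoted above easy to check; the cpu counts are just the ones being debated in this subthread:

#include <stdio.h>

int main(void)
{
	unsigned long long mb = 1ULL << 20;

	/* linker-visible virtual space = per-cpu area size * number of cpus */
	printf("2MB * 16384 cpus = %lluGB\n", 2 * mb * 16384 >> 30); /* 32GB */
	printf("2MB *  4096 cpus = %lluGB\n", 2 * mb * 4096 >> 30);  /*  8GB */
	return 0;
}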

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Wed, 21 Nov 2007, Andi Kleen wrote: > On Wednesday 21 November 2007 02:16:11 Christoph Lameter wrote: > > But one can subtract too... > > The linker cannot subtract (unless you add a new relocation type) The compiler knows and emits assembly to compensate. > All you need is a 2MB area

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Andi Kleen
On Wednesday 21 November 2007 02:16:11 Christoph Lameter wrote: > But one can subtract too... The linker cannot subtract (unless you add a new relocation type) > Hmmm... So the cpu area 0 could be put at the beginning of the 2GB kernel area and then grow downwards from

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
But one can subtract too... Hmmm... So the cpu area 0 could be put at the beginning of the 2GB kernel area and then grow downwards from 0x8000. The cost in terms of code is one subtract instruction for each per_cpu() or CPU_PTR() The next thing downward from 0x8000 is the
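A minimal sketch of that cost, with illustrative names (CPU_AREA_TOP and cpu_ptr_sketch are assumptions for the sketch, not symbols from the patchset): the linker-visible address of a per-cpu variable is turned into an offset with one subtract and then applied to the current cpu's copy.

#include <stdint.h>

/* Assumed reference address where cpu area 0 would be linked;
 * purely illustrative, not the value settled on in this thread. */
#define CPU_AREA_TOP 0xffffffff80000000UL

/* One extra subtract per access: convert the linker-visible address
 * into an offset, then index this cpu's copy of the area (the real
 * code would use a %gs-relative access instead of an explicit base). */
static inline void *cpu_ptr_sketch(void *ref_addr, uintptr_t this_cpu_base)
{
	uintptr_t off = CPU_AREA_TOP - (uintptr_t)ref_addr;

	return (void *)(this_cpu_base - off);
}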

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Tue, 20 Nov 2007, Christoph Lameter wrote: > 32bit sign extension for what? Absolute data references? The addressing > that I have seen was IP relative. Thus I thought that the kernel could be > moved lower. Argh. This is all depending on a special gcc option to compile the kernel and that

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Tue, 20 Nov 2007, H. Peter Anvin wrote: > But you wouldn't actually *use* this address space. It's just for the linker > to know what address to tag the references with; it gets relocated by gs_base > down into proper kernel space. The linker can stash the initialized reference > copy at any

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Tue, 20 Nov 2007, Andi Kleen wrote: > > > > > Right so I could move the kernel to > > > > #define __PAGE_OFFSET _AC(0x8100, UL) > > #define __START_KERNEL_map _AC(0xfff8, UL) > > That is -31GB unless I'm miscounting. But it needs to be >= -2GB > (31bits) The

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread H. Peter Anvin
Christoph Lameter wrote: On Tue, 20 Nov 2007, Andi Kleen wrote: This limitation shouldn't apply to the percpu area, since gs_base can be pointed anywhere in the address space -- in effect we're always indirect. The initial reference copy of the percpu area has to be addressed by the linker.

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread H. Peter Anvin
Andi Kleen wrote: This limitation shouldn't apply to the percpu area, since gs_base can be pointed anywhere in the address space -- in effect we're always indirect. The initial reference copy of the percpu area has to be addressed by the linker. Hmm, in theory since it is not actually used by

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Andi Kleen
> > Right so I could move the kernel to > > #define __PAGE_OFFSET _AC(0x8100, UL) > #define __START_KERNEL_map _AC(0xfff8, UL) That is -31GB unless I'm miscounting. But it needs to be >= -2GB (31bits) Right now it is at -2GB + 2MB, because it is loaded at physical

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Tue, 20 Nov 2007, Andi Kleen wrote: > > > This limitation shouldn't apply to the percpu area, since gs_base can be > > pointed anywhere in the address space -- in effect we're always indirect. > > The initial reference copy of the percpu area has to be addressed by > the linker. Right that

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Andi Kleen
> This limitation shouldn't apply to the percpu area, since gs_base can be > pointed anywhere in the address space -- in effect we're always indirect. The initial reference copy of the percpu area has to be addressed by the linker. Hmm, in theory since it is not actually used by itself I
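A rough illustration of that indirection, assuming x86-64 GNU C inline assembly (read_percpu_word is a made-up helper for the sketch, not a kernel API): the instruction only carries an offset, and gs_base supplies the per-cpu base at run time, so no absolute per-cpu address has to be linker-visible.

/* Load one word from the current cpu's area: %gs points at this
 * cpu's copy, the caller passes only the offset within the area. */
static inline unsigned long read_percpu_word(unsigned long offset)
{
	unsigned long val;

	asm volatile("movq %%gs:(%1), %0" : "=r" (val) : "r" (offset));
	return val;
}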

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread H. Peter Anvin
Andi Kleen wrote: On Tuesday 20 November 2007 04:50, Christoph Lameter wrote: On Tue, 20 Nov 2007, Andi Kleen wrote: I might be pointing out the obvious, but on x86-64 there is definitely not 256TB of VM available for this. Well maybe in the future. That would either require more than 4

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Christoph Lameter
On Tue, 20 Nov 2007, Andi Kleen wrote: > > So I think we have a 2GB area right? > > For everything that needs the -31bit offsets; that is everything linked Of course. > > 1GB kernel > > 1GB - 1x per cpu area (128M?) modules? > > cpu area 0 > > 2GB limit > > cpu area 1 > > cpu area 2 > >

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Andi Kleen
On Tuesday 20 November 2007 04:50, Christoph Lameter wrote: > On Tue, 20 Nov 2007, Andi Kleen wrote: > > I might be pointing out the obvious, but on x86-64 there is definitely > > not 256TB of VM available for this. > > Well maybe in the future. That would either require more than 4 levels or

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-20 Thread Andi Kleen
> > Yeah yea but the latencies are minimal making the NUMA logic too > > expensive for most loads ... If you put a NUMA kernel onto those then > > performance drops (I think someone measured 15-30%?) > > Small socket count systems are going to increasingly be NUMA in future. > If CONFIG_NUMA

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread Nick Piggin
On Tuesday 20 November 2007 13:02, Christoph Lameter wrote: > On Mon, 19 Nov 2007, H. Peter Anvin wrote: > > You're making the assumption here that NUMA = large number of CPUs. This > > assumption is flat-out wrong. > > Well maybe. Usually one gets to NUMA because the hardware gets too big to > be

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread Christoph Lameter
On Tue, 20 Nov 2007, Andi Kleen wrote: > I might be pointing out the obvious, but on x86-64 there is definitely not > 256TB of VM available for this. Well maybe in the future. One of the issues that I ran into is that I had to place the cpu area in between to make the offsets link right.

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread Andi Kleen
> 4k cpu configurations with 1k nodes: > > 4096 * 16MB = 64TB of virtual space. > > Maximum theoretical configuration 16384 processors 1k nodes: > > 16384 * 16MB = 256TB of virtual space. > > Both fit within the established limits. I might be pointing out the obvious, but

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread H. Peter Anvin
Christoph Lameter wrote: On Mon, 19 Nov 2007, H. Peter Anvin wrote: You're making the assumption here that NUMA = large number of CPUs. This assumption is flat-out wrong. Well maybe. Usually one gets to NUMA because the hardware gets too big to be handled the UMA way. On x86-64, most

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread Christoph Lameter
On Mon, 19 Nov 2007, H. Peter Anvin wrote: > You're making the assumption here that NUMA = large number of CPUs. This > assumption is flat-out wrong. Well maybe. Usually one gets to NUMA because the hardware gets too big to be handled the UMA way. > On x86-64, most two-socket systems are

Re: [rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread H. Peter Anvin
Christoph Lameter wrote: For the UP and SMP case map the area using 4k ptes. Typical use of per cpu data is around 16k for UP and SMP configurations. It goes up to 45k when the per cpu area is managed by cpu_alloc (see special x86_64 patchset). Allocating in 2M segments would be overkill. For

[rfc 08/45] cpu alloc: x86 support

2007-11-19 Thread clameter
64 bit: Set up a cpu area that allows the use of up to 16MB for each processor. Cpu memory use can grow a bit. F.e. if we assume that a pageset occupies 64 bytes of memory and we have 3 zones in each of 1024 nodes then we need 3 * 1k * 16k = 50 million pagesets or 3072 pagesets per processor. This
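A back-of-the-envelope check of those numbers (the zone, node and cpu counts are the hypothetical maximums used in the changelog):

#include <stdio.h>

int main(void)
{
	unsigned long zones = 3, nodes = 1024, cpus = 16384;
	unsigned long per_cpu = zones * nodes;   /* 3072 pagesets per processor */
	unsigned long total = per_cpu * cpus;    /* ~50 million pagesets in all */

	printf("%lu pagesets/cpu, %lu total, %lu KB/cpu at 64 bytes each\n",
	       per_cpu, total, per_cpu * 64 / 1024); /* 192 KB of pagesets per cpu */
	return 0;
}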
