Andi Kleen wrote:
On Tuesday 20 November 2007 04:50, Christoph Lameter wrote:
On Tue, 20 Nov 2007, Andi Kleen wrote:
You could in theory move the modules, but then you would need to implement
a full PIC dynamic linker for them first and also increase runtime overhead
for them because they
On Wed, 21 Nov 2007, Andi Kleen wrote:
> The whole mapping for all CPUs cannot fit into 2GB of course, but the
> reference linker-managed range can.

Ok so you favor the solution where we subtract smp_processor_id() <<
shift? The offset relative to %gs cannot be used if you have a loop and are

> All you need is a 2MB area (16MB is too large if you really
> want 16k CPUs someday) somewhere in the -2GB or probably better
> in +2GB. Then the linker puts stuff in there and you use
> the offsets for referencing relative to %gs.

2MB * 16k = 32GB. Even with 4k cpus we will have 2M * 4k =
On Wed, 21 Nov 2007, Andi Kleen wrote:
> On Wednesday 21 November 2007 02:16:11 Christoph Lameter wrote:
> > But one can subtract too...
>
> The linker cannot subtract (unless you add a new relocation type)

The compiler knows and emits assembly to compensate.

> All you need is a 2MB area
But one can subtract too... Hmmm... So cpu area 0 could be put at
the beginning of the 2GB kernel area and then grow downwards from
0x8000. The cost in terms of code is one subtract
instruction for each per_cpu() or CPU_PTR().
The next thing downward from 0x8000 is the
On Tue, 20 Nov 2007, Christoph Lameter wrote:
> 32bit sign extension for what? Absolute data references? The addressing
> that I have seen was IP-relative. Thus I thought that the kernel could be
> moved lower.

Argh. This all depends on a special gcc option to compile the
kernel and that
On Tue, 20 Nov 2007, H. Peter Anvin wrote:
> But you wouldn't actually *use* this address space. It's just for the linker
> to know what address to tag the references with; it gets relocated by gs_base
> down into proper kernel space. The linker can stash the initialized reference
> copy at any
On Tue, 20 Nov 2007, Andi Kleen wrote:
> > Right so I could move the kernel to
> >
> > #define __PAGE_OFFSET _AC(0x8100, UL)
> > #define __START_KERNEL_map _AC(0xfff8, UL)
>
> That is -31GB unless I'm miscounting. But it needs to be >= -2GB
> (31bits). Right now it is at -2GB + 2MB, because it is loaded at
> physical +2MB
Christoph Lameter wrote:
On Tue, 20 Nov 2007, Andi Kleen wrote:
> > This limitation shouldn't apply to the percpu area, since gs_base can be
> > pointed anywhere in the address space -- in effect we're always indirect.
>
> The initial reference copy of the percpu area has to be addressed by
> the linker.

Hmm, in theory since it is not actually used by itself I suppose
Andi Kleen wrote:
On Tuesday 20 November 2007 04:50, Christoph Lameter wrote:
On Tue, 20 Nov 2007, Andi Kleen wrote:
> I might be pointing out the obvious, but on x86-64 there is definitely
> not 256TB of VM available for this.

Well maybe in the future.

That would either require more than 4 levels or larger
On Tue, 20 Nov 2007, Andi Kleen wrote:
> > So I think we have a 2GB area right?
>
> For everything that needs the -31bit offsets; that is everything linked

Of course.

> > 1GB kernel
> > 1GB - 1x per cpu area (128M?) modules?
> > cpu area 0
> > 2GB limit
> > cpu area 1
> > cpu area 2
> > Yeah yeah, but the latencies are minimal, making the NUMA logic too
> > expensive for most loads ... If you put a NUMA kernel onto those then
> > performance drops (I think someone measured 15-30%?)
>
> Small socket count systems are going to increasingly be NUMA in future.
> If CONFIG_NUMA hurts
On Tuesday 20 November 2007 13:02, Christoph Lameter wrote:
> On Mon, 19 Nov 2007, H. Peter Anvin wrote:
> > You're making the assumption here that NUMA = large number of CPUs. This
> > assumption is flat-out wrong.
>
> Well maybe. Usually one gets to NUMA because the hardware gets too big to
> be handled the UMA way.
>
> > On x86-64, most two-socket systems are still
On Tue, 20 Nov 2007, Andi Kleen wrote:
> I might be pointing out the obvious, but on x86-64 there is definitely not
> 256TB of VM available for this.

Well maybe in the future. One of the issues that I ran into is that I had
to place the cpu area in between to make the offsets link right.
> 4k cpu configurations with 1k nodes:
>
> 4096 * 1k * 16MB = 64TB of virtual space.
>
> Maximum theoretical configuration, 16384 processors with 1k nodes:
>
> 16384 * 1k * 16MB = 256TB of virtual space.
>
> Both fit within the established limits.
Christoph Lameter wrote:
For the UP and SMP case, map the area using 4k ptes. Typical use of per-cpu
data is around 16k for UP and SMP configurations. It goes up to 45k when the
per-cpu area is managed by cpu_alloc (see the special x86_64 patchset).
Allocating in 2M segments would be overkill.
64 bit:

Set up a cpu area that allows the use of up to 16MB for each processor.
Cpu memory use can grow a bit. F.e. if we assume that a pageset
occupies 64 bytes of memory and we have 3 zones in each of 1024 nodes,
then we need 3 * 1k * 16k = 50 million pagesets, or 3072 pagesets per
processor. This