Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
The answer is "no", I don't have root access, but I suspect that that would be 
the right fix if it is currently set to [always] and either madvise or never 
would be good options. If it is of interest, I'll ask someone to try it and 
report back on what happens.
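
For what it's worth, the current mode can at least be read without root; a minimal sketch of checking it programmatically, just parsing the bracketed word out of that sysfs file (writing a new mode back is what requires root):

#include <fstream>
#include <iostream>
#include <string>

// Print the active THP mode, i.e. the word inside [] in
// /sys/kernel/mm/transparent_hugepage/enabled ("always", "madvise" or "never").
int main()
{
    std::ifstream f("/sys/kernel/mm/transparent_hugepage/enabled");
    std::string line;
    if (!std::getline(f, line)) {
        std::cerr << "THP sysfs file not available\n";
        return 1;
    }
    const auto open_b  = line.find('[');
    const auto close_b = line.find(']');
    if (open_b == std::string::npos || close_b == std::string::npos) {
        std::cerr << "unexpected format: " << line << "\n";
        return 1;
    }
    std::cout << "THP mode: " << line.substr(open_b + 1, close_b - open_b - 1) << "\n";
    return 0;
}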

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 15:39
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Only the one in brackets is set, others are unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice


On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> On the 8 numa node machine
>
> $cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -Original Message-
> From: Brice Goglin 
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A. ; Hardware locality user 
> list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will 
> be affected by this issue. We may want to add a hwloc memory flag doing 
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in 
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
>
>
> On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a 
>> thing, but one of my colleagues pointed me to it.
>>
>> Problem seems to be solved for now. Thank you very much for your insights 
>> and suggestions/help.
>>
>> JB
>>
>> -Original Message-
>> From: Brice Goglin 
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A. ; Hardware locality user 
>> list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation but some systems 
>> have transparent huge pages enabled by default (e.g. RHEL
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it 
>> wouldn't explain crazy patterns you've seen in the past.
>>
>> Brice
>>
>>
>>
>>
>> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>>> array of bytes and touch pages in a linear fashion.
>>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>>> each page in the data.
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate 
>>> numa nodes
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>>> 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>>> 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> but as soon as I increase to 512 pages, it breaks.
>>>
>>> Numa page binding 512
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>>> 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>>> 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Only the one in brackets is set, others are unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice


On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> On the 8 numa node machine
>
> $cat /sys/kernel/mm/transparent_hugepage/enabled 
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -Original Message-
> From: Brice Goglin  
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A. ; Hardware locality user list 
> 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will 
> be affected by this issue. We may want to add a hwloc memory flag doing 
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in 
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
>
>
> On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a 
>> thing, but one of my colleagues pointed me to it.
>>
>> Problem seems to be solved for now. Thank you very much for your insights 
>> and suggestions/help.
>>
>> JB
>>
>> -----Original Message-
>> From: Brice Goglin 
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A. ; Hardware locality user 
>> list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation but some systems 
>> have transparent huge pages enabled by default (e.g. RHEL
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it 
>> wouldn't explain crazy patterns you've seen in the past.
>>
>> Brice
>>
>>
>>
>>
>> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>>> array of bytes and touch pages in a linear fashion.
>>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>>> each page in the data.
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate numa 
>>> nodes
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> but as soon as I increase to 512 pages, it breaks.
>>>
>>> Numa page binding 512
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
On the 8 numa node machine

$cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never

is set already, so I'm not really sure what should go in there to disable it.

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 15:29
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Oh, that's very good to know. I guess lots of people using first touch will be 
affected by this issue. We may want to add a hwloc memory flag doing something 
similar.

Do you have root access to verify that writing "never" or "madvise" in 
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice



On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, 
> but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and 
> suggestions/help.
>
> JB
>
> -Original Message-
> From: Brice Goglin 
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. ; Hardware locality user 
> list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
> You're not requesting huge pages in your allocation but some systems 
> have transparent huge pages enabled by default (e.g. RHEL
> https://access.redhat.com/solutions/46111)
>
> This could explain why 512 pages get allocated on the same node, but it 
> wouldn't explain crazy patterns you've seen in the past.
>
> Brice
>
>
>
>
> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>> array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>> each page in the data.
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa 
>> nodes
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 
>> pages.
>>
>> Still baffled
>>
>> JB
>>
>> -Original Message-
>> From: hwloc-users  On Behalf Of 
>> Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Oh, that's very good to know. I guess lots of people using first touch
will be affected by this issue. We may want to add a hwloc memory flag
doing something similar.

Do you have root access to verify that writing "never" or "madvise" in
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice



On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, 
> but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and 
> suggestions/help.
>
> JB
>
> -Original Message-
> From: Brice Goglin  
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. ; Hardware locality user list 
> 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
> You're not requesting huge pages in your allocation but some systems have 
> transparent huge pages enabled by default (e.g. RHEL
> https://access.redhat.com/solutions/46111)
>
> This could explain why 512 pages get allocated on the same node, but it 
> wouldn't explain crazy patterns you've seen in the past.
>
> Brice
>
>
>
>
> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>> array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>> each page in the data.
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa 
>> nodes
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 
>> pages.
>>
>> Still baffled
>>
>> JB
>>
>> -Original Message-
>> From: hwloc-users  On Behalf Of 
>> Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice Goglin 
>> Cc: Hardware locality user list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Brice
>>
>>> Can you print the pattern before and after thread 1 touched its pages, or 
>>> even in the middle ?
>>> It looks like somebody is touching too many pages here.
>> Expe

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
Brice

madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)

seems to make things behave much more sensibly. I had no idea it was a thing, 
but one of my colleagues pointed me to it.
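
In case it helps anyone else hitting the same thing, the essence of the workaround is just the following sketch (a plain anonymous mmap here rather than the hwloc allocator used in the real code):

#include <sys/mman.h>
#include <cstddef>
#include <stdexcept>

// Allocate an anonymous, page-aligned mapping and opt it out of transparent
// huge pages *before* anything touches it, so first-touch placement happens
// per 4 kB page instead of per 2 MB huge page.
void* alloc_no_thp(std::size_t bytes)
{
    void *ptr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        throw std::runtime_error("mmap failed");
    // must precede the first touch; afterwards a huge page may already back the region
    if (madvise(ptr, bytes, MADV_NOHUGEPAGE) != 0)
        throw std::runtime_error("madvise(MADV_NOHUGEPAGE) failed");
    return ptr;
}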

Problem seems to be solved for now. Thank you very much for your insights and 
suggestions/help.

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 10:35
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems have 
transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it 
wouldn't explain crazy patterns you've seen in the past.

Brice




On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa 
> nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only 
> allowing threads 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did 
> core 63 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
I wondered something similar. The crazy patterns usually happen on columns of 
the 2D matrix and, since it is column major, that does loosely fit the idea 
(most of the time).

I will play some more (though I'm fed up with it now).

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 10:35
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems have 
transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it 
wouldn't explain crazy patterns you've seen in the past.

Brice




On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa 
> nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only 
> allowing threads 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did 
> core 63 (domain 7)
>
> Now I run it a second time 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems
have transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it
wouldn't explain crazy patterns you've seen in the past.

Brice




On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only allowing 
> threads 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 26 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 63 0 0 0 0 0 0 0 
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did core 63 
>> (domain 7)
>
>> Now I run it a second time and look, it's rubbish
>
> get_numa_domain() 8 Domain Numa pattern
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 26 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
I simplified things and instead of writing to a 2D array, I allocate a 1D array 
of bytes and touch pages in a linear fashion.
Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
page in the data.
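
Roughly, that per-page query looks like the sketch below (a simplified stand-in, not the actual HPX routine; the 4 kB page size and the helper name are assumptions for illustration):

#include <sys/syscall.h>
#include <unistd.h>
#include <vector>
#include <cstdio>
#include <cstddef>

// Query the NUMA node of every page in [base, base+len) with one move_pages
// call; passing nodes==nullptr means "query only". Negative status entries
// are errno codes, e.g. -ENOENT for a page that has never been touched.
void print_page_nodes(void *base, std::size_t len, std::size_t page_size = 4096)
{
    std::size_t npages = (len + page_size - 1) / page_size;
    std::vector<void*> pages(npages);
    std::vector<int>   status(npages, -1);
    for (std::size_t i = 0; i < npages; ++i)
        pages[i] = static_cast<char*>(base) + i * page_size;

    if (syscall(__NR_move_pages, 0 /* self */, npages,
                pages.data(), nullptr, status.data(), 0) != 0) {
        std::perror("move_pages");
        return;
    }
    for (std::size_t i = 0; i < npages; ++i)
        std::printf("%d ", status[i]);
    std::printf("\n");
}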

When I allocate 511 pages and touch alternate pages on alternate numa nodes

Numa page binding 511
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
 1 0 1 0 1 0 1 0 1 0 1 0

but as soon as I increase to 512 pages, it breaks.

Numa page binding 512
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0

On the 8 numa node machine it sometimes gives the right answer even with 512 
pages.

Still baffled

JB

-Original Message-
From: hwloc-users  On Behalf Of 
Biddiscombe, John A.
Sent: 28 January 2019 16:14
To: Brice Goglin 
Cc: Hardware locality user list 
Subject: Re: [hwloc-users] unusual memory binding results

Brice

>Can you print the pattern before and after thread 1 touched its pages, or even 
>in the middle ?
>It looks like somebody is touching too many pages here.

Experimenting with different threads touching one or more pages, I get 
unpredictable results

here on the 8 numa node device, the result is perfect. I am only allowing 
threads 3 and 7 to write a single memory location

get_numa_domain() 8 Domain Numa pattern



3---



7---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


you can see that core 26 (numa domain 3) wrote to memory, and so did core 63 
(domain 7)

Now I run it a second time and look, it's rubbish

get_numa_domain() 8 Domain Numa pattern
3---
3---
3---
3---
3---
3---
3---
3---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


after allowing the data to be read by a random thread

3777
3777
3777
3777
3777
3777
3777
3777

I'm baffled.

JB

___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] unusual memory binding results

2019-01-28 Thread Biddiscombe, John A.
Brice

>Can you print the pattern before and after thread 1 touched its pages, or even 
>in the middle ?
>It looks like somebody is touching too many pages here.

Experimenting with different threads touching one or more pages, I get 
unpredictable results

here on the 8 numa node device, the result is perfect. I am only allowing 
threads 3 and 7 to write a single memory location

get_numa_domain() 8 Domain Numa pattern



3---



7---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


you can see that core 26 (numa domain 3) wrote to memory, and so did core 63 
(domain 7)

Now I run it a second time and look, it's rubbish

get_numa_domain() 8 Domain Numa pattern
3---
3---
3---
3---
3---
3---
3---
3---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


after allowing the data to be read by a random thread

3777
3777
3777
3777
3777
3777
3777
3777

I'm baffled.

JB

___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] unusual memory binding results

2019-01-28 Thread Brice Goglin
On 28/01/2019 at 11:28, Biddiscombe, John A. wrote:
> If I disable thread 0 and allow thread 1 then I get this pattern on 1 machine 
> (clearly wrong)
> 
> 
> 
> 
> 


Can you print the pattern before and after thread 1 touched its pages,
or even in the middle ?

It looks like somebody is touching too many pages here.

Brice


> and on another I get
> -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
> 1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
> -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
> 1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
> -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
> 1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
> which is correct because the '-' is a negative status. I will run again and 
> see if it's -14 or -2
>
> JB
>
>
> -Original Message-
> From: Brice Goglin  
> Sent: 28 January 2019 10:56
> To: Biddiscombe, John A. 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Can you try again disabling the touching in one thread to check whether the 
> other thread only touched its own pages? (others' status should be
> -2 (ENOENT))
>
> Recent kernels have ways to migrate memory at runtime
> (CONFIG_NUMA_BALANCING) but this should only occur when it detects that some 
> thread does a lot of remote access, which shouldn't be the case here, at 
> least at the beginning of the program.
>
> Brice
>
>
>
> On 28/01/2019 at 10:35, Biddiscombe, John A. wrote:
>> Brice
>>
>> I might have been using the wrong params to hwloc_get_area_memlocation 
>> in my original version, but I bypassed it and have been calling
>>
>> int get_numa_domain(void *page)
>> {
>> HPX_ASSERT( (std::size_t(page) & 4095) ==0 );
>>
>> void *pages[1] = { page };
>> int  status[1] = { -1 };
>> if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 
>> 0) {
>> if (status[0]>=0 && 
>> status[0]<=HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
>> return status[0];
>> }
>> return -1;
>> }
>> throw std::runtime_error("Failed to get numa node for page");
>> }
>>
>> this function instead. Just testing one page address at a time. I 
>> still see this kind of pattern
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> 00101101010010101001010101011010011011010101110101110111010101
>> 010101
>> when I should see
>> 0101010101010101010101010101010101010101010101010101010101010101010101
>> 0101010101
>> 1010101010101010101010101010101010101010101010101010101010101010101010
>> 1010101010
>> 0101010101010101010101010101010101010101010101010101010101010101010101
>> 0101010101
>> 1010101010101010101010101010101010101010101010101010101010101010101010
>> 1010101010
>> 0101010101010101010101010101010101010101010101010101010101010101010101
>> 0101010101
>> 1010101010101010101010101010101010101010101010101010101010101010101010
>> 1010101010
>> 0101010101010101010101010101010101010101010101010101010101010

Re: [hwloc-users] unusual memory binding results

2019-01-28 Thread Brice Goglin
Can you try again disabling the touching in one thread to check whether
the other thread only touched its own pages? (others' status should be
-2 (ENOENT))

Recent kernels have ways to migrate memory at runtime
(CONFIG_NUMA_BALANCING) but this should only occur when it detects that
some thread does a lot of remote access, which shouldn't be the case
here, at least at the beginning of the program.
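
(For completeness, whether that mechanism is active can be read from /proc/sys/kernel/numa_balancing; a trivial sketch, assuming a kernel that exposes the sysctl:)

#include <fstream>
#include <iostream>

// Report whether automatic NUMA balancing (runtime page migration) is on;
// the file holds 0 (off) or a non-zero value (on).
int main()
{
    std::ifstream f("/proc/sys/kernel/numa_balancing");
    int value = -1;
    if (f >> value)
        std::cout << "kernel.numa_balancing = " << value << "\n";
    else
        std::cout << "numa_balancing sysctl not available\n";
    return 0;
}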

Brice



On 28/01/2019 at 10:35, Biddiscombe, John A. wrote:
> Brice
>
> I might have been using the wrong params to hwloc_get_area_memlocation in my 
> original version, but I bypassed it and have been calling
>
> int get_numa_domain(void *page)
> {
> HPX_ASSERT( (std::size_t(page) & 4095) ==0 );
>
> void *pages[1] = { page };
> int  status[1] = { -1 };
> if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 
> 0) {
> if (status[0]>=0 && 
> status[0]<=HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
> return status[0];
> }
> return -1;
> }
> throw std::runtime_error("Failed to get numa node for page");
> }
>
> this function instead. Just testing one page address at a time. I still see 
> this kind of pattern
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> 00101101010010101001010101011010011011010101110101110111010101010101
> when I should see
> 01010101010101010101010101010101010101010101010101010101010101010101010101010101
> 10101010101010101010101010101010101010101010101010101010101010101010101010101010
> 01010101010101010101010101010101010101010101010101010101010101010101010101010101
> 10101010101010101010101010101010101010101010101010101010101010101010101010101010
> 01010101010101010101010101010101010101010101010101010101010101010101010101010101
> 10101010101010101010101010101010101010101010101010101010101010101010101010101010
> 01010101010101010101010101010101010101010101010101010101010101010101010101010101
> 10101010101010101010101010101010101010101010101010101010101010101010101010101010
> 01010101010101010101010101010101010101010101010101010101010101010101010101010101
> 10101010101010101010101010101010101010101010101010101010101010101010101010101010
>
> I am deeply troubled by this and can't think of what to try next since I can 
> see the memory contents hold the correct CPU ID of the thread that touched 
> the memory, so either the syscall is wrong, or the kernel is doing something 
> else. I welcome any suggestions on what might be wrong.
>
> Thanks for trying to help.
>
> JB
>
> -----Original Message-
> From: Brice Goglin  
> Sent: 26 January 2019 10:19
> To: Biddiscombe, John A. 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> On 25/01/2019 at 23:16, Biddiscombe, John A. wrote:
>>> move_pages() returning 0 with -14 in the status array? As opposed to 
>>> move_pages() returning -1 with errno set to 14, which would definitely be a 
>>> bug in hwloc.
>> I think it was move_pages returning zero with -14 in the status array, and 
>> then hwloc returning 0 with an empty nodeset (which I then messed up by 
>> calling get bitmap first and assuming 0 meant numa node zero and not 
>> checking for an empty nodeset).
>>
>> I'm not sure why I get -EFAULT status rather than -NOENT, but that's what 
>> I'm seeing in the status field when I pass the pointer returned from the 
>> alloc_membind call.
> The only reason I see for getting -EFAULT there would be that you pass the 
> buffer to move_pages (what hwloc_get_area_memlocation() wants, a start 
> pointer and length) instead of a pointer to an array of page addresses 
> (move_pages wants a void** pointing to individual pages).
>
> Brice
>
>
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] unusual memory binding results

2019-01-28 Thread Biddiscombe, John A.
Brice

I might have been using the wrong params to hwloc_get_area_memlocation in my 
original version, but I bypassed it and have been calling

int get_numa_domain(void *page)
{
    HPX_ASSERT( (std::size_t(page) & 4095) == 0 );   // must be page aligned

    void *pages[1]  = { page };
    int   status[1] = { -1 };
    // nodes == nullptr turns move_pages into a pure query: status[0] receives the numa node
    if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 0) {
        if (status[0] >= 0 && status[0] <= HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
            return status[0];
        }
        return -1;
    }
    throw std::runtime_error("Failed to get numa node for page");
}

this function instead. Just testing one page address at a time. I still see 
this kind of pattern
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
00101101010010101001010101011010011011010101110101110111010101010101
when I should see
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010

I am deeply troubled by this and can't think of what to try next, since I can 
see that the memory contents hold the correct CPU ID of the thread that touched 
the memory; either the syscall is wrong, or the kernel is doing something else. 
I welcome any suggestions on what might be wrong.

Thanks for trying to help.

JB

-Original Message-
From: Brice Goglin  
Sent: 26 January 2019 10:19
To: Biddiscombe, John A. 
Cc: Hardware locality user list 
Subject: Re: [hwloc-users] unusual memory binding results

On 25/01/2019 at 23:16, Biddiscombe, John A. wrote:
>> move_pages() returning 0 with -14 in the status array? As opposed to 
>> move_pages() returning -1 with errno set to 14, which would definitely be a 
>> bug in hwloc.
> I think it was move_pages returning zero with -14 in the status array, and 
> then hwloc returning 0 with an empty nodeset (which I then messed up by 
> calling get bitmap first and assuming 0 meant numa node zero and not checking 
> for an empty nodeset).
>
> I'm not sure why I get -EFAULT status rather than -NOENT, but that's what I'm 
> seeing in the status field when I pass the pointer returned from the 
> alloc_membind call.

The only reason I see for getting -EFAULT there would be that you pass the 
buffer to move_pages (what hwloc_get_area_memlocation() wants, a start pointer 
and length) instead of a pointer to an array of page addresses (move_pages 
wants a void** pointing to individual pages).

Brice


___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] unusual memory binding results

2019-01-25 Thread Brice Goglin

On 25/01/2019 at 14:17, Biddiscombe, John A. wrote:
> Dear List/Brice
>
> I experimented with disabling the memory touch on threads except for 
> N=1,2,3,4 etc and found a problem in hwloc, which is that the function 
> hwloc_get_area_memlocation was returning '0' when the status of the memory 
> null move operation was -14 (#define EFAULT 14 /* Bad address */). This was 
> when I call get area memlocation immediately after allocating and then 'not' 
> touching. I think if the status is an error, then the function should 
> probably return -1, but anyway. I'll file a bug and send a patch if this is 
> considered to be a bug.


Just to be sure, you talking about move_pages() returning 0 with -14 in
the status array? As opposed to move_pages() returning -1 with errno set
to 14, which would definitely be a bug in hwloc.


When the page is valid but not allocated yet, move_pages() is supposed
to return status = -ENOENT. This case is not an error, so returning 0
with an empty nodeset looks fine to me (pages are not allocated, hence
they are allocated on an empty set of nodes).

-EFAULT means that the page is invalid (you'd get a segfault if you
touch it). I am not sure what we should return in that case. It's also
true that pages are allocated nowhere :)

Anyway, if you get -EFAULT in status, it should mean that an invalid
address was passed to hwloc_get_area_memlocation() or an invalid length.
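
In other words, interpreting one entry of the status array boils down to something like this small sketch (the values are the standard errno codes; the helper name is just for illustration):

#include <cerrno>
#include <string>

// Interpret one entry of the move_pages() status array as described above.
std::string describe_page_status(int status)
{
    if (status >= 0)       return "allocated on node " + std::to_string(status);
    if (status == -ENOENT) return "valid but never touched, so not allocated yet";
    if (status == -EFAULT) return "invalid address/length (touching it would segfault)";
    return "error " + std::to_string(-status);
}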

Brice


___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] unusual memory binding results

2019-01-25 Thread Biddiscombe, John A.
Dear List/Brice

I experimented with disabling the memory touch on all threads except for N=1,2,3,4 
etc. and found a problem in hwloc: the function hwloc_get_area_memlocation was 
returning '0' when the status of the null move operation was -14 (#define EFAULT 14 
/* Bad address */). This happens when I call get_area_memlocation immediately after 
allocating and then 'not' touching. I think that if the status is an error, the 
function should probably return -1; anyway, I'll file a bug and send a patch if this 
is considered to be a bug.

I then modified the test routine to write the value returned from sched_getcpu 
into the touched memory location, to verify that the thread binding was doing 
the right thing. The output below from the AMD 8-numanode machine looks good, 
with threads 0, 8, 16, etc. each touching memory following the pattern expected 
from the 8-numanode test. My get_numa_domain function, however, does not report 
the right numanode. It looks correct for the first column (matrices are stored 
in column-major order), but after that it falls to pieces. In this test, I'm 
allocating tiles as 512x512 doubles, so 4096 bytes per tile column, giving one 
tile column per page, and 512 pages per tile. All the memory locations check out 
and the patterns seem fine, but the call to 
// edited version of the one in hwloc source
syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0)
is not returning the numanode that I expect to see from the first touch when it 
is enabled.

Either the syscall is wrong, or the first touch/nexttouch doesn't work (could 
the alloc routine be wrong?)
hwloc_alloc_membind(topo, len, bitmap->get_bmp(),
(hwloc_membind_policy_t)(policy),
flags | HWLOC_MEMBIND_BYNODESET);
where the nodeset should match the numanode mask (I'll double-check that right now).
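
For completeness, the whole path being exercised is roughly the stripped-down sketch below (hwloc >= 2.0 API; node 1 and the 1 MB length are arbitrary choices for illustration, and error handling is mostly trimmed):

#include <hwloc.h>
#include <cstdio>
#include <cstddef>

int main()
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    // nodeset containing only NUMA node 1 (assumed to exist on this machine)
    hwloc_bitmap_t nodeset = hwloc_bitmap_alloc();
    hwloc_bitmap_set(nodeset, 1);

    std::size_t len = 1 << 20;
    void *buf = hwloc_alloc_membind(topo, len, nodeset,
                                    HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
    if (!buf) { std::perror("hwloc_alloc_membind"); return 1; }
    static_cast<char*>(buf)[0] = 1;          // touch so the first page exists

    // ask where the pages of the range actually ended up
    hwloc_bitmap_t where = hwloc_bitmap_alloc();
    hwloc_get_area_memlocation(topo, buf, len, where, HWLOC_MEMBIND_BYNODESET);
    std::printf("range is on node(s) starting at %d\n", hwloc_bitmap_first(where));

    hwloc_bitmap_free(where);
    hwloc_bitmap_free(nodeset);
    hwloc_free(topo, buf, len);
    hwloc_topology_destroy(topo);
    return 0;
}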

Any ideas on what to try next?

Thanks

JB

get_numa_domain() 8 Domain Numa pattern
00740640
10740640
20740640
30740640
40740640
50740640
60740640
70740640



Contents of memory locations = sched_getcpu()
0 8 16 24 32 40 48 56 
8 16 24 32 40 48 56 0 
16 24 32 40 48 56 0 8 
24 32 40 48 56 0 8 16 
32 40 48 56 0 8 16 24 
40 48 56 0 8 16 24 32 
48 56 0 8 16 24 32 40 
56 0 8 16 24 32 40 48 



Expected 8 Domain Numa pattern
01234567
12345670
23456701
34567012
45670123
56701234
67012345
70123456

___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] unusual memory binding results

2019-01-21 Thread Biddiscombe, John A.
Brice

Apologies, I didn't explain it very well. I do make sure that if the tile size 
256*8 < 4096 (the page size), then I increase the number of tiles per domain so 
that each domain still owns whole pages; I just wanted to keep the explanation simple.

Here are some code snippets to give you the flavour of it.

Initializing the helper struct:

matrix_numa_binder(std::size_t Ncols, std::size_t Nrows,
   std::size_t Ntile, std::size_t Ntiles_per_domain,
   std::size_t Ncolprocs=1, std::size_t Nrowprocs=1,
   std::string pool_name="default"
   )
: cols_(Ncols), rows_(Nrows),
  tile_size_(Ntile), tiles_per_domain_(Ntiles_per_domain),
  colprocs_(Ncolprocs), rowprocs_(Nrowprocs)
{
using namespace hpx::compute::host;
binding_helper::pool_name_ = pool_name;
const int CACHE_LINE_SIZE = sysconf (_SC_LEVEL1_DCACHE_LINESIZE);
const int PAGE_SIZE   = sysconf(_SC_PAGE_SIZE);
const int ALIGNMENT   = std::max(PAGE_SIZE,CACHE_LINE_SIZE);
const int ELEMS_ALIGN = (ALIGNMENT/sizeof(T));
rows_page_= ELEMS_ALIGN;
leading_dim_  = ELEMS_ALIGN*((rows_*sizeof(T) + 
ALIGNMENT-1)/ALIGNMENT);
tiles_per_domain_ = std::max(tiles_per_domain_, ELEMS_ALIGN/tile_size_);
}

The operator called by the allocator, which returns the domain index to bind a page to:

virtual std::size_t operator ()(
const T * const base_ptr, const T * const page_ptr,
const std::size_t pagesize, const std::size_t domains) const 
override
{
std::size_t offset  = (page_ptr - base_ptr);
std::size_t col = (offset / leading_dim_);
std::size_t row = (offset % leading_dim_);
std::size_t index   = (col / (tile_size_ * tiles_per_domain_));

if ((tile_size_*tiles_per_domain_*sizeof(T))>=pagesize) {
index += (row / (tile_size_ * tiles_per_domain_));
}
else {
HPX_ASSERT(0);
}
return index % domains;
}
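
To make the indexing concrete, here is a stand-alone sketch of the same arithmetic with assumed parameters (2048x2048 doubles, 256-element tiles, 2 tiles per domain, 4 kB pages, 2 domains); it is only an illustration, not the HPX class:

#include <cstdio>
#include <cstddef>

int main()
{
    // Assumed parameters, for illustration only
    const std::size_t page_elems       = 4096 / sizeof(double); // 512 doubles per page
    const std::size_t leading_dim      = 2048;                  // one 2048-element column, page aligned
    const std::size_t tile_size        = 256;
    const std::size_t tiles_per_domain = 2;                     // 256*2*8 bytes == one 4 kB page
    const std::size_t domains          = 2;

    // walk the pages of the first four columns and print the owning domain
    for (std::size_t offset = 0; offset < 4 * leading_dim; offset += page_elems) {
        std::size_t col   = offset / leading_dim;
        std::size_t row   = offset % leading_dim;
        std::size_t index = col / (tile_size * tiles_per_domain)
                          + row / (tile_size * tiles_per_domain);
        std::printf("offset %6zu -> domain %zu\n", offset, index % domains);
    }
    return 0;
}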

This function is called by each thread (one per numa domain); if the domain 
returned by the page query matches the domain ID of the thread/task, then the 
first memory location on the page is written to:

    for (size_type i=0; i<num_pages; ++i) {   // loop bound and receiver were mangled by the archive; "num_pages" and "this->" are reconstructions
        std::size_t dom = this->operator()(p, page_ptr, pagesize, nodesets.size());
        if (dom==numa_domain) {
            // trigger a memory read and rewrite without changing contents
            volatile char* vaddr = (volatile char*) page_ptr;
            *vaddr = T(0); // *vaddr;
        }
        page_ptr += pageN;
    }

All of this has been debugged quite extensively; I can write numbers to memory, 
read them back, and the patterns always match the expected domains.

This function is called after all the data has been written, to verify the 
placement (and display the patterns above):

int topology::get_numa_domain(const void *addr) const
{
#if HWLOC_API_VERSION >= 0x00010b06
hpx_hwloc_bitmap_wrapper *nodeset = topology::bitmap_storage_.get();
if (nullptr == nodeset)
{
hwloc_bitmap_t nodeset_ = hwloc_bitmap_alloc();
topology::bitmap_storage_.reset(new 
hpx_hwloc_bitmap_wrapper(nodeset_));
nodeset = topology::bitmap_storage_.get();
}
//
    hwloc_nodeset_t ns = reinterpret_cast<hwloc_nodeset_t>(nodeset->get_bmp());

int ret = hwloc_get_area_memlocation(topo, addr, 1,  ns,
HWLOC_MEMBIND_BYNODESET);
if (ret<0) {
std::string msg(strerror(errno));
HPX_THROW_EXCEPTION(kernel_error
  , "hpx::threads::topology::get_numa_domain"
  , "hwloc_get_area_memlocation failed " + msg);
return -1;
}
// this uses hwloc directly
//int bit = hwloc_bitmap_first(ns);
//return bit
// this uses an alternative method, both give the same result AFAICT
threads::mask_type mask = bitmap_to_mask(ns, HWLOC_OBJ_NUMANODE);
    return static_cast<int>(threads::find_first(mask));
#else
return 0;
#endif
}

Thanks for taking the time to look it over

JB
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] unusual memory binding results

2019-01-21 Thread Brice Goglin

On 21/01/2019 at 17:08, Biddiscombe, John A. wrote:
> Dear list,
>
> I'm allocating a matrix of size (say) 2048*2048 on a node with 2 numa domains 
> and initializing the matrix by using 2 threads, one pinned on each numa 
> domain - with the idea that I can create tiles of memory bound to each numa 
> domain rather than having pages assigned all to one, interleaved, or possibly 
> random. The tiling pattern can be user defined, but I am using a simple 
> strategy that touches pages based on a simple indexing scheme using (say) a 
> tile size of 256 elements and should give a pattern like this


Hello John,

First idea:

A tile of 256 elements means you're switching between tiles every 2kB
(if elements are double precision), hence half of each page belongs to one
thread and the other half to the other thread, so only the first one to
touch its tile will actually allocate the page locally.

One way to debug would be to disable touching in N-1 threads and check
that everything is allocated on the right node.

Can you share the code, or at least part of it?

Brice


___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users