Re: [OMPI users] new core binding issues?

2018-06-22 Thread Noam Bernstein
> On Jun 22, 2018, at 2:14 PM, Brice Goglin wrote:
>
> If psr is the processor where the task is actually running, I guess we'd need
> your lstopo output to find out where those processors are in the machine.

Excellent, that’s exactly the sort of thing I was hoping someone on the list

Re: [OMPI users] new core binding issues?

2018-06-22 Thread Brice Goglin
If psr is the processor where the task is actually running, I guess we'd need your lstopo output to find out where those processors are in the machine.

Brice

On 22 June 2018 19:13:42 GMT+02:00, Noam Bernstein wrote:
>> On Jun 22, 2018, at 1:00 PM, r...@open-mpi.org wrote:
>>
>> I
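For anyone following along, here is a minimal sketch of how the per-thread "processor last run on" value can be read directly on Linux, assuming it is the same number that `ps` reports in its psr column (field 39 of `/proc/<pid>/task/<tid>/stat`, per proc(5)). The function names `last_cpu_from_stat` and `thread_cpus` are made up for illustration and are not part of Open MPI or hwloc.

```python
from pathlib import Path


def last_cpu_from_stat(stat_line):
    """Return the CPU a task last ran on (field 39 of a /proc stat line)."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')' characters, so split on the LAST ')' only.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 36.
    return int(after_comm[36])


def thread_cpus(pid):
    """Map each thread id of `pid` to the CPU it last ran on (Linux only)."""
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        result[int(task.name)] = last_cpu_from_stat((task / "stat").read_text())
    return result
```

Calling `thread_cpus(pid)` on one MPI rank gives one CPU number per thread; those logical CPU numbers still have to be matched against the lstopo output to see which core and socket they belong to.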

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
Afraid I’m not familiar with that option, so I really don’t know :-(

> On Jun 22, 2018, at 10:13 AM, Noam Bernstein wrote:
>
>> On Jun 22, 2018, at 1:00 PM, r...@open-mpi.org wrote:
>>
>> I suspect it is okay. Keep in mind that OMPI itself is starting multiple

Re: [OMPI users] new core binding issues?

2018-06-22 Thread Noam Bernstein
> On Jun 22, 2018, at 1:00 PM, r...@open-mpi.org wrote:
>
> I suspect it is okay. Keep in mind that OMPI itself is starting multiple
> progress threads, so that is likely what you are seeing. The binding pattern
> in the mpirun output looks correct as the default would be to map-by socket
> and

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
I suspect it is okay. Keep in mind that OMPI itself is starting multiple progress threads, so that is likely what you are seeing. The binding pattern in the mpirun output looks correct as the default would be to map-by socket and you asked that we bind-to core.

> On Jun 22, 2018, at 9:33 AM,
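One way to tell a bound rank's threads apart is to look at the CPU set each thread is allowed to run on, not only where it last ran. The following is a rough sketch, assuming a Linux system where `/proc/<pid>/task/<tid>/status` carries a `Cpus_allowed_list:` line; the helper names `parse_cpu_list` and `allowed_cpus_from_status` are hypothetical and only for illustration.

```python
def parse_cpu_list(text):
    """Turn a kernel CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus_from_status(status_text):
    """Extract the allowed-CPU set from the text of a /proc status file."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("no Cpus_allowed_list line found")
```

Feeding it the contents of each `/proc/<pid>/task/<tid>/status` file for a rank shows whether every thread carries the same mask that mpirun reported for that rank.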

[OMPI users] new core binding issues?

2018-06-22 Thread Noam Bernstein
Hi - for the last couple of weeks, more or less since we did some kernel updates, certain compute-intensive MPI jobs have been behaving oddly in terms of speed - bits that should be quite fast sometimes (but not consistently) take a long time, and re-running sometimes fixes the issue,