Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Chris Samuel
- "Ralph Castain" wrote: > Perhaps a telecon (myself, Jeff S, and you) would be best at this > stage. Sounds good, will take that part to private email. > I confess I'm now confused too - what you describe is precisely > what we already do. I added printf()'s to the PLPA init(), PLPA_NA

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Ralph Castain
Perhaps a telecon (myself, Jeff S, and you) would be best at this stage. I confess I'm now confused too - what you describe is precisely what we already do. Let me know when you are available and we'll try to work out a time - might as well do that off list so we don't bang everyone's inbox

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Ralph, - "Ralph Castain" wrote: > Ummm... I'll let you guys work this out on PLPA. However, just to > clarify, OMPI currently binds to cores, not logical cpus. It is the > PLPA that is "dumb" and provides the plumbing to do what OMPI tells > it. > > :-) Ahh, if that's the case then

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Ralph Castain
Ummm... I'll let you guys work this out on PLPA. However, just to clarify, OMPI currently binds to cores, not logical cpus. It is the PLPA that is "dumb" and provides the plumbing to do what OMPI tells it. :-) On Jul 24, 2009, at 2:18 AM, Chris Samuel wrote: - "Jeff Squyres" wrote:

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
- "Jeff Squyres" wrote: > PLPA does not currently deal with cpusets. I think it can get close enough if it assumes that its initial affinity list is the subset of cores that it can choose from when setting CPU affinity. As for whether OMPI or PLPA should choose, I suspect it's better if OM
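To illustrate the idea in the message above (treat the affinity mask a process starts with as the only set of CPUs it may bind to), here is a minimal Python sketch. It is not PLPA or OMPI code: pick_cpu, bind_within_initial_mask and the local_rank parameter are hypothetical names chosen for the example, and the round-robin choice is just one possible policy.

```python
import os


def pick_cpu(allowed, local_rank):
    """Choose one CPU for a local rank, using only CPUs in `allowed`.

    `allowed` is the set of logical CPU ids the process may run on
    (for example, the mask it inherited from a cpuset at startup).
    Round-robin over the sorted ids is an assumption of this sketch.
    """
    if not allowed:
        raise ValueError("empty affinity mask")
    ordered = sorted(allowed)
    return ordered[local_rank % len(ordered)]


def bind_within_initial_mask(local_rank):
    """Bind the calling process to one CPU taken from its initial mask.

    Linux only: os.sched_getaffinity / os.sched_setaffinity wrap the
    sched_getaffinity(2) / sched_setaffinity(2) system calls.
    """
    initial = os.sched_getaffinity(0)  # CPUs we are currently allowed on
    cpu = pick_cpu(initial, local_rank)
    os.sched_setaffinity(0, {cpu})     # narrow the mask to that one CPU
    return cpu
```

With a job confined to CPUs 4-7, pick_cpu({4, 5, 6, 7}, 0) selects CPU 4 rather than CPU 0. Per the sched_setaffinity(2) man page, a mask containing no CPU the process is permitted to use is rejected with EINVAL, which may be related to the failure discussed elsewhere in this thread.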

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Bert, - "Bert Wesarg" wrote: > The Cpus_allowed* fields in /proc/<pid>/status are the same as > sched_getaffinity returns and the /proc/<pid>/cpuset needs to be > resolved, i.e. where is the cpuset fs mounted? The convention is to mount it on /dev/cpuset. Unfortunately you cannot mount both the c
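The two lookups mentioned above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the helper names are hypothetical, the status parser reads the Cpus_allowed_list line (the human-readable variant of the Cpus_allowed* fields), and the cpuset path is resolved against the conventional /dev/cpuset mount point, which a given system may not use.

```python
def parse_cpu_list(text):
    """Turn a kernel CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_allowed_from_status(status_text):
    """Extract the allowed CPUs from the text of /proc/<pid>/status.

    Returns None if the Cpus_allowed_list line is not present.
    """
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    return None


def cpuset_dir(cpuset_path, mount="/dev/cpuset"):
    """Resolve the contents of /proc/<pid>/cpuset against a mount point.

    `mount` defaults to the conventional location; it is an assumption
    of this sketch that the cpuset filesystem is mounted there.
    """
    return mount.rstrip("/") + "/" + cpuset_path.strip().lstrip("/")
```

On a Linux host, cpus_allowed_from_status(open("/proc/self/status").read()) would be expected to match os.sched_getaffinity(0), following the statement quoted above. The cpuset directory itself is only needed when the caller wants more than the CPU mask, for example the memory nodes.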

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
- "Jeff Squyres" wrote: Hi Jeff, > I'm the "primary PLPA" guy that Ralph referred to, and I was on > vacation last week -- sorry for missing all the chatter. No worries! > Based on your mails, it looks like you're out this week -- so little > will likely occur. I'm at the MPI Forum st

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Ralph Castain
I apologize for coming to this late - IU's email forwarding jammed up yesterday, so I'm only now getting around to reading the backlog. There has been some off-list discussion about advanced paffinity mappings/bindings. As I noted there, I am in the latter stages of completing a new mapper

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Bert Wesarg
On Wed, Jul 22, 2009 at 19:24, Jeff Squyres wrote: > Bert -- is this functionality something we'd want to incorporate into PLPA? What functionality? The complete libcpuset or just the 'get me the cpuset mask of this task'? I don't think it's good if we duplicate the whole functionality of the libcpu

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Jeff Squyres
Bert -- is this functionality something we'd want to incorporate into PLPA? On Jul 22, 2009, at 1:13 PM, Bert Wesarg wrote: On Wed, Jul 22, 2009 at 18:55, Bert Wesarg wrote: > I do not know of any C interface to get a task's cpuset mask (ok, > libcpuset Just an amendment to give the url to th

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Jeff Squyres
On Jul 22, 2009, at 11:17 AM, Sylvain Jeaugey wrote: I'm interested in joining the effort, since we will likely have the same problem with SLURM's cpuset support. Ok. > But as to why it's getting EINVAL, that could be wonky. We might want to > take this to the PLPA list and have you run

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Bert Wesarg
On Wed, Jul 22, 2009 at 18:55, Bert Wesarg wrote: > I do not know of any C interface to get a task's cpuset mask (ok, > libcpuset Just an amendment to give the url to the libcpuset homepage: http://oss.sgi.com/projects/cpusets/

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Bert Wesarg
On Wed, Jul 22, 2009 at 17:17, Sylvain Jeaugey wrote: > Hi Jeff, > > I'm interested in joining the effort, since we will likely have the same > problem with SLURM's cpuset support. > > On Wed, 22 Jul 2009, Jeff Squyres wrote: > >> But as to why it's getting EINVAL, that could be wonky.  We might wa

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Sylvain Jeaugey
Hi Jeff, I'm interested in joining the effort, since we will likely have the same problem with SLURM's cpuset support. On Wed, 22 Jul 2009, Jeff Squyres wrote: But as to why it's getting EINVAL, that could be wonky. We might want to take this to the PLPA list and have you run some small, no

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-22 Thread Jeff Squyres
I'm the "primary PLPA" guy that Ralph referred to, and I was on vacation last week -- sorry for missing all the chatter. Based on your mails, it looks like you're out this week -- so little will likely occur. I'm at the MPI Forum standards meeting next week, so my replies to email will be

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > Should just be > > -mca paffinity_base_verbose 5 > > Any value greater than 4 should turn it "on" Yup, that's what I was trying, but couldn't get any output. > Something I should have mentioned. The paffinity_base_service.c file > is solely used by the rank_f

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Ralph Castain
Something I should have mentioned. The paffinity_base_service.c file is solely used by the rank_file syntax. It has nothing to do with setting mpi_paffinity_alone and letting OMPI self-determine the process-to-core binding. You want to dig into the linux module code that calls down into th

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Ralph Castain
Should just be -mca paffinity_base_verbose 5 Any value greater than 4 should turn it "on" On Jul 19, 2009, at 7:54 AM, Chris Samuel wrote: - "Chris Samuel" wrote: I'll carry on digging. I've been trying to track back from the linux paffinity module to find some useful debugging inf

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Chris Samuel" wrote: > I'll carry on digging. I've been trying to track back from the linux paffinity module to find some useful debugging info I can get my teeth into and I can see that the file: opal/mca/paffinity/base/paffinity_base_service.c seems to have lots of useful debugging

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > This will tell you what module is loaded. If PLPA -can- run, you > should see the linux module selected. Thanks Ralph, yes it is being selected. I'll carry on digging. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partn

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-18 Thread Ralph Castain
No, that should be fine - if you are on a Linux system, the PLPA would automatically be selected. That said, it is possible it would reject support if the kernel isn't compatible. You can check what it does by setting: -mca paffinity_base_verbose 10 This will tell you what module is loaded

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-18 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? If not, we won't set affinity on the > processes. Just realised that in the failed test I posted I set -mca mpi_affinity_alone 1 *instead* of -mca paffinity linux, rather than as

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Sounds like a problem in PLPA - I'll have to defer > to them. Understood, thanks for that update. I'll try and find some time to look inside PLPA too. > Our primary PLPA person is on vacation this week, so > you might not hear back from him until later next wee

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Terry Dontje
There are some mailing lists for PLPA at: http://www.open-mpi.org/community/lists/plpa.php --td Ralph Castain wrote: Sounds like a problem in PLPA - I'll have to defer to them. Our primary PLPA person is on vacation this week, so you might not hear back from him until later next week when he g

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Ralph Castain
Sounds like a problem in PLPA - I'll have to defer to them. Our primary PLPA person is on vacation this week, so you might not hear back from him until later next week when he gets through his inbox mountain. PLPA may have its own mailing list too - not really sure. On Jul 15, 2009, at 10:

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? Ahh, no, sorry, still feeling my way with this.. > If not, we won't set affinity on the processes. Now it fails immediately with: Setting processor affinity failed --> Ret

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Looking at your command line, did you remember to set -mca mpi_paffinity_alone 1? If not, we won't set affinity on the processes. On Jul 15, 2009, at 8:11 PM, Chris Samuel wrote: - "Ralph Castain" wrote: Could you check this? You can run a trivial job using the -npernode x option, wh

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: > Could you check this? You can run a trivial job using the -npernode x > option, where x matched the #cores you were allocated on the nodes. > If you do this, do we bind to the correct cores? Nope, I'm afraid it doesn't - submitted a job asking for 4 cores on one

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Hmmm...I believe I made a mis-statement. Shocking to those who know me, I am sure! :-) Just to correct my comments: OMPI knows how many "slots" have been allocated to us, but not which "cores". So I'll assign the correct number of procs to each node, but they won't know that we were allocated core

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: Hi Ralph, > Interesting. No, we don't take PLPA cpu sets into account when > retrieving the allocation. Understood. > Just to be clear: from an OMPI perspective, I don't think this is an > issue of binding, but rather an issue of allocation. If we knew we had >

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Interesting. No, we don't take PLPA cpu sets into account when retrieving the allocation. Just to be clear: from an OMPI perspective, I don't think this is an issue of binding, but rather an issue of allocation. If we knew we had been allocated only a certain number of cores on a node, then

[OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
Hi all, Not sure if this is an OpenMPI query or a PLPA query, but given that PLPA seems to have some support for it already I thought I'd start here. :-) We run a quad core Opteron cluster with Torque 2.3.x which uses the kernel's cpuset support to constrain a job to just the cores it has been allo