Re: [OMPI devel] alps ras patch for SLURM

2010-07-12 Thread Jerome Soumagne
Thanks for accepting this patch. Jerome On 07/12/2010 03:47 PM, Ralph Castain wrote: Thanks for the explanation, Ken! I'll take care of the patch. On Jul 12, 2010, at 6:40 AM, Matney Sr, Kenneth D. wrote: Hi Ralph, I think that it would be overstating the case to say that I am re-assum

Re: [OMPI devel] alps ras patch for SLURM

2010-07-12 Thread Ralph Castain
Thanks for the explanation, Ken! I'll take care of the patch. On Jul 12, 2010, at 6:40 AM, Matney Sr, Kenneth D. wrote: > Hi Ralph, > > I think that it would be overstating the case to say that I am re-assuming > those duties. Rather, I am trying to fill the gap in a minimal sense while > we l

Re: [OMPI devel] alps ras patch for SLURM

2010-07-12 Thread Matney Sr, Kenneth D.
Hi Ralph, I think that it would be overstating the case to say that I am re-assuming those duties. Rather, I am trying to fill the gap in a minimal sense while we locate a replacement for Rainer. I expect to help our replacement get up to speed on portals and ALPS; but, I have too many other dut

Re: [OMPI devel] alps ras patch for SLURM

2010-07-10 Thread Ralph Castain
Sounds good then. I only got into this thread because of (a) the reference to slurm, and (b) with Rainer's departure, I wasn't sure if someone else was going to pick up the alps support. Since you are re-assuming those latter duties (yes?), and since this actually has nothing to do with slurm itsel

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
I would prefer the first patch though so that we get rid of scripts and of another env variable but well, I let you choose. Jerome On 07/09/2010 06:27 PM, Jerome Soumagne wrote: Hi Ken, That's interesting, setting the OMPI_ALPS_RESID in the modules so that it executes the ras-alps-command.sh

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Hi Ken, That's interesting, setting the OMPI_ALPS_RESID in the modules so that it executes the ras-alps-command.sh is a good idea. In this case another way would be to add an extra line in this script with the BASIL_RESERVATION_ID as you did for the BATCH_PARTITION_ID. I have another possible

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Matney Sr, Kenneth D.
Ralph, His patch only modifies the ALPS RAS mca. And, it causes the environmental variable BASIL_RESERVATION_ID to be a synonym for OMPI_ALPS_RESID. It makes it convenient for the version of SLURM that they are proposing. But, it does not invoke any side-effects. -- Ken Matney, Sr. Oak Ridge Nat
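The synonym behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual patch: the helper name `lookup_alps_resid` is hypothetical, and giving OMPI_ALPS_RESID precedence over BASIL_RESERVATION_ID is an assumption, since the thread only says the two are treated as synonyms.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not taken from the Open MPI source tree).
 * Returns the ALPS reservation ID as a string, or NULL if neither
 * environment variable is set to a non-empty value.
 *
 * Assumption: OMPI_ALPS_RESID is checked first and
 * BASIL_RESERVATION_ID is used only as a fallback.
 */
static const char *lookup_alps_resid(void)
{
    const char *id = getenv("OMPI_ALPS_RESID");

    /* Fall back to the name set by SLURM when the first is unset or empty. */
    if (id == NULL || id[0] == '\0') {
        id = getenv("BASIL_RESERVATION_ID");
    }

    /* Treat an empty value the same as an unset variable. */
    if (id != NULL && id[0] == '\0') {
        return NULL;
    }
    return id;
}
```

Only the environment lookup changes in a sketch like this, which matches Ken's point that the patch has no side effects outside the ALPS RAS component.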

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Actually, this patch doesn't have anything to do with slurm according to the documentation in the links. It has to do with Cray's batch allocator system, which slurm is just interfacing to. So what you are really saying is that you want the alps ras to run if we either detect the presence of alp

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Another link which can be worth mentioning: https://computing.llnl.gov/linux/slurm/cray.html it says at the top of the page *NOTE: As of January 2009, the SLURM interface to Cray systems is incomplete.* But what we have now on our system is something which is reasonably stable and a good part

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Matney Sr, Kenneth D.
Hi Jerome, I am in part responsible for the current incarnation of the ALPS support in OMPI. We use the modules environment to set OMPI_ALPS_RESID to the ALPS reservation ID, the pertinent parts of which are: set ridpath ${basedir}/share/openmpi set

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
It's not invented, it's a SLURM standard name. Sorry for not having said that, my first e-mail was really too short. http://manpages.ubuntu.com/manpages/lucid/man1/sbatch.1.html http://slurm-llnl.sourcearchive.com/documentation/2.1.1/basil__interface_8c-source.html ... google could have been you

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Appreciate your explanation, but it doesn't align with your patch. Your patch doesn't do anything because it patches the slurm ras module, but the system is selecting the alps ras module - so your patch never runs. What am I missing? On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote: > Ok I ma

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
To clarify: what I'm trying to understand is what the heck a "BASIL_RESERVATION_ID" is - it isn't a standard slurm thing, nor can I find it defined in alps, so it appears to just be a local name you invented. True? If so, I would rather see some standard name instead of something local to one o

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
My bad - I see that you actually do patch the alps ras. Is BASIL_RESERVATION_ID something included in alps, or is this just a name you invented? On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote: > Ok I may have not explained very clearly. In our case we only use SLURM for > the resource manag

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
OK, I may not have explained this very clearly. In our case we only use SLURM as the resource manager. The difference here is that the SLURM version that we use has support for ALPS. Therefore when we run our job using the mpirun command, since we have the alps environment loaded, it's the ALPS RAS w

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Afraid I'm now even more confused. You use SLURM to do the allocation, and then use ALPS to launch the job? I'm just trying to understand because I'm the person who generally maintains this code area. We have two frameworks involved here: 1. RAS - determines what nodes were allocated to us. The

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Well, we actually use a patched version of SLURM, 2.2.0-pre8. It is planned to submit the modifications made internally at CSCS for the next SLURM release in November. We implement ALPS support based on the basic architecture of SLURM. SLURM is only used to do the ALPS resource allocation. We th

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Forgive my confusion, but could you please clarify something? You are using ALPS as the resource manager doing the allocation, and then using SLURM as the launcher (instead of ALPS)? That's a combination we've never seen or heard about. I suspect our module selection logic would be confused by

[OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Hi, We've recently installed OpenMPI on one of our Cray XT5 machines, here at CSCS. This machine uses SLURM for launching jobs. Doing an salloc defines this environment variable: BASIL_RESERVATION_ID The reservation ID on Cray systems running ALPS/BASIL only. Since