[OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Hi, We've recently installed OpenMPI on one of our Cray XT5 machines, here at CSCS. This machine uses SLURM for launching jobs. Doing an salloc defines this environment variable: BASIL_RESERVATION_ID The reservation ID on Cray systems running ALPS/BASIL only. Since

[OMPI devel] some questions regarding the portals modules

2010-07-09 Thread Jerome Soumagne
Hi, As I said in the previous e-mail, we've recently installed OpenMPI on a Cray XT5 machine, and we therefore use the portals and the alps libraries. Thanks for providing the configuration script from Jaguar, this was very helpful, it had just to be slightly adapted in order to use the lates

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Forgive my confusion, but could you please clarify something? You are using ALPS as the resource manager doing the allocation, and then using SLURM as the launcher (instead of ALPS)? That's a combination we've never seen or heard about. I suspect our module selection logic would be confused by

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Well, we actually use a patched version of SLURM, 2.2.0-pre8. We plan to submit the modifications made internally at CSCS for the next SLURM release in November. We implement ALPS support based on the basic architecture of SLURM. SLURM is only used to do the ALPS resource allocation. We th

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Afraid I'm now even more confused. You use SLURM to do the allocation, and then use ALPS to launch the job? I'm just trying to understand because I'm the person who generally maintains this code area. We have two frameworks involved here: 1. RAS - determines what nodes were allocated to us. The

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
OK, I may not have explained this very clearly. In our case we only use SLURM as the resource manager. The difference here is that the SLURM version we use has support for ALPS. Therefore, when we run our job using the mpirun command, since we have the alps environment loaded, it's the ALPS RAS w

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
My bad - I see that you actually do patch the alps ras. Is BASIL_RESERVATION_ID something included in alps, or is this just a name you invented? On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote: > Ok I may have not explained very clearly. In our case we only use SLURM for > the resource manag

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
To clarify: what I'm trying to understand is what the heck a "BASIL_RESERVATION_ID" is - it isn't a standard slurm thing, nor can I find it defined in alps, so it appears to just be a local name you invented. True? If so, I would rather see some standard name instead of something local to one o

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Appreciate your explanation, but it doesn't align with your patch. Your patch doesn't do anything because it patches the slurm ras module, but the system is selecting the alps ras module - so your patch never runs. What am I missing? On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote: > Ok I ma

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
It's not invented, it's a SLURM standard name. Sorry for not having said that, my first e-mail was really too short. http://manpages.ubuntu.com/manpages/lucid/man1/sbatch.1.html http://slurm-llnl.sourcearchive.com/documentation/2.1.1/basil__interface_8c-source.html ... google could have been you

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Matney Sr, Kenneth D.
Hi Jerome, I am in part responsible for the current incarnation of the ALPS support in OMPI. We use the modules environment to set OMPI_ALPS_RESID to the ALPS reservation ID, the pertinent parts of which are: set ridpath ${basedir}/share/openmpi set

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Another link worth mentioning: https://computing.llnl.gov/linux/slurm/cray.html. It says at the top of the page "NOTE: As of January 2009, the SLURM interface to Cray systems is incomplete." But what we have now on our system is reasonably stable and a good part

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Ralph Castain
Actually, this patch doesn't have anything to do with slurm according to the documentation in the links. It has to do with Cray's batch allocator system, which slurm is just interfacing to. So what you are really saying is that you want the alps ras to run if we either detect the presence of alp

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Matney Sr, Kenneth D.
Ralph, His patch only modifies the ALPS RAS mca. And, it causes the environmental variable BASIL_RESERVATION_ID to be a synonym for OMPI_ALPS_RESID. It makes it convenient for the version of SLURM that they are proposing. But, it does not invoke any side-effects. -- Ken Matney, Sr. Oak Ridge Nat
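The synonym behaviour Ken describes can be sketched as a simple environment lookup with a fallback. This is a minimal illustration, not the code from Jerome's patch or from Open MPI's ALPS RAS component: the function name `hypothetical_lookup_alps_resid`, the precedence order (OMPI_ALPS_RESID checked first, then BASIL_RESERVATION_ID) and the "positive decimal integer" validity rule are all assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper (not the actual Open MPI ALPS RAS code).
 * Returns the ALPS reservation ID, checking OMPI_ALPS_RESID first and
 * falling back to BASIL_RESERVATION_ID, the variable an ALPS-aware SLURM
 * salloc sets. Returns -1 if neither variable holds a positive integer. */
static long hypothetical_lookup_alps_resid(void)
{
    /* Assumed precedence: the OMPI-specific name wins over the SLURM one. */
    const char *names[] = { "OMPI_ALPS_RESID", "BASIL_RESERVATION_ID" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        const char *val = getenv(names[i]);
        if (val == NULL || *val == '\0')
            continue;                      /* unset or empty: try the next name */

        char *end = NULL;
        long id = strtol(val, &end, 10);
        if (*end == '\0' && id > 0)        /* whole string parsed, positive value */
            return id;
        /* malformed value: fall through to the next candidate */
    }
    return -1;                             /* no usable reservation ID found */
}
```

Treating the two names as synonyms this way has no side effects on other components: when neither variable is set, the lookup simply reports that no reservation was found.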

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
Hi Ken, That's interesting, setting the OMPI_ALPS_RESID in the modules so that it executes the ras-alps-command.sh is a good idea. In this case another way would be to add an extra line in this script with the BASIL_RESERVATION_ID as you did for the BATCH_PARTITION_ID. I have another possible

Re: [OMPI devel] alps ras patch for SLURM

2010-07-09 Thread Jerome Soumagne
I would prefer the first patch, though, so that we get rid of the scripts and of another env variable, but I'll let you choose. Jerome On 07/09/2010 06:27 PM, Jerome Soumagne wrote: Hi Ken, That's interesting, setting the OMPI_ALPS_RESID in the modules so that it executes the ras-alps-command.sh

Re: [OMPI devel] some questions regarding the portals modules

2010-07-09 Thread Matney Sr, Kenneth D.
Hello Jerome, The first one is simple: portals is not thread-safe on the Cray XT. As I recall, only the master thread can post an event, although any thread can receive the event. I might have it backwards, though; it has been a couple of years since I played with this. The second one depe

Re: [OMPI devel] some questions regarding the portals modules

2010-07-09 Thread Jerome Soumagne
Hi Ken, I thank you a lot for your reply, I will think about it and do some more tests. I was only thinking about using MPI threads, but yes as you say if two threads are scheduled on the same core, that wouldn't be pretty at all. I can probably do some more tests of that functionality, but I

Re: [OMPI devel] some questions regarding the portals modules

2010-07-09 Thread Ralph Castain
On Jul 9, 2010, at 3:23 PM, Jerome Soumagne wrote: > Hi Ken, > > I thank you a lot for your reply, I will think about it and do some more > tests. I was only thinking about using MPI threads, but yes as you say if two > threads are scheduled on the same core, that wouldn't be pretty at all. I