On Jul 9, 2010, at 3:23 PM, Jerome Soumagne wrote:
Hi Ken,
Thank you very much for your reply; I will think about it and do some more
tests. I was only thinking about using MPI threads, but yes, as you say,
if two threads are scheduled on the same core, that wouldn't be pretty
at all. I can probably do some more tests of that functionality, but I
Hello Jerome,
The first one is simple: Portals is not thread-safe on the Cray XT. As I
recall, only the master thread can post an event, although any thread can
receive the event. I might have it backwards, though; it has been a couple
of years since I played with this.
The second one depe
I would prefer the first patch, though, so that we get rid of scripts and
of another environment variable, but I'll let you choose.
Jerome
On 07/09/2010 06:27 PM, Jerome Soumagne wrote:
Hi Ken,
Setting OMPI_ALPS_RESID in the modules so that it executes
ras-alps-command.sh is an interesting idea. In that case, another way
would be to add an extra line to this script for BASIL_RESERVATION_ID,
as you did for BATCH_PARTITION_ID.
I have another possible
Ralph,
His patch only modifies the ALPS RAS MCA component, and it makes the
environment variable BASIL_RESERVATION_ID a synonym for OMPI_ALPS_RESID.
It is convenient for the version of SLURM that they are proposing, but it
does not introduce any side effects.
--
Ken Matney, Sr.
Oak Ridge Nat
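Ken's description of the patch (BASIL_RESERVATION_ID accepted as a synonym for OMPI_ALPS_RESID) amounts to an environment-variable fallback, sketched below in C. This is a minimal illustration, not the actual patch: the function name alps_resid_from_env is hypothetical, and giving OMPI_ALPS_RESID precedence when both variables are set is an assumption, not something stated in the thread.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, NOT the real Open MPI patch: find the ALPS
 * reservation ID, treating BASIL_RESERVATION_ID (defined by salloc on an
 * ALPS-aware SLURM) as a synonym for OMPI_ALPS_RESID.
 * Assumption: OMPI_ALPS_RESID is checked first; if it is unset, empty,
 * or not a valid non-negative integer, BASIL_RESERVATION_ID is tried.
 * Returns the ID, or -1 if neither variable holds a usable value. */
long alps_resid_from_env(void)
{
    const char *names[] = { "OMPI_ALPS_RESID", "BASIL_RESERVATION_ID" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value == NULL || *value == '\0')
            continue;                      /* not set: try the synonym */

        char *end = NULL;
        errno = 0;
        long id = strtol(value, &end, 10);
        if (errno == 0 && *end == '\0' && id >= 0)
            return id;                     /* whole string parsed cleanly */
    }
    return -1;
}
```

With only salloc's BASIL_RESERVATION_ID set, the sketch returns that ID; an explicitly exported OMPI_ALPS_RESID (for example, from a modulefile) overrides it.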
Actually, according to the documentation in the links, this patch has nothing
to do with SLURM. It has to do with Cray's batch allocator system, which SLURM
is just interfacing to. So what you are really saying is that you want the ALPS
RAS to run if we either detect the presence of alp
Another link worth mentioning:
https://computing.llnl.gov/linux/slurm/cray.html
It says at the top of the page: "NOTE: As of January 2009, the SLURM
interface to Cray systems is incomplete."
But what we have now on our system is reasonably stable, and a good part
Hi Jerome,
I am partly responsible for the current incarnation of the ALPS support in
OMPI. We use the modules environment to set OMPI_ALPS_RESID to the ALPS
reservation ID, the pertinent parts of which are:
    set ridpath ${basedir}/share/openmpi
    set
It's not invented; it's a standard SLURM name. Sorry for not saying so
earlier; my first e-mail was far too short.
http://manpages.ubuntu.com/manpages/lucid/man1/sbatch.1.html
http://slurm-llnl.sourcearchive.com/documentation/2.1.1/basil__interface_8c-source.html
...
google could have been you
I appreciate your explanation, but it doesn't match your patch. Your patch
doesn't do anything because it patches the SLURM RAS module, while the system
is selecting the ALPS RAS module, so your patch never runs.
What am I missing?
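Ralph's point, that a patch to the SLURM RAS module never runs once the ALPS RAS module is selected, follows from the framework picking a single component. Below is a toy model in C, assuming each component reports a priority from an environment check. The component table, the priorities 75 and 50, and the use of SLURM_JOBID as SLURM's marker are hypothetical choices for illustration, not Open MPI's actual selection code.

```c
#include <stdlib.h>

/* Toy model of priority-based component selection. All names, checks,
 * and priorities here are hypothetical, not Open MPI's real code. */
typedef struct {
    const char *name;
    int (*query)(void);   /* returns a priority, or -1 if not usable */
} ras_component;

static int env_set(const char *name)
{
    const char *v = getenv(name);
    return v != NULL && *v != '\0';
}

/* Assumed: the ALPS component is usable when the modulefile has
 * exported OMPI_ALPS_RESID. */
static int alps_query(void)
{
    return env_set("OMPI_ALPS_RESID") ? 75 : -1;
}

/* Assumed: the SLURM component is usable inside a SLURM allocation. */
static int slurm_query(void)
{
    return env_set("SLURM_JOBID") ? 50 : -1;
}

/* Return the name of the single highest-priority usable component, or
 * NULL if none is usable. Losing components are never run. */
const char *select_ras(void)
{
    ras_component comps[] = {
        { "slurm", slurm_query },
        { "alps",  alps_query  },
    };
    const char *best = NULL;
    int best_pri = -1;

    for (size_t i = 0; i < sizeof comps / sizeof comps[0]; i++) {
        int pri = comps[i].query();
        if (pri > best_pri) {
            best_pri = pri;
            best = comps[i].name;
        }
    }
    return best;
}
```

With SLURM_JOBID set alone, select_ras() returns "slurm"; once OMPI_ALPS_RESID is also exported, it returns "alps", and the SLURM component's code is never reached.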
On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote:
> Ok I ma
To clarify: what I'm trying to understand is what the heck a
"BASIL_RESERVATION_ID" is. It isn't a standard SLURM thing, nor can I find it
defined in ALPS, so it appears to be just a local name you invented. True?
If so, I would rather see some standard name instead of something local to one
o
My bad: I see that you actually do patch the ALPS RAS. Is BASIL_RESERVATION_ID
something included in ALPS, or is it just a name you invented?
On Jul 9, 2010, at 8:08 AM, Jerome Soumagne wrote:
OK, I may not have explained this very clearly. In our case, we use only
SLURM as the resource manager.
The difference here is that the SLURM version we use supports ALPS.
Therefore, when we run our job with the mpirun command, since the ALPS
environment is loaded, it's the ALPS RAS w
I'm afraid I'm now even more confused. You use SLURM to do the allocation, and
then use ALPS to launch the job?
I'm just trying to understand, because I'm the person who generally maintains
this code area. We have two frameworks involved here:
1. RAS: determines which nodes were allocated to us. The
Well, we actually use a patched version of SLURM, 2.2.0-pre8. We plan to
submit the modifications made internally at CSCS for the next SLURM
release in November. We implement ALPS support on top of SLURM's basic
architecture.
SLURM is only used to do the ALPS resource allocation. We th
Forgive my confusion, but could you please clarify something? You are using
ALPS as the resource manager doing the allocation, and then using SLURM as the
launcher (instead of ALPS)?
That's a combination we've never seen or heard of. I suspect our module
selection logic would be confused by
Hi,
As I said in my previous e-mail, we've recently installed OpenMPI on a
Cray XT5 machine, so we use the Portals and ALPS libraries. Thanks for
providing the configuration script from Jaguar; it was very helpful and
only had to be slightly adapted in order to
use the lates
Hi,
We've recently installed OpenMPI on one of our Cray XT5 machines here
at CSCS. This machine uses SLURM to launch jobs.
Running salloc defines this environment variable:
    BASIL_RESERVATION_ID
        The reservation ID on Cray systems running ALPS/BASIL only.
Since