Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-10 Thread Jeff Squyres
Tom and I talked more about this off list, and I eventually logged in to his cluster to see what I could see. The issue turned out to be not related to SGE or THREAD_MULTIPLE at all. The issue was that RHEL6, by default, activated a virtualization IP interface on all of Tom's nodes. All

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-09 Thread Reuti
On 08.02.2012 at 22:52, Tom Bryan wrote: > > Yes, this should work across multiple machines. And it's using `qrsh -inherit ...` so it's failing somewhere in Open MPI - is it working with 1.4.4? >>> >>> I'm not sure. We no longer have our 1.4 test environment, so I'm in the >>>

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-08 Thread Tom Bryan
On 2/8/12 4:52 PM, "Tom Bryan" wrote: > Got it. Unfortunately, we *definitely* need THREAD_MULTIPLE in our case. > I rebuilt my code against 1.4.4. > > When I run my test "e" from before, which is basically just > mpiexec -np 1 ./mpitest > I get the following [errors]

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-08 Thread Tom Bryan
On 2/6/12 5:10 PM, "Reuti" wrote: > On 06.02.2012 at 22:28, Tom Bryan wrote: > >> On 2/6/12 8:14 AM, "Reuti" wrote: >> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support, it's not clear to me whether

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-06 Thread Reuti
On 06.02.2012 at 22:28, Tom Bryan wrote: > On 2/6/12 8:14 AM, "Reuti" wrote: > >>> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support, >>> it's not clear to me whether MPI::Init_thread() and >>> MPI::Init_thread(MPI::THREAD_MULTIPLE) would

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-06 Thread Tom Bryan
On 2/6/12 8:14 AM, "Reuti" wrote: >> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support, >> it's not clear to me whether MPI::Init_thread() and >> MPI::Init_thread(MPI::THREAD_MULTIPLE) would give me the same behavior from >> Open MPI. > > If

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-05 Thread Ralph Castain
On Feb 5, 2012, at 6:51 AM, Reuti wrote: > Hi, > >>> Not sure whether I get it right. When I launch the same application with: >>> >>> "mpiexec -np1 ./Mpitest" (and get an allocation of 2+2 on the two machines): >>> >>> 27422 ?Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd >>> 9504 ?

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-05 Thread Reuti
Hi, >> Not sure whether I get it right. When I launch the same application with: >> >> "mpiexec -np1 ./Mpitest" (and get an allocation of 2+2 on the two machines): >> >> 27422 ?Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd >> 9504 ?S 0:00 \_ sge_shepherd-3791 -bg >> 9506 ?

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-03 Thread Reuti
On 04.02.2012 at 00:15, Tom Bryan wrote: A more detailed answer later, as it's late here. But one short note:
-pe orte 5 => give me exactly 5 slots
-pe orte 5-5 => the same
-pe orte 5- => give me at least 5 slots, up to the maximum you can get right now in the cluster
The output in `qstat
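To make the three forms in Reuti's note concrete, here is a small C++17 sketch that turns the range argument of `-pe orte <range>` into a minimum and an optional maximum. It is only an illustration of the rule as stated above, not how SGE parses the option: `SlotRange` and `parse_slot_range` are hypothetical names, and anything beyond the forms "N", "N-M" and "N-" is simply rejected.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical helper type: the slot request carried by "-pe <pe_name> <range>".
struct SlotRange {
    int min_slots;                 // fewest slots the job accepts
    std::optional<int> max_slots;  // empty: as many as are free right now
};

// Handles only the forms from the note: "5", "5-5" and "5-".
// Returns std::nullopt for anything else.
std::optional<SlotRange> parse_slot_range(const std::string& spec) {
    try {
        std::size_t pos = 0;
        const int lo = std::stoi(spec, &pos);
        if (lo < 1) return std::nullopt;
        if (pos == spec.size()) return SlotRange{lo, lo};  // "5": exactly 5
        if (spec[pos] != '-') return std::nullopt;
        if (pos + 1 == spec.size()) {
            return SlotRange{lo, std::nullopt};            // "5-": at least 5
        }
        const std::string rest = spec.substr(pos + 1);
        std::size_t pos2 = 0;
        const int hi = std::stoi(rest, &pos2);
        if (pos2 != rest.size() || hi < lo) return std::nullopt;
        return SlotRange{lo, hi};                          // "5-5": same as "5"
    } catch (const std::exception&) {
        return std::nullopt;                               // not a number
    }
}
```

With this reading, `parse_slot_range("5")` and `parse_slot_range("5-5")` give the same result, while `parse_slot_range("5-")` leaves `max_slots` empty, which is the open-ended case where the scheduler decides how many slots the job actually gets.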

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-03 Thread Tom Bryan
OK. Sorry for the delay. I needed to read through this thread a few times and try some experiments. Let me reply to a few of these pieces, and then I'll talk about those experiments. On 1/31/12 9:26 AM, "Reuti" wrote: >>> I never used spawn_multiple, but isn't it

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Ralph Castain
FWIW: I have fixed this on the developer's trunk, and Jeff has scheduled it for release in the upcoming 1.6 release (when 1.5 series rolls over). I don't expect we'll backport it to 1.4 unless someone really needs it there. Thanks! Ralph On Feb 1, 2012, at 9:31 AM, Ralph Castain wrote: > Ah -

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Ralph Castain
Ah - crud. Looks like the default-hostfile MCA param isn't getting set to the default value. Will resolve - thanks! On Feb 1, 2012, at 9:28 AM, Reuti wrote: > On 01.02.2012 at 17:16, Ralph Castain wrote: > >> Could you add --display-allocation to your cmd line? This will tell us if it >>

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Reuti
On 01.02.2012 at 17:16, Ralph Castain wrote: > Could you add --display-allocation to your cmd line? This will tell us if it > found/read the default hostfile, or if the problem is with the mapper. Sure: reuti@pc15370:~> mpiexec --display-allocation -np 4 ./mpihello ==

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Ralph Castain
Could you add --display-allocation to your cmd line? This will tell us if it found/read the default hostfile, or if the problem is with the mapper. On Feb 1, 2012, at 7:58 AM, Reuti wrote: > On 01.02.2012 at 15:38, Ralph Castain wrote: > >> On Feb 1, 2012, at 3:49 AM, Reuti wrote: >> >>> On

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Reuti
On 01.02.2012 at 15:38, Ralph Castain wrote: > On Feb 1, 2012, at 3:49 AM, Reuti wrote: > >> On 31.01.2012 at 21:25, Ralph Castain wrote: >> >>> On Jan 31, 2012, at 12:58 PM, Reuti wrote: >> >> BTW: is there any default for a hostfile for Open MPI - I mean any in my >> home directory or

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Ralph Castain
On Feb 1, 2012, at 3:49 AM, Reuti wrote: > On 31.01.2012 at 21:25, Ralph Castain wrote: > >> >> On Jan 31, 2012, at 12:58 PM, Reuti wrote: > > BTW: is there any default for a hostfile for Open MPI - I mean any in my home > directory or /etc? When I check `man orte_hosts`, and all possible

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-01 Thread Reuti
On 31.01.2012 at 21:25, Ralph Castain wrote: > > On Jan 31, 2012, at 12:58 PM, Reuti wrote: > >> >> On 31.01.2012 at 20:38, Ralph Castain wrote: >> >>> Not sure I fully grok this thread, but will try to provide an answer. >>> >>> When you start a singleton, it spawns off a daemon that is

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 20:38, Ralph Castain wrote: > Not sure I fully grok this thread, but will try to provide an answer. > > When you start a singleton, it spawns off a daemon that is the equivalent of > "mpirun". This daemon is created for the express purpose of allowing the > singleton to use

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Ralph Castain
Not sure I fully grok this thread, but will try to provide an answer. When you start a singleton, it spawns off a daemon that is the equivalent of "mpirun". This daemon is created for the express purpose of allowing the singleton to use MPI dynamics like comm_spawn - without it, the singleton

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 20:12, Jeff Squyres wrote: > I only noticed after the fact that Tom is also here at Cisco (it's a big > company, after all :-) ). > > I've contacted him using our proprietary super-secret Cisco handshake (i.e., > the internal phone network); I'll see if I can figure out the

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Jeff Squyres
I only noticed after the fact that Tom is also here at Cisco (it's a big company, after all :-) ). I've contacted him using our proprietary super-secret Cisco handshake (i.e., the internal phone network); I'll see if I can figure out the issues off-list. On Jan 31, 2012, at 1:08 PM, Dave Love

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Dave Love
Reuti writes: > Maybe it's a side effect of a tight integration that it would start on > the correct nodes (but I face an incorrect allocation of slots and an > error message at the end if started without mpiexec), as in this case > it has no command line option for

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 05:33, Tom Bryan wrote: >> Suppose you want to start 4 additional tasks, you would need 5 in total from >> SGE. > > OK, thanks. I'll try other values. BTW: there is a setting in the PE definition to allow one additional task:
$ qconf -sp openmpi
...
job_is_first_task FALSE
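The slot arithmetic in this exchange can be written down as a one-line helper. This is a sketch of one reading of the two statements above (4 spawned tasks need 5 slots when the master counts as a task; `job_is_first_task FALSE` allows one additional task beyond the granted slots), not a statement of how SGE accounts for tasks in every configuration. `slots_to_request` is a hypothetical name.

```cpp
// children:          number of processes the master will spawn.
// job_is_first_task: value of that field in the PE definition
//                    (as shown by `qconf -sp <pe_name>`).
// Assumption: with TRUE the master occupies one of the requested slots,
// with FALSE one task beyond the requested slots is allowed.
int slots_to_request(int children, bool job_is_first_task) {
    return job_is_first_task ? children + 1 : children;
}
```

For the case discussed here, `slots_to_request(4, true)` gives the 5 slots Reuti mentions. Whether running the master on top of the granted slots is acceptable depends on how much work the master itself does.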

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Reuti
On 31.01.2012 at 06:33, Rayson Ho wrote: > On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote: >> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and >> lots of jobs in quick succession. We're using MPI as a robust way to get >> IPC as we spawn

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Rayson Ho
On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote: > For our use, yes, spawn_multiple makes sense. We won't be spawning lots and > lots of jobs in quick succession. We're using MPI as a robust way to get > IPC as we spawn multiple child processes while using SGE to help us

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-30 Thread Tom Bryan
On 1/29/12 5:44 PM, "Reuti" wrote: > you compiled Open MPI --with-sge I assume, as the above is working - fine. Yes, we compiled --with-sge. >> #$ -pe orte 1- > > This number should match the processes you want to start plus one for the master. > Otherwise SGE might

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-29 Thread Reuti
On 27.01.2012 at 23:19, Tom Bryan wrote: > I am in the process of setting up a grid engine (SGE) cluster for running > Open MPI applications. I'll detail the setup below, but my current problem > is that this call to Spawn_multiple never seems to return. > > // Spawn all of the child processes.

[OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-27 Thread Tom Bryan
I am in the process of setting up a grid engine (SGE) cluster for running Open MPI applications. I'll detail the setup below, but my current problem is that this call to Spawn_multiple never seems to return. // Spawn all of the child processes. _intercomm = MPI::COMM_WORLD.Spawn_multiple(