Appears fixed with r17992 - at least, it works on TM, slurm (odin), and Mac.
On 3/27/08 11:06 AM, "Ralph H Castain" wrote:
Found the problem - should have a fix committed soon. The issue is with
differences in the number of daemons launched by the various PLMs (whether
or not procs are launched local to mpirun).
On 3/27/08 10:39 AM, "Ralph H Castain" wrote:
Hmmm...puzzling. It is working fine for me on TM machines and on my Mac.
However, Galen reports it borked on alps as well.
I'll have to dig a little to check this out and see if there is something
missing on those PLMs. Will get back shortly.
Sorry for the problem.
On 3/27/08 10:28 AM, "Tim Prins" wrote:
Unfortunately, with r17988 I now cannot run any MPI programs; they seem
to hang in the modex.
Tim
Ralph H Castain wrote:
Thanks Tim - I found the problem and will commit a fix shortly.
Appreciate your testing and reporting!
On 3/27/08 8:24 AM, "Tim Prins" wrote:
This commit breaks things for me. Running on 3 nodes of odin:
mpirun -mca btl tcp,sm,self examples/ring_c
causes a hang. All of the processes are stuck in
orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
and the ring program does not hang all the time, but fairly often.
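Since the hang is intermittent, one way to put a number on "fairly often" is to run the same command repeatedly under a time limit and count the runs that time out. The following is only a rough sketch: the `count_hangs` helper is a hypothetical name, it assumes GNU coreutils `timeout` is available, and the run count and time limit in the example are arbitrary. A timed-out `mpirun` may also leave stray processes behind that need cleaning up by hand.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly under a time limit and
# print how many of the runs timed out.
# Usage: count_hangs RUNS LIMIT_SECONDS COMMAND [ARGS...]
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # GNU timeout exits with status 124 when the limit expires.
        timeout "$limit" "$@" >/dev/null 2>&1
        if [ $? -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}

# Example (20 runs and a 60 second limit are assumptions, not measured values):
# count_hangs 20 60 mpirun -mca btl tcp,sm,self examples/ring_c
```

A run is counted as a hang only if it exceeds the limit, so the limit should be comfortably longer than a normal run of the program.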