If you are launching via mpirun, then you won't be using either version of PMI 
- OMPI has its own internal daemons that handle the launch and wireup.
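
For example (a rough sketch - the exact commands depend on how your Slurm and OMPI were built), the PMI/PMI2 code path only comes into play with a direct launch:

    mpirun ./executable_file            # ORTE daemons handle launch and wireup; PMI is not involved
    srun --mpi=pmi2 ./executable_file   # direct launch; Slurm's PMI2 plugin handles the wireup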

It's odd that this happens across OMPI versions, since there are significant 
differences between them. Is the slowdown associated with non-MPI jobs as 
well? In other words, if you execute "mpirun hostname", does it also take an 
inordinate amount of time?
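
For instance, a quick check from inside the same batch allocation (using your normal job script and node count):

    time mpirun hostname

If that by itself takes far longer than a few seconds, the slowdown is in the launch/wireup path rather than in the MPI transports.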

If not, then the other possibility is that you are falling back to TCP instead 
of IB, or that something is preventing the use of shared memory as a transport 
for procs on the same node.
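
One way to check (a sketch - the exact MCA component names differ a bit between the 1.6 and 1.8 series) is to pin the BTL list so a silent fallback to TCP becomes a hard error, or to turn up the BTL verbosity and see which transports actually get selected:

    mpirun --mca btl openib,sm,self ./executable_file
    mpirun --mca btl_base_verbose 100 ./executable_file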


> On Feb 5, 2015, at 5:02 PM, Peter A Ruprecht <peter.rupre...@colorado.edu> wrote:
> 
> 
> Answering two questions at one time:
> 
> I am pretty sure we are not using PMI2.
> 
> Jobs are launched via "sbatch job_script" where the script contains
> "mpirun ./executable_file".  There appear to be issues with at least OMPI
> 1.6.4 and 1.8.X.
> 
> Thanks
> Peter
> 
> On 2/5/15, 5:39 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
> 
>> 
>> And are you launching via mpirun or directly with srun <myapp>? What OMPI
>> version are you using?
>> 
>> 
>>> On Feb 5, 2015, at 3:32 PM, Chris Samuel <sam...@unimelb.edu.au> wrote:
>>> 
>>> 
>>> On Thu, 5 Feb 2015 03:27:25 PM Peter A Ruprecht wrote:
>>> 
>>>> I ask because some of our users have started reporting a 10x increase in
>>>> run-times of OpenMPI jobs since we upgraded to 14.11.3 from 14.3.  It's
>>>> possible there is some other problem going on in our cluster, but all of
>>>> our hardware checks including Infiniband diagnostics look pretty clean.
>>> 
>>> Are you using PMI2?
>>> 
>>> cheers,
>>> Chris
>>> -- 
>>> Christopher Samuel        Senior Systems Administrator
>>> VLSCI - Victorian Life Sciences Computation Initiative
>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>> http://www.vlsci.org.au/      http://twitter.com/vlsci
