Hi,

just a dumb question, but did you actually build Slurm's PMI plugin? Since it is 
considered additional, you have to compile
and install it manually…

Regards,

        Uwe



On 17.06.2015 at 18:52, Wiegand, Paul wrote:
> Rémi,
> 
> This got me a bit farther, thanks.
> 
>> The stack trace stuck in BTL openib makes me think it's more related to Open 
>> MPI <-> IB integration than to Slurm <-> Open MPI.
> 
> I agree that it seems like an MPI/IB thing; however, I can run using 
> Torque/Moab via SSH so there's some kind of difference here that I'm not 
> understanding, I think.
> 
> 
>> Did you check the permissions of your IB devices in /dev?
> 
> Good point.  I believe these are correct.  We're not having a problem with 
> any other IB-based applications, including other MPI/IB models.  But I 
> checked, and they look right to me.
> 
> 
>> It could work without problems using `mpirun -host` because MCA-related 
>> environment variables may be set in your module and not propagated by mpirun 
>> through SSH, whereas Slurm basically propagates everything.
> 
> I ran the following command, both in my normal shell and after getting 
> a shell from salloc, then diffed the results (no difference).  What else can 
> I check?
> 
> ompi_info --all | grep -i btl
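One way to widen that comparison (a sketch; the file names are just examples) is to diff the full MCA-related environment between the two shells, since Open MPI picks MCA parameters up from any `OMPI_MCA_*` variable:

```shell
# In the normal login shell:
env | grep '^OMPI_MCA_' | sort > /tmp/mca_login.txt
# In the shell obtained from salloc:
env | grep '^OMPI_MCA_' | sort > /tmp/mca_salloc.txt
# Empty diff output means the MCA environment is identical in both shells.
diff /tmp/mca_login.txt /tmp/mca_salloc.txt
```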
> 
> 
> 
>> You can also check whether it is related to IB by disabling it explicitly in 
>> Open MPI's BTL framework via mpirun's parameters.
> 
> This was a good idea.  I ran correctly with the following in an salloc shell, 
> which confirms that it's happening at the IB integration level:
> 
> mpirun --mca btl ^openib ./simple
> 
> 
> 
> So the question is:  Why aren't the MCA parameters propagating?  Or:  What 
> did I misconfigure so that they would not?  Torque uses ssh when it deploys, 
> and we've had no problems with any of our MPI setups via Torque.  Is there 
> some Slurm-ishness that my Torquey assumptions are keeping me from 
> understanding?
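For what it's worth, Open MPI also lets you pin MCA parameters outside the shell environment entirely, which sidesteps the ssh-vs-Slurm propagation question; a sketch using the openib workaround above (the values are just illustrations, not a recommendation for your site):

```shell
# Environment form: srun forwards the full environment to the tasks, and
# Open MPI reads any variable named OMPI_MCA_<param>.
export OMPI_MCA_btl='^openib'

# mpirun can also export a named variable to the remote ranks explicitly:
#   mpirun -x OMPI_MCA_btl ./simple

# Per-user config file form: read by every Open MPI process at startup,
# regardless of how it was launched.
mkdir -p "$HOME/.openmpi"
echo 'btl = ^openib' >> "$HOME/.openmpi/mca-params.conf"
```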
> 
> Thanks,
> Paul.
> 
> 
> P.S.,  To Andy Reibs:  Thanks for your suggestion.  My current build does use 
> PMI and explicitly points to the Slurm PMI.  I tried your /etc/sysconfig/slurm 
> suggestion, but no dice.
> 
> 
