On Jun 3, 2010, at 8:54 AM, guillaume ranquet wrote:

> granquet@bordeplage-15 ~ $ mpirun --mca btl mx,openib,sm,self --mca pml
> ^cm --mca mpi_leave_pinned 0 ~/bwlat/mpi_helloworld
> [bordeplage-15.bordeaux.grid5000.fr:02707] Error in mx_init (error No MX
> device entry in /dev.)
> Hello world from process 0 of 1
> 
> it works :)

Jeff, you may want to change this error message to use opal_output_verbose().
It is in $OMPI/ompi/mca/common/common_mx.c.

>> Ok. I think that OMPI is trying to open the MX MTL first. It fails at
>> mx_init() (the first error message) but it had already created some
>> mpool resources. It then tries to open the MX BTL and it skips the MX
>> initialization and returns SUCCESS. The MX BTL then tries to call
>> mx_get_info() which fails and prints the second message.
>> 
>> Try the attached patch. It tries to clean up if mx_init() fails and
>> does not return SUCCESS on subsequent attempts to initialize MX.
>> 
>> Scott
> 
> I tried your patch and it seems to correct the issue:
> 
> configured with:  --prefix=$HOME/openmpi-1.4.2-nomx-bin/
> --with-openib=/usr --with-mx=/usr
> 
> $ ~/openmpi-1.4.2-nomx-bin/bin/mpirun ~/bwlat/mpi_helloworld
> [bordeplage-15.bordeaux.grid5000.fr:22406] Error in mx_init (error No MX
> device entry in /dev.)
> Hello world from process 0 of 1

Excellent.
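
For anyone reading this in the archive, here is a rough sketch of the behavior the patch is aiming for. It is not the patch itself, and every name in it (sketch_common_mx_initialize, fake_mx_init, resources_held, and so on) is a hypothetical stand-in rather than an actual MX or Open MPI symbol. The idea is only this: if the library init fails, release whatever was already set up and remember the failure, so that a later caller (the BTL, after the MTL has already tried) gets an error instead of SUCCESS.

```c
#include <stdio.h>

/* Hypothetical return codes for this sketch only. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

static int init_refcount = 0;   /* number of successful initializations */
static int init_failed = 0;     /* sticky flag: set once init has failed */
static int resources_held = 0;  /* stands in for the mpool resources */

/* Pretend library init; always fails, as on a node with no MX device. */
static int fake_mx_init(void)
{
    return SKETCH_ERROR;
}

/* Shared init entry point called by each component (MTL, then BTL). */
int sketch_common_mx_initialize(void)
{
    /* A previous attempt failed: do not report SUCCESS to later callers. */
    if (init_failed) {
        return SKETCH_ERROR;
    }

    /* Already initialized by another component: just take a reference. */
    if (init_refcount > 0) {
        init_refcount++;
        return SKETCH_SUCCESS;
    }

    /* First attempt: set up resources, then try the library init. */
    resources_held = 1;
    if (fake_mx_init() != SKETCH_SUCCESS) {
        resources_held = 0;  /* clean up what was created before the failure */
        init_failed = 1;     /* remember the failure for later callers */
        fprintf(stderr, "sketch: mx init failed\n");
        return SKETCH_ERROR;
    }

    init_refcount = 1;
    return SKETCH_SUCCESS;
}
```

With this shape, calling sketch_common_mx_initialize() twice on a node without the device returns an error both times and leaves no resources behind, so the second component never gets far enough to call into the library.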

> don't hesitate if you need further testing :)

Thanks for all your assistance!

> do you plan on applying this patch on next release? (1.4.3?)

Jeff, I leave this up to you and George.

Scott
