Greetings!

The following batch script successfully demonstrates LSF's task 
geometry feature with IBM Parallel Environment:
#BSUB -J "task_geometry"
#BSUB -n 9
#BSUB -R "span[ptile=3]"
#BSUB -network "type=sn_single:mode=us"
#BSUB -R "affinity[core]"
#BSUB -e "task_geometry.stderr.%J"
#BSUB -o "task_geometry.stdout.%J"
#BSUB -q "normal"
#BSUB -M "800"
#BSUB -R "rusage[mem=800]"
#BSUB -x

export LSB_PJL_TASK_GEOMETRY="{(5)(4,3)(2,1,0)}"

ldd /gpfs/gpfs_stage1/parpia/PE_tests/reporter/bin/reporter_MPI

/gpfs/gpfs_stage1/parpia/PE_tests/reporter/bin/reporter_MPI
The reporter_MPI utility simply reports the hostname and affinitization 
of each MPI process.  I use it to verify that the job is distributed 
across the allocated nodes, and bound on each of them, as expected.  
Typical output is attached.
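
For readers unfamiliar with the geometry string: as I understand the LSF 
documentation, each parenthesized group in LSB_PJL_TASK_GEOMETRY lists the 
task IDs that are to be placed together on one node.  The following is 
only a minimal bash sketch that splits such a string into its groups; 
print_task_groups is a hypothetical helper name, not an LSF command.

```shell
# Print one line per task group of an LSB_PJL_TASK_GEOMETRY-style string.
# print_task_groups is a hypothetical helper, not part of LSF.
print_task_groups() {
    local geometry="$1"
    local body group index=0
    # Matches one leading "(...)" group and captures the remainder.
    local re='^\(([0-9,]+)\)(.*)$'
    # Strip the outer braces: "{(5)(4,3)(2,1,0)}" -> "(5)(4,3)(2,1,0)"
    body="${geometry#\{}"
    body="${body%\}}"
    # Peel off one parenthesized group at a time.
    while [[ $body =~ $re ]]; do
        group="${BASH_REMATCH[1]}"
        body="${BASH_REMATCH[2]}"
        echo "group ${index}: tasks ${group//,/ }"
        index=$((index + 1))
    done
}

print_task_groups "{(5)(4,3)(2,1,0)}"
```

For the string used in the job scripts here, this prints three groups: 
task 5 alone, tasks 4 and 3 together, and tasks 2, 1, and 0 together.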

To adapt the above batch script to OpenMPI, I modified it as follows:
#BSUB -J "task_geometry"
#BSUB -n 9
#BSUB -R "span[ptile=3]"
#BSUB -m "p10a30 p10a33 p10a35 p10a55 p10a58"
#BSUB -R "affinity[core]"
#BSUB -e "task_geometry.stderr.%J"
#BSUB -o "task_geometry.stdout.%J"
#BSUB -q "normal"
#BSUB -M "800"
#BSUB -R "rusage[mem=800]"
#BSUB -x

export PATH=/usr/local/OpenMPI/1.10.2/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/OpenMPI/1.10.2/lib:${LD_LIBRARY_PATH}

export LSB_PJL_TASK_GEOMETRY="{(5)(4,3)(2,1,0)}"

echo "=== LSB_DJOB_HOSTFILE ==="
cat ${LSB_DJOB_HOSTFILE}
echo "=== LSB_AFFINITY_HOSTFILE ==="
cat ${LSB_AFFINITY_HOSTFILE}
echo "=== LSB_DJOB_RANKFILE ==="
cat ${LSB_DJOB_RANKFILE}
echo "========================="

ldd /gpfs/gpfs_stage1/parpia/OpenMPI_tests/reporter/bin/reporter_MPI

mpirun /gpfs/gpfs_stage1/parpia/OpenMPI_tests/reporter/bin/reporter_MPI
I have inserted additional lines of scripting to help debug this job, 
which fails.  The output files from the job are attached.
If I change the last line of the immediately above job script to
        mpirun -bind-to core:overload-allowed /gpfs/gpfs_stage1/parpia/OpenMPI_tests/reporter/bin/reporter_MPI
the job runs through, but the host selection and affinitization are 
completely wrong.  You can extract the relevant information from the 
attached output files with
        grep "can be sched" *.stdout.* | sort -n -k 9
OpenMPI 1.10.2 was built using the attached script, build_OpenMPI.sh.
It was installed with
make install
executed from the top of the build tree.  The output of
ompi_info --all
is attached as well (ompi_info--all.gz).

Regards,

Farid Parpia          IBM Corporation: 710-2-RF28, 2455 South Road, 
Poughkeepsie, NY 12601, USA; Telephone: (845) 433-8420 = Tie Line 293-8420

Attachment: task_geometry.stdout.43915.gz
Description: Binary data

Attachment: task_geometry.stderr.43915.gz
Description: Binary data

Attachment: task_geometry.stderr.43918.gz
Description: Binary data

Attachment: task_geometry.stdout.43918.gz
Description: Binary data

Attachment: task_geometry.stderr.43953.gz
Description: Binary data

Attachment: task_geometry.stdout.43953.gz
Description: Binary data

Attachment: build_OpenMPI.sh
Description: Binary data

Attachment: ompi_info--all.gz
Description: Binary data
