It should have been looking in the same place - check where you installed the InfiniBand support. Is "verbs.h" under your /usr/include?
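
If you want a quick way to check, something like the following should do it (a rough sketch, assuming the stock RHEL 6 InfiniBand packages; the package names below are my guess, not taken from your output):

  # look for the Verbs header anywhere under /usr/include
  find /usr/include -name verbs.h 2>/dev/null

  # on RHEL 6 that header normally comes from libibverbs-devel
  rpm -q libibverbs libibverbs-devel

  # if the -devel package is missing, installing it should provide
  # /usr/include/infiniband/verbs.h
  sudo yum install libibverbs-devel

If the header does turn out to live under /usr/include/infiniband, it may also be enough to simply point configure at the prefix (e.g. --with-verbs=/usr).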
In looking at the code, the 1.6 series searched for verbs.h in /usr/include/infiniband. The 1.7 series does as well (though that code doesn't look quite right to me), but it wouldn't hurt to specify it yourself:

  --with-verbs=/usr/include/infiniband --with-verbs-libdir=/usr/lib64/infiniband

or something like that.

On Mar 1, 2014, at 11:56 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

> Ralph and Gus,
>
> 1. Thank you for your suggestion. I built Open MPI 1.6.5 with the following command:
> ./configure --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 --with-tm=/opt/pbs/default --with-openib= --with-openib-libdir=/usr/lib64
>
> In my job script, I need to specify the IB subnet like this:
> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>
> Then my job gets initialized and runs correctly each time!
>
> 2. However, when I build Open MPI 1.7.4 with another command (in order to test/compare the shared-memory performance of Open MPI):
> ./configure --prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3 --with-tm=/opt/pbs/default --with-verbs= --with-verbs-libdir=/usr/lib64
>
> it fails with the following error:
> ============================================================================
> == Modular Component Architecture (MCA) setup
> ============================================================================
> checking for subdir args... '--prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3' '--with-tm=/opt/pbs/default' '--with-verbs=' '--with-verbs-libdir=/usr/lib64' 'CC=gcc' 'CXX=g++'
> checking --with-verbs value... simple ok (unspecified)
> checking --with-verbs-libdir value... sanity check ok (/usr/lib64)
> configure: WARNING: Could not find verbs.h in the usual locations under
> configure: error: Cannot continue
>
> Our system is Red Hat 6.4. Do we need to install more InfiniBand packages? Can you please advise?
>
> Thanks,
> Beichuan Yan
>
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
> Sent: Friday, February 28, 2014 15:59
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI job initializing problem
>
> Hi Beichuan,
>
> To add to what Ralph said, the RHEL OpenMPI package probably wasn't built with PBS Pro support either.
> Besides, OMPI 1.5.4 (the RHEL version) is old.
>
> **
>
> You will save yourself time and grief if you read the installation FAQs before you install from the source tarball:
>
> http://www.open-mpi.org/faq/?category=building
>
> However, as Ralph said, that is your best bet, and it is quite easy to get right.
>
> See this FAQ on how to build with PBS Pro support:
>
> http://www.open-mpi.org/faq/?category=building#build-rte-tm
>
> And this one on how to build with InfiniBand support:
>
> http://www.open-mpi.org/faq/?category=building#build-p2p
>
> Here is how to select the installation directory (--prefix):
>
> http://www.open-mpi.org/faq/?category=building#easy-build
>
> Here is how to select the compilers (gcc, g++, and gfortran are fine):
>
> http://www.open-mpi.org/faq/?category=building#build-compilers
>
> I hope this helps,
> Gus Correa
>
> On 02/28/2014 12:36 PM, Ralph Castain wrote:
>> Almost certainly, the Red Hat package wasn't built with matching InfiniBand support, and so we aren't picking it up.
>> I'd suggest downloading the latest 1.7.4 or 1.7.5 nightly tarball, or even the latest 1.6 tarball if you want the stable release, and building it yourself so you *know* it was built for your system.
>>
>>
>> On Feb 28, 2014, at 9:20 AM, Beichuan Yan <beichuan....@colorado.edu <mailto:beichuan....@colorado.edu>> wrote:
>>
>>> Hi there,
>>>
>>> I am running jobs on clusters with an InfiniBand connection. They installed OpenMPI v1.5.4 via the Red Hat 6 yum package. My problem is that although my jobs get queued and started by PBS Pro quickly, most of the time they don't really run (occasionally they do) and give errors like this (even though there are plenty of CPU/IB resources available):
>>>
>>> [r2i6n7][[25564,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.159.156 failed: Connection refused (111)
>>>
>>> And even when a job gets started and runs well, it prints this warning:
>>> --------------------------------------------------------------------------
>>> WARNING: There was an error initializing an OpenFabrics device.
>>>
>>> Local host: r1i2n6
>>> Local device: mlx4_0
>>> --------------------------------------------------------------------------
>>>
>>> 1. Here is the info from one of the compute nodes:
>>>
>>> -bash-4.1$ /sbin/ifconfig
>>> eth0  Link encap:Ethernet  HWaddr 8C:89:A5:E3:D2:96
>>>       inet addr:192.168.159.205  Bcast:192.168.159.255  Mask:255.255.255.0
>>>       inet6 addr: fe80::8e89:a5ff:fee3:d296/64 Scope:Link
>>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>       RX packets:48879864 errors:0 dropped:0 overruns:17 frame:0
>>>       TX packets:39286060 errors:0 dropped:0 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:1000
>>>       RX bytes:54771093645 (51.0 GiB)  TX bytes:37512462596 (34.9 GiB)
>>>       Memory:dfc00000-dfc20000
>>>
>>> Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
>>> Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
>>> Ifconfig is obsolete! For replacement check ip.
>>>
>>> ib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:C0:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>>       inet addr:10.148.0.114  Bcast:10.148.255.255  Mask:255.255.0.0
>>>       inet6 addr: fe80::202:c903:fb:3489/64 Scope:Link
>>>       UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>>       RX packets:43807414 errors:0 dropped:0 overruns:0 frame:0
>>>       TX packets:10534050 errors:0 dropped:24 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:256
>>>       RX bytes:47824448125 (44.5 GiB)  TX bytes:44764010514 (41.6 GiB)
>>>
>>> lo    Link encap:Local Loopback
>>>       inet addr:127.0.0.1  Mask:255.0.0.0
>>>       inet6 addr: ::1/128 Scope:Host
>>>       UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>       RX packets:17292 errors:0 dropped:0 overruns:0 frame:0
>>>       TX packets:17292 errors:0 dropped:0 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:0
>>>       RX bytes:1492453 (1.4 MiB)  TX bytes:1492453 (1.4 MiB)
>>>
>>> -bash-4.1$ chkconfig --list iptables
>>> iptables   0:off  1:off  2:on  3:on  4:on  5:on  6:off
>>>
>>> 2. I tried the various parameters below, but none of them could ensure that my jobs get initialized and run:
>>>
>>> #TCP="--mca btl ^tcp"
>>> #TCP="--mca btl self,openib"
>>> #TCP="--mca btl_tcp_if_exclude lo"
>>> #TCP="--mca btl_tcp_if_include eth0"
>>> #TCP="--mca btl_tcp_if_include eth0, ib0"
>>> #TCP="--mca btl_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8 --mca oob_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8"
>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>> mpirun $TCP -hostfile $PBS_NODEFILE -np 8 ./paraEllip3d input.txt
>>>
>>> 3. Then I turned to Intel MPI, which surprisingly starts and runs my job correctly each time (though it is a little slower than OpenMPI, maybe 15% slower, but it works every time).
>>>
>>> Can you please advise? Many thanks.
>>>
>>> Sincerely,
>>> Beichuan Yan
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users