I took advantage of the fact that Bill's phone number is in his signature and 
gave him a call (gasp! Talk to someone from the interwebs -- what craziness is 
that?!).

The real issue here is that Open MPI's use of verbs in v1.10.2 pre-dates the 
rdma-core packaging.  Various header file and other changes were made in the 
transition from libibverbs to rdma-core, and Open MPI v1.10.2 simply doesn't 
handle them correctly.  Later versions in the v1.10.x series fix all these 
issues such that both pre-rdma-core libibverbs and post-rdma-core libibverbs 
are handled properly.
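
One quick way to tell which flavor of verbs.h a machine has is to look at how 
"struct ibv_device" declares its first member.  This is only a rough sketch, 
not something Open MPI ships: verbs_header_flavor is a made-up helper name, and 
the patterns assume the pre-rdma-core header declares 
"struct ibv_device_ops ops;" while the rdma-core header declares 
"struct _ibv_device_ops _ops;" (as in the verbs.h excerpt quoted later in this 
thread).

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or rdma-core): report which
# layout of "struct ibv_device" a verbs.h header appears to use.
verbs_header_flavor() {
    header=${1:-/usr/include/infiniband/verbs.h}
    if [ ! -r "$header" ]; then
        echo "missing"
        return 1
    fi
    # rdma-core renamed the member to "_ops" (type "_ibv_device_ops").
    if grep -Eq 'struct[[:space:]]+_ibv_device_ops[[:space:]]+_ops;' "$header"; then
        echo "rdma-core"
    # Assumption: pre-rdma-core libibverbs exposes a public "ops" member.
    elif grep -Eq 'struct[[:space:]]+ibv_device_ops[[:space:]]+ops;' "$header"; then
        echo "legacy"
    else
        echo "unknown"
    fi
}
```

Calling it with no argument checks /usr/include/infiniband/verbs.h; pass a 
different path if the headers live elsewhere.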

Compounding this, v1.10.2 had another bug: it wasn't possible to fully disable 
the common/verbs_usnic component.  Sad panda.
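
To check whether a given set of configure options really kept that component 
out of the build, one option is to save the make output and search it for the 
"Making all in mca/common/verbs_usnic" line that shows up in the logs quoted 
below.  A minimal sketch; usnic_common_was_built and the make.log file name 
are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if a saved "make" log shows that the build
# descended into the common/verbs_usnic component, non-zero otherwise.
usnic_common_was_built() {
    grep -q 'Making all in mca/common/verbs_usnic' "$1"
}

# Example use (assumes the build output was saved first):
#   make 2>&1 | tee make.log
#   usnic_common_was_built make.log && echo "verbs_usnic was still built"
```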

Unfortunately, the only ways to move forward are to apply a patch to v1.10.2 
(in this case, just to disable all the usNIC stuff, since this user is not 
using usNIC at all), move to v1.10.7, or move to the latest Open MPI (both 
v3.0.1 and v3.1.0 are literally imminently about to be released).
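
As a summary of what the thread below established, here is a small sketch that 
maps a 1.10.x version to the usNIC-related configure flags.  
usnic_disable_flags is a made-up helper, not an Open MPI tool; it only encodes 
the two cases actually tested in this thread and makes no claim about any 
other version.

```shell
#!/bin/sh
# Hypothetical helper: print the usNIC-related configure flags discussed in
# this thread for a given Open MPI version string.
usnic_disable_flags() {
    case "$1" in
        1.10.7)
            # Reported to work in this thread.
            echo "--without-usnic --without-verbs-usnic"
            ;;
        1.10.2)
            # No flag combination tried in this thread fully disabled usNIC;
            # a source patch is needed instead.
            echo "needs-patch"
            return 1
            ;;
        *)
            # Not covered by this thread.
            echo "unknown"
            return 2
            ;;
    esac
}
```

With v1.10.7 this could be used as 
"./configure $(usnic_disable_flags 1.10.7) ..." alongside the other options.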



> On Feb 28, 2018, at 2:41 PM, William T Jones <w.t.jo...@nasa.gov> wrote:
> 
> Thanks for the suggestions.
> 
> 
> On 02/28/2018 12:10 PM, Jeff Squyres (jsquyres) wrote:
>> Oops; it looks like there's 2 chunks of usNIC code in the 1.10.x code base, 
>> and --without-usnic only disables one of them.
>> I do believe we fixed that in a later 1.10.x release -- I am guessing you 
>> don't want to upgrade to v3.0.x for compatibility/testing reasons, but do 
>> you think you could move to a later v1.10.x release (which should be just 
>> bug fixes compared to 1.10.2)?
>> If you can upgrade to v1.10.7, this should work:
>> ./configure --without-usnic --without-verbs-usnic ...
> 
> This works.  But will be tough because it will require lots of re-validation 
> work with the codes that depend on OpenMPI even though it is only a patch 
> level increment.
> 
>> Sidenote: it's actually also possible that v1.10.7 will either correctly 
>> ignore or correctly compile usNIC on your system (without any extra command 
>> line options) -- we may have fixed that bug by v1.10.7; I honestly don't 
>> remember offhand.
>> If you can't move to v1.10.7, I think you should be able to use the 
>> following with v1.10.2 to disable the BTL usNIC component and the 
>> verbs_usnic common component:
>> ./configure --without-usnic --enable-mca-no-build=common-verbs_usnic ...
> 
> Sadly this does not work.  Linker fails with undefined reference to 
> `ompi_common_verbs_usnic_register_fake_drivers'.
> 
>>> On Feb 28, 2018, at 11:55 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
>>> 
>>> Unfortunately, that does not work.
>>> 
>>> % ./configure --enable-static \
>>>              --with-tm=/usr/local/pkgs/PBSPro_64 \
>>>              --enable-mpi-thread-multiple \
>>>              --with-verbs=/usr \
>>>              --without-usnic \
>>>              --enable-mpi-cxx \
>>>              FC=ifort \
>>>              F77=ifort \
>>>              CC=icc \
>>>              CXX=icpc \
>>>              CFLAGS="-O3 -ip" \
>>>              FCFLAGS="-O3 -ip" \
>>>              LIBS="-lcrypto -lpthread"
>>> 
>>> ...
>>> Making all in mca/common/verbs_usnic
>>> make[2]: Entering directory `/misc/home2/wtjones1/GIT/fun3d/misc/module-builder/k/openmpi/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
>>>  CC       libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
>>> common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field "ops"
>>>      .ops = {
>>>       ^
>>> 
>>> common_verbs_usnic_fake.c(89): warning #266: function "ibv_read_sysfs_file" declared implicitly
>>>      if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
>>>          ^
>>> 
>>> common_verbs_usnic_fake.c(133): warning #266: function "ibv_register_driver" declared implicitly
>>>          ibv_register_driver("usnic_verbs", fake_driver_init);
>>>          ^
>>> 
>>> compilation aborted for common_verbs_usnic_fake.c (code 2)
>>> 
>>> 
>>> On 02/28/2018 11:10 AM, r...@open-mpi.org wrote:
>>>> Not unless you have a USNIC card in your machine!
>>>>> On Feb 28, 2018, at 8:08 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
>>>>> 
>>>>> Thank you!
>>>>> 
>>>>> Will that have any adverse side effects?
>>>>> Performance penalties?
>>>>> 
>>>>> On 02/28/2018 10:57 AM, r...@open-mpi.org wrote:
>>>>>> Add --without-usnic
>>>>>>> On Feb 28, 2018, at 7:50 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
>>>>>>> 
>>>>>>> I realize that OpenMPI 1.10.2 is quite old, however, for compatibility I
>>>>>>> am attempting to compile it after a system upgrade to CentOS 7.
>>>>>>> 
>>>>>>> This system does include infiniband and I have configured as follows
>>>>>>> using Intel 2017.2.174 compilers:
>>>>>>> 
>>>>>>> % ./configure --enable-static \
>>>>>>>              --with-tm=/usr/local/pkgs/PBSPro_64 \
>>>>>>>              --enable-mpi-thread-multiple \
>>>>>>>              --with-verbs=/usr \
>>>>>>>              --enable-mpi-cxx \
>>>>>>>              FC=ifort \
>>>>>>>              F77=ifort \
>>>>>>>              CC=icc \
>>>>>>>              CXX=icpc \
>>>>>>>              CFLAGS="-O3 -ip" \
>>>>>>>              FCFLAGS="-O3 -ip" \
>>>>>>>              LIBS="-lcrypto -lpthread"
>>>>>>> 
>>>>>>> However, when I compile I get the following error:
>>>>>>> 
>>>>>>>  ...
>>>>>>>  Making all in mca/common/verbs_usnic
>>>>>>>  make[2]: Entering directory `/usr/src/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
>>>>>>>    CC       libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
>>>>>>>  common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field "ops"
>>>>>>>        .ops = {
>>>>>>>         ^
>>>>>>> 
>>>>>>>  common_verbs_usnic_fake.c(89): warning #266: function "ibv_read_sysfs_file" declared implicitly
>>>>>>>        if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
>>>>>>>            ^
>>>>>>> 
>>>>>>>  common_verbs_usnic_fake.c(133): warning #266: function "ibv_register_driver" declared implicitly
>>>>>>>            ibv_register_driver("usnic_verbs", fake_driver_init);
>>>>>>>            ^
>>>>>>> 
>>>>>>>  compilation aborted for common_verbs_usnic_fake.c (code 2)
>>>>>>> 
>>>>>>> 
>>>>>>> Unfortunately, my /usr/include/infiniband/verbs.h file defines the
>>>>>>> "ibv_device" structure but does not include "ops" member.  Instead the
>>>>>>> structure is defined as follows:
>>>>>>> 
>>>>>>>  /* Obsolete, never used, do not touch */
>>>>>>>  struct _ibv_device_ops {
>>>>>>>          struct ibv_context *    (*_dummy1)(struct ibv_device *device, int cmd_fd);
>>>>>>>          void                    (*_dummy2)(struct ibv_context *context);
>>>>>>>  };
>>>>>>> 
>>>>>>>  enum {
>>>>>>>          IBV_SYSFS_NAME_MAX      = 64,
>>>>>>>          IBV_SYSFS_PATH_MAX      = 256
>>>>>>>  };
>>>>>>> 
>>>>>>>  struct ibv_device {
>>>>>>>          struct _ibv_device_ops  _ops;
>>>>>>>          enum ibv_node_type      node_type;
>>>>>>>          enum ibv_transport_type transport_type;
>>>>>>>          /* Name of underlying kernel IB device, eg "mthca0" */
>>>>>>>          char                    name[IBV_SYSFS_NAME_MAX];
>>>>>>>          /* Name of uverbs device, eg "uverbs0" */
>>>>>>>          char                    dev_name[IBV_SYSFS_NAME_MAX];
>>>>>>>          /* Path to infiniband_verbs class device in sysfs */
>>>>>>>          char                    dev_path[IBV_SYSFS_PATH_MAX];
>>>>>>>          /* Path to infiniband class device in sysfs */
>>>>>>>          char                    ibdev_path[IBV_SYSFS_PATH_MAX];
>>>>>>>  };
>>>>>>> 
>>>>>>> 
>>>>>>> OpenMPI was previously compiled successfully under CentOS 6 and every
>>>>>>> indication is that the /usr/include/infiniband/verbs.h was defined
>>>>>>> similarly (again without the "ops" member).
>>>>>>> 
>>>>>>> Is it possible that there is a configure option that prevents this 
>>>>>>> source from being included in the build?
>>>>>>> 
>>>>>>> Any help is appreciated,
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>> 
>>>>>>>    Bill Jones                                       w.t.jo...@nasa.gov
>>>>>>>    Mail Stop 128                     Computational AeroSciences Branch
>>>>>>>    15 Langley Boulevard                           Research Directorate
>>>>>>>    NASA Langley Research Center               Building 1268, Room 1044
>>>>>>>    Hampton, VA  23681-2199                       Phone +1 757 864-5318
>>>>>>>                                                    Fax +1 757 864-8816
>>>>>>>                                             http://fun3d.larc.nasa.gov
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users@lists.open-mpi.org
>>>>>>> https://lists.open-mpi.org/mailman/listinfo/users
>>> 
>>> <config.log.gz>


-- 
Jeff Squyres
jsquy...@cisco.com
