Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Gilles Gouaillardet
Artem, thanks for the feedback. I committed the patch to the trunk (r31922). As I indicated in the commit log, this patch is likely suboptimal and has room for improvement. Jeff commented on the usnic-related issue, so I will wait for a fix from the Cisco folks. Cheers, Gilles On Sun,

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
I did check this for SLURM 2.6.5. 2014-06-01 20:31 GMT+07:00 Ralph Castain: > That really wasn't necessary - I had tested it under PMI-1 and it was > fine. Artem: did you test it, or just assume it wasn't right? > > > On May 31, 2014, at 11:47 PM, Artem Polyakov

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Jeff Squyres (jsquyres)
This should also be fixed when we stop firing up the usnic connectivity checker when there are no usNICs present. On Jun 1, 2014, at 9:12 AM, Artem Polyakov wrote: > > 2014-06-01 14:24 GMT+07:00 Gilles Gouaillardet > : > export

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Ralph Castain
That really wasn't necessary - I had tested it under PMI-1 and it was fine. Artem: did you test it, or just assume it wasn't right? On May 31, 2014, at 11:47 PM, Artem Polyakov wrote: > Thank you, Mike! > > > 2014-06-01 13:43 GMT+07:00 Mike Dubman

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Jeff Squyres (jsquyres)
Ah -- I missed the attachment; I only looked at your email text. I'll have a look now... auto-failure: Ah, I found this late last week and sent a fix around internally for review. Should have something soon for trunk/v1.8. If you care: we accidentally still fire up the usnic connectivity

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
2014-06-01 14:24 GMT+07:00 Gilles Gouaillardet < gilles.gouaillar...@gmail.com>: > export OMPI_MCA_btl_openib_use_eager_rdma=0 Gilles, I tested your approach. Both: a) export OMPI_MCA_btl_openib_use_eager_rdma=0 b) applying your patch and running without "export OMPI_MCA_btl_openib_use_eager_rdma=0"

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
Hello, Jeff. Please check the attached tar ("auto-failure" dir). There I saw the following message: -- An internal error has occurred in the Open MPI usNIC BTL. This is highly unusual and shouldn't happen. It suggests

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Jeff Squyres (jsquyres)
Just to be clear: it looks like you haven't seen any errors from the usnic BTL, right? (the Cisco VIC uses the usnic BTL only -- it does not use the openib BTL) On Jun 1, 2014, at 2:57 AM, Artem Polyakov wrote: > Hello, while testing new PMI implementation I faced a

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
I think I can do that. On Sunday, June 1, 2014, Gilles Gouaillardet wrote: > Artem, > > this looks like the issue initially reported by Rolf > http://www.open-mpi.org/community/lists/devel/2014/05/14836.php > > in http://www.open-mpi.org/community/lists/devel/2014/05/14839.php

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Gilles Gouaillardet
Artem, this looks like the issue initially reported by Rolf ( http://www.open-mpi.org/community/lists/devel/2014/05/14836.php ). In http://www.open-mpi.org/community/lists/devel/2014/05/14839.php I posted a patch and a workaround: export OMPI_MCA_btl_openib_use_eager_rdma=0 i do not recall i

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
P.S. 1. Just to make sure, I tried the same program with the old ompi-1.6.5 that is installed on our cluster, and it ran without any problem. 2. My test program just sends data through the ring. 2014-06-01 13:57 GMT+07:00 Artem Polyakov : > Hello, while testing new PMI implementation I

[OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
Hello, while testing the new PMI implementation I ran into a problem with OpenIB and/or usNIC support. The cluster I use is built on Mellanox QDR. We don't use Cisco hardware, so there is no Cisco Virtual Interface Card. To rule out any influence from the new PMI code, I used mpirun to launch the job. Slurm job

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
Thank you, Mike! 2014-06-01 13:43 GMT+07:00 Mike Dubman : > applied here: https://svn.open-mpi.org/trac/ompi/changeset/31909 > > > On Sun, Jun 1, 2014 at 9:15 AM, Artem Polyakov wrote: > >> Hi, all. >> >> Ralph committed the code that was developed

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Mike Dubman
applied here: https://svn.open-mpi.org/trac/ompi/changeset/31909 On Sun, Jun 1, 2014 at 9:15 AM, Artem Polyakov wrote: > Hi, all. > > Ralph committed the code that was developed for this RFC (r31908). This > commit will break PMI1 support. If you are in a hurry, apply the attached

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
Hi, all. Ralph committed the code that was developed for this RFC (r31908). This commit will break PMI1 support. If you are in a hurry, apply the attached patch. Ralph will apply it once he is online. I don't have the rights to do that yet. 2014-05-19 21:18 GMT+07:00 Ralph Castain : >