Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/15/2011 03:54 PM, Jeff Squyres wrote: Which Linux / OFED are you using? I've seen this with the following: RH 4.6 / OFED 1.3.6, CentOS 5.2 / OFED 1.3.6, SLES 10.1 / OFED 1.3.6. I know the above is pretty darn old, but it would be nice to know: what is the oldest s/w we can be using? Note t

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Jeff Squyres
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: > I've seen this with the following: > > RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php? > CentOS 5.2 / OFED 1.3.6 > SLES 10.1 / OFED 1.3.6 > > I know the above is pretty darn old bu

Re: [OMPI devel] Old Linux kernels

2011-03-16 Thread Jeff Squyres (jsquyres)
Is there a version in a pthreads header file that can be checked? You're right that I am currently checking the Linux kernel version, not the pthread version. Note that this is *only* in cross-compiling environments; in non-cross-compiling situations, we actually test the behavior to see if threads have
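A kernel-version check of this kind can be sketched in C. This is a minimal illustration, not the actual configure test: it parses a `uname -r`-style release string (for example "2.6.18-194.el5") and packs it the same way the `KERNEL_VERSION(a,b,c)` macro in `<linux/version.h>` does, `(a << 16) + (b << 8) + c`. The function names are hypothetical, and in a cross-compiling build the release string would have to describe the target, not the build host; where it comes from is left as an assumption.

```c
#include <stdio.h>

/* Same packing as KERNEL_VERSION(a,b,c) in <linux/version.h>.
   Assumes each field is below 256. */
static unsigned long pack_kernel_version(unsigned a, unsigned b, unsigned c)
{
    return ((unsigned long)a << 16) + ((unsigned long)b << 8) + c;
}

/* Parse the leading "a.b.c" of a release string; sscanf stops at the
   first non-matching character, so "-194.el5" is ignored.
   Returns 0 on success, -1 if three numeric fields are not found. */
static int parse_kernel_release(const char *release, unsigned long *code)
{
    unsigned a, b, c;
    if (sscanf(release, "%u.%u.%u", &a, &b, &c) != 3)
        return -1;
    *code = pack_kernel_version(a, b, c);
    return 0;
}
```

A caller would then compare the packed value against whatever minimum version the check requires; the thread does not say which threshold is used.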

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 06:21 AM, Jeff Squyres wrote: On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: I've seen this with the following: RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php? Yes I did, and I will be talking with my group about thi

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Jeff Squyres (jsquyres)
K. When Ralph and I removed that code, it was on the educated guess that no one was using it (because it hasn't compiled right in a while). If we were wrong, it can be put back, but someone will need to update it, and Ralph and I don't have access to machines to test that behavior.

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 06:38 AM, Jeff Squyres (jsquyres) wrote: K. When Ralph and I removed that code, it was on the educated guess that no one was using it (because it hasn't compiled right in a while). If we were wrong, it can be put back, but someone will need to update it and Ralph and I don't have a

Re: [OMPI devel] Old Linux kernels

2011-03-16 Thread Paul H. Hargrove
I have looked before for symbols to distinguish LinuxThreads from NPTL, but I was not successful in finding anything. I don't recall if I examined headers for differences, but the implementations are binary compatible by design, making differences intentionally minimal. I suppose one can grep
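There is no header symbol for this, but at run time glibc does report the threading implementation through `confstr(_CS_GNU_LIBPTHREAD_VERSION, ...)` (the same value `getconf GNU_LIBPTHREAD_VERSION` prints), which begins with "NPTL" on NPTL systems. The sketch below is hedged accordingly: it only helps when code can run on the target, so it does not address the cross-compiling case Jeff describes, and the helper name is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Returns 1 if the runtime reports NPTL, 0 if it reports something else
   (e.g. LinuxThreads), and -1 if no version string is available
   (non-glibc C libraries may not define _CS_GNU_LIBPTHREAD_VERSION). */
static int pthread_impl_is_nptl(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    /* confstr truncates into buf if needed; the prefix check still works. */
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);
    if (n == 0)
        return -1;
    return strncmp(buf, "NPTL", 4) == 0;
#else
    return -1;
#endif
}
```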

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 06:34 AM, Terry Dontje wrote: On 03/16/2011 06:21 AM, Jeff Squyres wrote: On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: I've seen this with the following: RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php? Yes I did, a

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Jeff Squyres
On Mar 16, 2011, at 6:50 AM, Terry Dontje wrote: >> K. When Ralph and I removed that code, it was on the educated guess that no one was using it (because it hasn't compiled right in a while). If we were wrong, it can be put back, but someone will need to update it and Ralph and I don't

Re: [OMPI devel] Old Linux kernels

2011-03-16 Thread Jeff Squyres
On Mar 16, 2011, at 7:48 AM, Paul H. Hargrove wrote: > I have looked before for symbols to distinguish LinuxThreads from NPTL, but I was not successful in finding anything. I don't recall if I examined headers for differences, but the implementations are binary compatible by design, maki

[OMPI devel] 1.5.3rc2 posted

2011-03-16 Thread Jeff Squyres
rc1 was borked; we fixed it in rc2. This will likely be the last rc. http://www.open-mpi.org/software/ompi/v1.5/

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread George Bosilca
The trunk is indeed broken. The reason is, as Terry pointed out, the inclusion of infiniband/mad.h introduced by r24507 (https://svn.open-mpi.org/trac/ompi/changeset/24507). As long as OFED 1.4 is available, it will compile independent of the version of the kernel, libpthread, moon position or
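One common way to keep an optional header such as infiniband/mad.h from breaking the build on installs that lack it is to guard the include with a configure-time macro. The sketch assumes the Autoconf convention in which `AC_CHECK_HEADERS([infiniband/mad.h])` defines `HAVE_INFINIBAND_MAD_H` in config.h; it is not the fix that was actually committed for r24507, and the helper name is hypothetical.

```c
/* HAVE_INFINIBAND_MAD_H is assumed to come from config.h, generated by
   AC_CHECK_HEADERS([infiniband/mad.h]); it is deliberately not defined
   here, so this file compiles even where the header is missing. */
#ifdef HAVE_INFINIBAND_MAD_H
#include <infiniband/mad.h>
#endif

/* Hypothetical helper: lets other code ask whether the MAD-based
   path was compiled in, instead of failing at build time. */
static int oob_mad_support_compiled(void)
{
#ifdef HAVE_INFINIBAND_MAD_H
    return 1;
#else
    return 0;
#endif
}
```

Code that needs the MAD definitions would sit inside the same `#ifdef`, so older OFED installs simply build without that path.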

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Jeff Squyres
Ya, you're right -- I'm looking at my MTT right now and I see lots of broken installs. But it works if I compile manually. Weird. Mellanox -- please fix ASAP, or we'll likely back out r24507 so that people can keep working... On Mar 16, 2011, at 11:58 AM, George Bosilca wrote: > The trunk i

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 12:00 PM, Jeff Squyres wrote: Ya, you're right -- I'm looking at my MTT right now and I see lots of broken installs. But it works if I compile manually. Weird. So when I saw your MTT results, it was not finding a header file, as opposed to the problem I was incurring, which was a r

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Mike Dubman
Sorry about that; we'll find a better way to resolve it later. Fix committed. On Wed, Mar 16, 2011 at 6:00 PM, Jeff Squyres wrote: > Ya, you're right -- I'm looking at my MTT right now and I see lots of broken installs. > > But it works if I compile manually. Weird. > > Mellanox -- please fix ASA

[OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Damien Guinier
Hi all, From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module. You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you create a ring (like: IMB sendrecv

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Ralph Castain
I suspect something else is wrong - the grpcomm system never has any visibility as to what data goes into the modex, or how that data is used. In other words, if the tcp btl isn't providing adequate info, then it would fail regardless of which grpcomm module was in use. So your statement about t

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread George Bosilca
Actually I think that Damien's analysis is correct. On an 8-node cluster, mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv does work, while mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv doesn't. As soon as I remove the
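Why the first command passes and the second fails can be seen from how ranks land on nodes. The sketch assumes ranks are filled sequentially, `npernode` at a time (the usual placement for these options); the helper names are hypothetical. With `-npernode 1 -np 4` every rank has its own node, while with `-npernode 2 -np 4` ranks 0 and 1 share node 0, so anything keyed only by node cannot tell them apart.

```c
/* Node index of a rank when ranks are placed npernode at a time
   (assumption: sequential, by-slot style placement). */
static int node_of_rank(int rank, int npernode)
{
    return rank / npernode;
}

/* Returns 1 if any two of the first np ranks share a node. With
   sequential placement, checking neighbouring ranks is sufficient. */
static int ranks_share_a_node(int np, int npernode)
{
    for (int r = 1; r < np; r++)
        if (node_of_rank(r, npernode) == node_of_rank(r - 1, npernode))
            return 1;
    return 0;
}
```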

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Ralph Castain
Very strange - I'll bet it is something in the hier modex algo that is losing the info about where the data came from. I'll take a look. On Mar 16, 2011, at 2:25 PM, George Bosilca wrote: > Actually I think that Damien's analysis is correct. On an 8-node cluster > > mpirun -npernode 1 -np 4 --mc

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Ralph Castain
In looking at this, perhaps you can help me understand something. The grpcomm hier modex is the same regardless of what info is given to it. So how is it that this works fine with IB, but not for the TCP btl? Are you relying on something in the modex to track data identity, but the IB btl doesn'

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread George Bosilca
I just checked, and IB does work correctly. But then I remembered that IB is different: the connections are peer-based, so they don't happen during the modex exchange. The data is exchanged over RML messages, but outside the modex. george. On Mar 16, 2011, at 17:28, Ralph Castain wrote: > In

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Jeff Squyres
On Mar 16, 2011, at 5:37 PM, George Bosilca wrote: > I just checked, and IB does work correctly. But then I remembered that IB is different: the connections are peer-based, so they don't happen during the modex exchange. The data is exchanged over RML messages, but outside the modex. Not

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Ralph Castain
I believe I see the problem - and why it wouldn't show up for IB. It looks like the hier module passes an incorrect flag to the modex unpack function, which causes that function to place the modex values as attributes assigned to the node instead of a process, rather than placing the values into
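The effect of that wrong flag can be illustrated with a toy store that files each modex entry either under the sending process or under its node. This is a hypothetical model, not the grpcomm or modex API, and the addresses are made up: when entries are filed by node, the second process on a node overwrites the first, so a TCP peer looking up rank 0's address gets rank 1's. That matches the failure appearing only with more than one process per node, and IB being unaffected because it exchanges connection data over RML messages outside the modex.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 16

/* Hypothetical flag: file an entry under the process or under its node. */
enum modex_scope { SCOPE_PROC, SCOPE_NODE };

struct modex_store {
    int  key[MAX_KEYS];        /* rank or node id, depending on scope */
    char value[MAX_KEYS][32];  /* e.g. a TCP address string */
    int  count;
};

/* Assumes ranks are placed npernode at a time, so node = rank / npernode. */
static int scope_key(enum modex_scope scope, int rank, int npernode)
{
    return scope == SCOPE_PROC ? rank : rank / npernode;
}

/* Insert a value, overwriting any existing entry with the same key. */
static void modex_put(struct modex_store *s, enum modex_scope scope,
                      int rank, int npernode, const char *value)
{
    int k = scope_key(scope, rank, npernode);
    for (int i = 0; i < s->count; i++) {
        if (s->key[i] == k) {
            snprintf(s->value[i], sizeof s->value[i], "%s", value);
            return;
        }
    }
    if (s->count < MAX_KEYS) {
        s->key[s->count] = k;
        snprintf(s->value[s->count], sizeof s->value[0], "%s", value);
        s->count++;
    }
}

/* Look up the value filed for a rank; NULL if nothing was stored. */
static const char *modex_get(const struct modex_store *s,
                             enum modex_scope scope, int rank, int npernode)
{
    int k = scope_key(scope, rank, npernode);
    for (int i = 0; i < s->count; i++)
        if (s->key[i] == k)
            return s->value[i];
    return NULL;
}
```

With two processes per node, the node-scoped store ends up with a single entry holding rank 1's address; with one process per node, both scopes behave identically.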

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Ralph Castain
Okay, I fixed this in r24536. Sorry for the problem, Damien - thanks for catching it! It went unnoticed because the folks at the Labs always use IB. On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote: > I believe I see the problem - and why it wouldn't show up for IB. It looks like the hier modu