On 03/15/2011 03:54 PM, Jeff Squyres wrote:
Which Linux / OFED are you using?
I've seen this with the following:
RH 4.6 / OFED 1.3.6
CentOS 5.2 / OFED 1.3.6
SLES 10.1 / OFED 1.3.6
I know the above is pretty darn old but it would be nice to know what is
the oldest s/w we can be using? Note t
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
> I've seen this with the following:
>
> RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
> CentOS 5.2 / OFED 1.3.6
> SLES 10.1 / OFED 1.3.6
>
> I know the above is pretty darn old bu
Is there a version in a pthreads header file that can be checked?
You're right that I am currently checking Linux kernel version, not pthread
version. Note that this is *only* in cross-compiling environments; in non cross
compiling situations, we actually test the behavior to see if threads have
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes I did, and I will be talking with my group about thi
K. When Ralph and I removed that code, it was on the educated guess that no one
was using it (because it hasn't compiled right in a while). If we were wrong,
it can be put back, but someone will need to update it and Ralph and I don't
have access to machines to test that behavior.
Sent from my
On 03/16/2011 06:38 AM, Jeff Squyres (jsquyres) wrote:
K. When Ralph and I removed that code, it was on the educated guess
that no one was using it (because it hasn't compiled right in a
while). If we were wrong, it can be put back, but someone will need to
update it and Ralph and I don't have a
I have looked before for symbols to distinguish LinuxThreads from NPTL,
but I was not successful in finding anything. I don't recall if I
examined headers for differences, but the implementations are binary
compatible by design, making differences intentionally minimal.
I suppose one can grep
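For reference, a minimal sketch of one way to tell the two implementations apart at run time. It assumes a glibc system where confstr() accepts _CS_GNU_LIBPTHREAD_VERSION and reports strings of the form "NPTL 2.x" or "linuxthreads-0.x". The function names are hypothetical and are not part of Open MPI. Because this is a run-time query, it would not help the cross-compiling configure case discussed in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Classify a libpthread version string (hypothetical helper).
 * Returns 1 for NPTL, 0 for LinuxThreads, -1 if unrecognized. */
int classify_pthread_version(const char *version)
{
    if (version == NULL) {
        return -1;
    }
    if (strncmp(version, "NPTL", 4) == 0) {
        return 1;
    }
    if (strncmp(version, "linuxthreads", 12) == 0) {
        return 0;
    }
    return -1;
}

/* Ask the C library which pthread implementation is loaded.
 * On success buf holds the version string; on failure buf is empty
 * and -1 is returned. */
int detect_pthread_impl(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (n == 0 || n > len) {
        /* Unsupported name, or the string did not fit in buf. */
        buf[0] = '\0';
        return -1;
    }
    return classify_pthread_version(buf);
#else
    /* Assumption: non-glibc systems may not define this name at all. */
    buf[0] = '\0';
    return -1;
#endif
}
```

A caller would pass a small buffer (64 bytes is plenty for the strings above) and treat -1 as "unknown" rather than as either implementation.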
On 03/16/2011 06:34 AM, Terry Dontje wrote:
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes I did, a
On Mar 16, 2011, at 6:50 AM, Terry Dontje wrote:
>> K. When Ralph and I removed that code, it was on the educated guess that no
>> one was using it (because it hasn't compiled right in a while). If we were
>> wrong, it can be put back, but someone will need to update it and Ralph and
>> I don't
On Mar 16, 2011, at 7:48 AM, Paul H. Hargrove wrote:
> I have looked before for symbols to distinguish LinuxThreads from NPTL, but I
> was not successful in finding anything. I don't recall if I examined headers
> for differences, but the implementations are binary compatible by design,
> maki
rc1 was borked; we fixed it in rc2. This will likely be the last rc.
http://www.open-mpi.org/software/ompi/v1.5/
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
The trunk is indeed broken. The reason is, as Terry pointed out, the inclusion
of infiniband/mad.h introduced by r24507
(https://svn.open-mpi.org/trac/ompi/changeset/24507). As long as OFED 1.4 is
available, it will compile independent of the version of the kernel,
libpthread, moon position or
Ya, you're right -- I'm looking at my MTT right now and I see lots of broken
installs.
But it works if I compile manually. Weird.
Mellanox -- please fix ASAP, or we'll likely back out r24507 so that people can
keep working...
On Mar 16, 2011, at 11:58 AM, George Bosilca wrote:
> The trunk i
On 03/16/2011 12:00 PM, Jeff Squyres wrote:
Ya, you're right -- I'm looking at my MTT right now and I see lots of broken
installs.
But it works if I compile manually. Weird.
So when I saw your MTT results it was not finding a header file as
opposed to the problem I was incurring which was a r
Sorry about that; we will find a better way to resolve it later.
Fix committed.
On Wed, Mar 16, 2011 at 6:00 PM, Jeff Squyres wrote:
> Ya, you're right -- I'm looking at my MTT right now and I see lots of
> broken installs.
>
> But it works if I compile manually. Weird.
>
> Mellanox -- please fix ASA
Hi all
From my test, it is impossible to use "btl:tcp" with "grpcomm:hier".
The "grpcomm:hier" module is important because the "srun" launch protocol
can't use any other "grpcomm" module.
You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when
you create a ring (like: IMB sendrecv
I suspect something else is wrong - the grpcomm system never has any visibility
as to what data goes into the modex, or how that data is used. In other words,
if the tcp btl isn't providing adequate info, then it would fail regardless of
which grpcomm module was in use. So your statement about t
Actually I think that Damien's analysis is correct. On an 8-node cluster
mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
does work, while
mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
doesn't. As soon as I remove the
Very strange - I'll bet it is something in the hier modex algo that is losing
the info about where the data came from. I'll take a look.
On Mar 16, 2011, at 2:25 PM, George Bosilca wrote:
> Actually I think that Damien's analysis is correct. On an 8-node cluster
>
> mpirun -npernode 1 -np 4 --mc
In looking at this, perhaps you can help me understand something. The grpcomm
hier modex is the same regardless of what info is given to it. So how is it
that this works fine with IB, but not for the TCP btl? Are you relying on
something in the modex to track data identity, but the IB btl doesn'
I just checked and IB does work correctly. But then I remembered that IB is
different: the connections are peer based, so they don't happen during the
modex exchange. The data is exchanged over RML messages, but outside the modex.
george.
On Mar 16, 2011, at 17:28 , Ralph Castain wrote:
> In
On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
> I just checked and IB does work correctly. But then I remembered that IB is
> different: the connections are peer based, so they don't happen during the
> modex exchange. The data is exchanged over RML messages, but outside the
> modex.
Not
I believe I see the problem - and why it wouldn't show up for IB. It looks like
the hier module passes an incorrect flag to the modex unpack function, which
causes that function to place the modex values as attributes assigned to the
node instead of a process, rather than placing the values into
Okay, I fixed this in r24536.
Sorry for the problem, Damien - thanks for catching it! Went unnoticed because
the folks at the Labs always use IB.
On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
> I believe I see the problem - and why it wouldn't show up for IB. It looks
> like the hier modu
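To illustrate the kind of failure described in this thread, here is a toy sketch. It is not the Open MPI code: every type, function name, and address below is hypothetical. It assumes one plausible reading of the bug, namely that values stored under a node key instead of a process key overwrite each other as soon as two processes share a node. That reading would be consistent with the report that "-npernode 1" works while "-npernode 2" does not.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

/* One stored modex value, keyed by an integer id (node or rank). */
typedef struct {
    int  used;
    int  key;
    char value[32];
} entry_t;

/* Caller must zero-initialize the table before first use. */
typedef struct {
    entry_t entries[MAX_ENTRIES];
} table_t;

/* Insert a value, overwriting any existing entry with the same key. */
void table_put(table_t *t, int key, const char *value)
{
    assert(t != NULL && value != NULL);
    entry_t *slot = NULL;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        entry_t *e = &t->entries[i];
        if (e->used && e->key == key) {
            slot = e;               /* same key: overwrite in place */
            break;
        }
        if (!e->used && slot == NULL) {
            slot = e;               /* remember the first free slot */
        }
    }
    assert(slot != NULL);           /* table full: out of scope here */
    slot->used = 1;
    slot->key = key;
    strncpy(slot->value, value, sizeof(slot->value) - 1);
    slot->value[sizeof(slot->value) - 1] = '\0';
}

/* Look up a value by key; returns NULL if the key is absent. */
const char *table_get(const table_t *t, int key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (t->entries[i].used && t->entries[i].key == key) {
            return t->entries[i].value;
        }
    }
    return NULL;
}

/* Number of distinct keys currently stored. */
int table_count(const table_t *t)
{
    int n = 0;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        n += t->entries[i].used ? 1 : 0;
    }
    return n;
}

/* A hypothetical modex contribution from one process. */
typedef struct {
    int         node;
    int         rank;
    const char *tcp_addr;
} modex_msg_t;

/* key_by_node != 0 models the wrong flag: the value is attached to the
 * node, so a second process on the same node replaces the first. */
void modex_store(table_t *t, const modex_msg_t *m, int key_by_node)
{
    table_put(t, key_by_node ? m->node : m->rank, m->tcp_addr);
}
```

With four processes spread two per node, keying by node leaves only two entries, so two peers have no address to connect to. With one process per node the two keyings hold the same number of entries, which would hide the problem.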