Hi Josh
It was my mistake. The status of the error-generating node is pasted below:
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0018:8b90:97fe:94fe
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state:
Dear Pasha
The ibstatus output is not from two different machines; it is from the same
machine. There are two InfiniBand ports showing up on all nodes. I checked all
the nodes: one of the ports is always in the INIT state and the other one is
ACTIVE. Now please see below the ibstatus of the problem-causing node
(co
Hyperthreading is pretty great for non-HPC applications, which is why Intel
makes it. But hyperthreading *generally* does not help HPC application
performance. You're basically halving several on-chip resources / queues /
pipelines, and that can hurt for performance-hungry HPC applications.
T
Ralph,
The 32-slot systems/nodes I'm running my Open MPI test code on only have
16 physical cores; the rest of the slots are hyperthreads. I've done some more
testing and noticed that if I limit the number of slots per node to 8
(via -npernode 8), everything works and 8 slots are used from each syst
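For reference, one way to confirm the physical-core vs. hardware-thread split
on a node, and to keep the ranks on physical cores, might look like this (the
executable name is just a placeholder; with 1.8 the binding option is
"--bind-to core", while older 1.6.x releases spell it "--bind-to-core"):

    lscpu | grep -E 'Thread|Core|Socket'        # threads per core, cores per socket
    mpirun -npernode 16 --bind-to core ./mpi_test   # one rank per physical core on a 16-core node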
Okay, the problem is that the connection back to mpirun isn't getting thru. We
are trying on the 10.0.251.53 address - is that blocked, or should we be using
something else? If so, you might want to direct us by adding "-mca
oob_tcp_if_include foo", where foo is the interface you want us to use
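For example, assuming the desired interface on the nodes is eth0 (the interface
name and executable are placeholders), the command line would look something
like:

    mpirun -np 4 -mca oob_tcp_if_include eth0 ./my_app

I believe the same parameter also accepts a CIDR subnet (e.g. 10.0.251.0/24)
if the interface names differ across nodes.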
Thanks - I revised the module so it quotes all params and envar values, and
CMRd it for 1.8.2 (added you on the ticket)
On Jul 21, 2014, at 2:42 AM, Dirk Schubert wrote:
> Hello Ralph,
>
> thanks for your answer.
>
> > I can look to see if there is something generic we can do (perhaps
> > en
I'm not aware of any way to tell using ompi_info, I'm afraid. I'd have to
ponder a bit as to how we could do so since it's a link to a library down below
the one we directly use.
On Jul 21, 2014, at 3:00 PM, Blosch, Edwin L wrote:
> In making the leap from 1.6 to 1.8, how can I check whether
Certainly possible, though I haven't heard of anyone doing so - not sure of the
motivation, but nothing prevents it. You would need to build OMPI
--with-pmi=, but that should be all that is required
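A minimal configure line, assuming the PMI headers/libs (e.g. from SLURM) are
installed under /usr (the paths and install prefix here are just assumptions
for illustration), might be:

    ./configure --prefix=/opt/openmpi --with-pmi=/usr
    make -j 8 && make install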
On Jul 22, 2014, at 1:59 AM, Jukka-Pekka Kekkonen wrote:
> Hi,
>
> I am trying to get the Hyd
Hmmm...that's not a "bug", but just a packaging issue with the way CentOS
distributed some variants of OMPI that requires you install/update things in a
specific order.
On Jul 20, 2014, at 11:34 PM, Lane, William wrote:
> Please see:
>
> http://bugs.centos.org/view.php?id=5812
>
> From: user
Can you try upgrading to OMPI 1.6.5? 1.6.5 has *many* bug fixes compared to
1.5.4.
A little background...
Open MPI is developed in terms of release version pairs:
"1.odd" are feature releases. We add new (and remove old) features, etc. We
do a lot of testing, but this is all done in lab/tes
Hmm, this does not make sense.
Your copy-n-paste shows that both machines (00 and 01) have the same guid/lid
(sort of the equivalent of a MAC address in the Ethernet world).
As you can guess, these two cannot be identical for two different machines
(unless you moved the card around).
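If it helps, one way to double-check is to compare the port GUIDs reported on
each machine (the host names below are just placeholders):

    ssh node00 'ibstat | grep "Port GUID"'
    ssh node01 'ibstat | grep "Port GUID"'

Two different HCAs must report different GUIDs.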
Best,
Pasha
On Jul 2
Sayed,
You might try this link (or have your sysadmin do it if you do not have
admin privileges). To me it looks like your second port is in the "INIT"
state but has not been added by the subnet manager.
https://software.intel.com/en-us/articles/troubleshooting-infiniband-connection-issues-using-
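As a quick check (these commands come from the standard infiniband-diags and
OpenSM packages; the exact init/service name may differ on your distribution):

    sminfo                      # asks the fabric which subnet manager, if any, is responding
    /etc/init.d/opensm start    # start OpenSM as root if no SM is running

Once an SM sweeps the fabric, the port should move from INIT to ACTIVE.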
Hi,
I am trying to get the Hydra process manager (that is typically associated with
mpich2) to launch Open MPI applications. Has someone here managed to do this or
is it even possible?
/ JP
And where can I find the run/job/submission?
On Mon, Jul 21, 2014 at 6:57 PM, Shamis, Pavel wrote:
>
> You have to check the ports states on *all* nodes in the
> run/job/submission. Checking on a single node is not enough.
> My guess is that 01-00 tries to connect to 01-01 and the ports are down on
> 0
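To follow the suggestion above and check every node in the run/job rather than
a single one, a small loop over the hostfile works (the hostfile path is a
placeholder):

    for h in $(cat ./hostfile); do
        echo "== $h =="; ssh $h ibstatus
    done

Any port reported as DOWN or INIT on any node is a candidate for the failure.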