Hi All,
I have a question about recompiling openmpi.
Recently I updated the infiniband driver for network card Mellanox, but I
found the original openmpi did not work anymore. Does this mean the driver
update must be followed by recompiling openmpi? Or are there other
issues I should consider? Thanks
Dear Yann
Here is the output
[root@compute-01-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@compute-01-01 ~]# uname -a
Linux compute-01-01.private.dns.zone 2.6.18-128.el5 #1 SMP Wed Dec 17
11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@com
It seems the driver was not started. I would suggest running lspci and checking
whether the HCA is visible at the hardware level.
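A minimal sketch of that check (the device and module names below are illustrative; adjust the grep patterns to your hardware and OFED version):

```shell
# Is the HCA visible at the PCI level at all?
lspci | grep -i mellanox

# If the card shows up but /sys/class/infiniband is empty, the kernel
# driver stack is probably not loaded:
lsmod | grep -E 'ib_core|ib_mthca|mlx4'

# On OFED installs the stack is usually (re)started with:
/etc/init.d/openibd restart   # service name/path varies by distribution
```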
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
Dear John
I found this output of ibstatus on some nodes (most probably the ones
causing the problem):
[root@compute-01-08 ~]# ibstatus
Fatal error: device '*': sys files not found
(/sys/class/infiniband/*/ports)
Does this show any hardware or software issue?
Thanks
I am not sure about the drivers because those were installed by someone else
during cluster setup. I see the following information about the infiniband card.
The card is DDR InfiniBand Mellanox ConnectX.
On Wed, Nov 28, 2012 at 3:17 PM, John Hearns wrote:
> Those diagnostics are from Openfabrics.
> What ty
Those diagnostics are from Openfabrics.
What type of infiniband card do you have?
What drivers are you using?
Does ibstats come with some other distribution? I don't have this command
available right now.
On Wed, Nov 28, 2012 at 1:14 PM, John Hearns wrote:
> Short answer. Run ibstats or ibstatus.
> Look also at the logs of your subnet manager.
Short answer. Run ibstats or ibstatus.
Look also at the logs of your subnet manager.
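A hedged sketch of those checks, assuming the standard OFED diagnostic tools are installed (the log path assumes OpenSM; adjust for your subnet manager):

```shell
# Per-node port health: State should be Active, Physical state LinkUp.
ibstatus
ibstat          # more detail: CA type, firmware version, port GUIDs

# Subnet manager logs:
grep -i -E 'error|warn' /var/log/opensm.log
```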
Dear All
I have an application which is run using openmpi and uses infiniband flags.
The application is a forecast model simulation. A frequent problem arises:
the Infiniband mezzanine cards of the servers become faulty (I don't know
why it happens so frequently), and the model simulation becom
Cc: OpenMPI Users
Sent: Monday, 10 September 2012 9:11 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
Randolph,
So what you're saying in short, leaving all the numbers aside, is the following:
In your particular application on your particular setup with this particular
Yevgeny Kliteynik
> *To:* Randolph Pullen
> *Cc:* OpenMPI Users
> *Sent:* Sunday, 9 September 2012 6:18 PM
> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> O
See my comments in line...
From: Yevgeny Kliteynik
To: Randolph Pullen
Cc: OpenMPI Users
Sent: Sunday, 9 September 2012 6:18 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote:
> Yevgeny,
> The ibstat results:
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
What you have is InfiniHost III HCA, which is 4x SDR card.
This card has theoretical peak of 10 Gb/s, which is 1GB/s in IB bit coding.
> And more interest
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, Just native IB with TCP over the top.
Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card".
Could you run "ibstat" and post the results?
Randolph,
Some clarification on the setup:
"Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?
Also, have you had a chance to try some newer OMPI release?
Any 1.6.x would do.
-- YK
On 8/31/2012 10:53 AM,
(reposted with consolidated information)
I have a test rig comprising 2 i7 systems 8GB RAM with Melanox III
HCA 10G cards
running Centos 5.7 Kernel 2.6.18-274
Open MPI 1.4.3
MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
On a Cisco 24 pt switch
Normal performance is:
$ mpirun --mca btl openi
- On occasions it seems to stall indefinitely, waiting on a single receive.
Any ideas appreciated.
Thanks in advance,
Randolph
From: Randolph Pullen
To: Paul Kapinos ; Open MPI Users
Sent: Thursday, 30 August 2012 11:46 AM
Subject: Re: [OMPI users] Infin
64K and force short messages. Then the openib times are
the same as TCP and no faster.
I'm still at a loss as to why...
From: Paul Kapinos
To: Randolph Pullen ; Open MPI Users
Sent: Tuesday, 28 August 2012 6:13 PM
Subject: Re: [OMPI users] Infin
Randolph,
after reading this:
On 08/28/12 04:26, Randolph Pullen wrote:
- On occasions it seems to stall indefinitely, waiting on a single receive.
... I would make a blind guess: are you aware of the IB card parameters for
registered memory?
http://www.open-mpi.org/faq/?category=openfabrics#
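The FAQ entry above concerns registered (locked) memory. A quick first check is the per-process locked-memory limit (the limits.conf lines below are the usual fix, not taken from this thread):

```shell
# openib pins (registers) memory, so a small locked-memory limit
# (e.g. 32 kB) cripples IB performance or breaks it outright.
# "unlimited" is what you normally want on compute nodes:
ulimit -l

# Typically raised via /etc/security/limits.conf:
#   * soft memlock unlimited
#   * hard memlock unlimited
```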
I have a test rig comprising 2 i7 systems with Melanox III HCA 10G cards
running Centos 5.7 Kernel 2.6.18-274
Open MPI 1.4.3
MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
On a Cisco 24 pt switch
Normal performance is:
$ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
results
On Jul 31, 2012, at 12:14 AM, Joen Chen wrote:
> After reading the FAQ about OFED, I learned that Open MPI can work with
> RoCE.
Correct -- Open MPI can use RoCE interfaces, if they are available.
> Moreover, using RoCE introduces some overhead because of the underlying
> network layers. In my i
Hi everyone!
After reading the FAQ about OFED, I learned that Open MPI can work with
RoCE. Moreover, using RoCE introduces some overhead because of the
underlying network layers. In my infiniband bandwidth testing, I get 5Gbps
using IPoIB and 12Gbps using RDMA. The performance gap is huge for m
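One way to quantify such a gap is to measure both paths directly; a sketch, assuming the OFED perftest package and iperf are installed (hostnames and addresses are placeholders):

```shell
# Raw RDMA bandwidth between two nodes:
ib_send_bw                # on the server node
ib_send_bw server-node    # on the client node

# IPoIB throughput over the same link (bind to the ib0 address):
iperf -s                  # on the server node
iperf -c 192.168.100.1    # on the client; the ib0 address is a placeholder
```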
Jeremy,
As far as I understand, the tool that Evgeny recommended showed that the remote
port is reachable.
Based on the logs that have been provided I can't find the issue in ompi;
everything seems to be kosher.
Unfortunately, I do not have a platform where I may try to reproduce the issue.
I wo
Hi Pasha,
I just wanted to check if you had any further suggestions regarding
the APM issue based on the updated info in my previous email.
Thanks,
-Jeremy
On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote:
> Hi Pasha, Yevgeny,
>
>>> My educated guess is that from some reason it is no direct conn
Hi Pasha, Yevgeny,
>> My educated guess is that for some reason there is no direct connection path
>> between lid-2 and lid-4. To prove it we have to look at the OpenSM routing
>> information.
> If you don't get response or you get info of
> the device different that what you would expect,
> then
Hi,
I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.
If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:
smpquery --Ca H
On Thu, Mar 8, 2012 at 10:44 AM, Shamis, Pavel wrote:
> Jeremy,
> Finally I had a chance to look at log file.
Hi Pasha,
I appreciate the review you did and the comments you provided. I will
see if we can get some additional routing information. I will also do
some experiments with a more trivi
Jeremy,
Finally I had a chance to look at log file.
Initially all qps are created on port 1, and at the same time the alternative path
is loaded (port 2, lids 4 and 2). I guess at some point you switch off port 1;
the APM event is reported because the alternative path is active now, and for some
reason
Hi Pasha,
>On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel wrote:
>
> I would like to see all the file.
> 28MB is it the size after compression ?
>
> I think gmail supports up to 25Mb.
> You may try to create gzip file and then slice it using "split" command.
See attached. At about line 151311 i
Hi Pasha,
>On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote:
> I reviewed the code and it seems to be ok :) The error should be reported if
> the port migration is already happened once (port 1 to port 2), and now you
> are trying to shutdown port 2 and MPI reports that it can't migrate an
Jeremy,
I reviewed the code and it seems to be ok :) The error should be reported if
the port migration has already happened once (port 1 to port 2), and now you are
trying to shut down port 2 and MPI reports that it can't migrate anymore. It
assumes that port 1 is still down and it can't go back
Hi Pasha,
Thanks for your response. I look forward to hearing from you when you
have a chance.
-Jeremy
On Wed, Feb 22, 2012 at 10:43 PM, Shamis, Pavel wrote:
> Jeremy,
> I implemented the APM support for openib btl a long time ago. I do not
> remember all the details of the implementation, but
Jeremy,
I implemented the APM support for the openib btl a long time ago. I do not remember
all the details of the implementation, but I remember that it is used to
support LMC bits and multiple IB ports. Unfortunately I'm super busy this week.
I will try to look at it early next week.
Pavel (Pasha) S
Hi,
I am having a problem getting Alternative Path Migration (APM) to work
over the InfiniBand ports on my HCA.
Details on my configuration and the issue I have are below. Please
let me know if you can provide any suggestions or corrections to my
configuration. I will be happy to try other experi
This means that you have some problem on that node,
and it's probably unrelated to Open MPI.
Bad cable? Bad port? FW/driver in some bad state?
Do other IB performance tests work OK on this node?
Try rebooting the node.
-- YK
On 12-Sep-11 7:52 AM, Ahsan Ali wrote:
> Hello all
>
> I am getting fol
Hello all
I am getting the following error during an application run, which causes it to
crash.
[[36944,1],41][btl_openib_component.c:3227:handle_wc] from
compute-01-19.private.dns.zone to: compute-01-04 error polling LP CQ with
status RETRY EXCEEDED ERROR status number 12 for wr_id 167703304 opcode
Yevgeny,
Sorry for the delay in replying -- I'd been out for a few days.
- Original Message -
> From: Yevgeny Kliteynik
> Sent: Thursday, July 14, 2011 12:51 AM
> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
> While I'm try
On 11-Jul-11 5:23 PM, Bill Johnstone wrote:
> Hi Yevgeny and list,
>
> - Original Message -
>
>> From: Yevgeny Kliteynik
>
>> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
>
> Thank you.
That's interesting...
This MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thingy imp
Hi Yevgeny and list,
- Original Message -
> From: Yevgeny Kliteynik
> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
Thank you.
> One question though, just to make sure we're on the same page: so the jobs
> do run OK on
> the older HCAs, as long as they
Hi Bill,
On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
>
>
> - Original Message -
>> From: Jeff Squyres
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport
Hello, and thanks for the reply.
- Original Message -
> From: Jeff Squyres
> Sent: Thursday, July 7, 2011 5:14 PM
> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>
> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>
>> I have
On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
> I have a heterogeneous network of InfiniBand-equipped hosts which are all
> connected to the same backbone switch, an older SDR 10 Gb/s unit.
>
> One set of nodes uses the Mellanox "ib_mthca" driver, while the other uses
> the "mlx4" driver.
Hello all.
I have a heterogeneous network of InfiniBand-equipped hosts which are all
connected to the same backbone switch, an older SDR 10 Gb/s unit.
One set of nodes uses the Mellanox "ib_mthca" driver, while the other uses the
"mlx4" driver.
This is on Linux 2.6.32, with Open MPI 1.5.3 .
On Friday 19 November 2010 01:03:35 HeeJin Kim wrote:
...
> * mlx4: There is a mismatch between the kernel and the userspace
> libraries: Kernel does not support XRC. Exiting.*
...
> What I'm thinking is that the infiniband card is installed but it doesn't
> work in correct mode.
> My linux kerne
On Nov 18, 2010, at 7:03 PM, HeeJin Kim wrote:
> I'm using Mellanox infiniband network card and trying to run it with openmpi.
> The problem is that I can connect and communicate between nodes, but I'm not
> sure whether it is in a correct state or not.
>
> I have two version of openmpi, one is
Dear,
I'm using a Mellanox infiniband network card and trying to run it with
openmpi.
The problem is that I can connect and communicate between nodes, but I'm not
sure whether it is in a correct state or not.
I have two versions of openmpi: one is compiled with mca-btl-openib and the
other is withou
It would be best if an IB vendor replies (hint hint!), but it is likely that
you have some kind of hardware issue on that node (e.g., a bad / flakey HCA,
etc.). You should probably run a full set of layer-0 diagnostics on your
fabric to make sure it's clean.
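A typical layer-0 sweep with the OFED diagnostics looks like this (a sketch; exact tool availability depends on your OFED version):

```shell
ibdiagnet        # fabric-wide sweep: bad links, duplicate GUIDs, SM problems
iblinkinfo       # per-link width/speed; look for links trained below 4x
perfquery -a     # error counters; rising SymbolErrors/LinkRecovers
                 # point at bad cables or ports
```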
I say this because back when Cisco
Dear all,
I would like to ask for help with understanding an error message I get
when communication using Open MPI 1.4.1 over Infiniband fails. After
several hours of operation, communication with one particular node
(f24) fails with something like:
[[20265,1],79][btl_openib_component.c:2951:hand
Yep -- it's normal.
Those IP addresses are used for bootstrapping/startup, not for MPI traffic. In
particular, that "HNP URI" stuff is used by Open MPI's underlying run-time
environment. It's not used by the MPI layer at all.
On Feb 5, 2010, at 2:32 PM, Mike Hanby wrote:
> Howdy,
>
> When
Howdy,
When running a Gromacs job using OpenMPI 1.4.1 on Infiniband enabled nodes, I'm
seeing the following process listing:
\_ -bash /opt/gridengine/default/spool/compute-0-3/job_scripts/97037
\_ mpirun -np 4 mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o
production-Npt-323K_4CPU -c pro
Correct, you don't need DAPL. Can you send all the information listed
here:
http://www.open-mpi.org/community/help/
On Sep 17, 2009, at 9:17 AM, Yann JOBIC wrote:
Hi,
I'm new to infiniband.
I installed the rdma_cm, rdma_ucm and ib_uverbs kernel modules.
When I'm running a ring test openmpi code, I've got:
[Lidia][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query]
Set MTU to IBV value 4 (2048 bytes)
[Lidia][0,1,1][btl_openib_endpoint
Hi Jim, list
1) Your first question:
I opened a thread on this list two months or so ago about a similar
situation: when OpenMPI would use/not use libnuma.
I asked a question very similar to your question about IB support,
and how the configure script would provide it or not.
Jeff answered it, a
Is it correct to assume that, when one is configuring openmpi v1.3.2 and if
one leaves out the
--with-openib=/dir
from the ./configure command line, that InfiniBand support will NOT be built
into openmpi v1.3.2? Then, if an Ethernet network is present that connects
all the nodes, openmpi will us
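Whether a given build actually contains IB support can be checked after installation; a sketch (the run line is an illustration, and ./my_app is a placeholder, not from this thread):

```shell
# Was the openib BTL compiled into this installation?
ompi_info | grep -i openib
# A line like "MCA btl: openib (...)" means IB support was built.

# Force TCP only at run time, regardless of what was built:
mpirun --mca btl tcp,self -np 4 ./my_app
```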
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
BTW OMPI 1.2.8 also will be available as part of OFED 1.4 that will be
released in end of th
On Nov 20, 2008, at 4:16 PM, Michael Oevermann wrote:
with a space after /machine. Anyway, your suggested options -mca
btl openib,sm,self
did help!!!
The specific tip here is that on Linux, you want to use the openib
BTL, not the udapl BTL. Specifying "--mca btl openib,sm,self" means
t
BTW - after you get more comfortable with your new-to-you cluster, I
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
On Thu, Nov 20, 2008 at 3:16
Hi Ralph,
that was indeed a typo, the command is of course
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
/home/sysgen/infiniband-mpi-test/machine
/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
with a space after /machine. Anyway, your suggested options -mca btl
openi
Your command line may have just come across with a typo, but something
isn't right:
-hostfile /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/
openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
That looks more like a path to a binary than a path to a hostfile. Is
there a missing space or filenam
Hi all,
I have "inherited" a small cluster with a head node and four compute
nodes which I have to administer. The nodes are connected via infiniband (OFED), but the head is not.
I am a complete novice to the infiniband stuff and here is my problem:
The infiniband configuration seems to be OK
Another nice tool for IB monitoring:
1. perfquery (part of OFED); example report:
Port counters: Lid 12 port 1
PortSelect:..1
CounterSelect:...0x
SymbolErrors:7836
LinkRecovers:255
LinkDowned:...
SLIM H.A. wrote:
Is it possible to get information about the usage of hca ports similar
to the result of the mx_endpoint_info command for Myrinet boards?
The ibstat command gives information like this:
Port 1:
State: Active
Physical state: LinkUp
but does not say whether a job is actually usin
Open MPI does not register with HCAs / ports in a way visible through
OFED command line tools, sorry...
On Apr 27, 2008, at 11:19 AM, SLIM H.A. wrote:
Is it possible to get information about the usage of hca ports similar
to the result of the mx_endpoint_info command for Myrinet boards?
Th
Is it possible to get information about the usage of hca ports similar
to the result of the mx_endpoint_info command for Myrinet boards?
The ibstat command gives information like this:
Port 1:
State: Active
Physical state: LinkUp
but does not say whether a job is actually using an infiniband po
than
cheap gig-e).
Thanks again.
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Wednesday, December 20, 2006 10:01 PM
To: Jeff Squyres
Cc: Open MPI Users
Subject: Re: [OMPI users] Infiniband - Any suggestions on &qu
On Dec 20, 2006, at 7:04 PM, Jeff Squyres wrote:
I've been asked by the owner of the cluster "How can you prove to me
that this openmpi job is using the Infiniband network?"
At first I thought a simple netstat -an on the compute nodes might
tell
me, however I don't see the Infiniband IP's in
You can also usually watch the counters on the IB cards and
Ethernet cards. For programs that have a lot of communication
between nodes it is quickly obvious which network you're using.
The IB card monitoring is driver specific, but you should have
some tools for this. For Ethernet you can
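One way to apply that suggestion with the OFED tools: snapshot the port counters before and after the run (a sketch; the application name is a placeholder):

```shell
perfquery > counters.before
mpirun -np 4 ./my_mpi_app            # the MPI job under test
perfquery > counters.after
diff counters.before counters.after  # large jumps in the data counters
                                     # mean the job used the IB port
```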
On Dec 20, 2006, at 6:28 PM, Michael John Hanby wrote:
Howdy, I'm new to cluster administration, MPI and high speed networks.
I've compiled my OpenMPI using these settings:
./configure CC='icc' CXX='icpc' FC='ifort' F77='ifort'
--with-mvapi=/usr/local/topspin
--with-mvapi-libdir=/usr/local/top
Howdy, I'm new to cluster administration, MPI and high speed networks.
I've compiled my OpenMPI using these settings:
./configure CC='icc' CXX='icpc' FC='ifort' F77='ifort'
--with-mvapi=/usr/local/topspin
--with-mvapi-libdir=/usr/local/topspin/lib64 --enable-static
--prefix=/share/apps/openmpi/1.