ed the OSU benchmarks and tried osu_latency. It reports ~40
microseconds for Open MPI, and ~3 microseconds for MVAPICH. Still puzzled...
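For reference, a sketch of the kind of command line in question (hostnames and binary path are placeholders); forcing the openib BTL rules out a silent fallback to TCP, which by itself would explain a number in the ~40 microsecond range:

$ mpirun -np 2 --host node01,node02 \
    --mca btl openib,self \
    ./osu_latency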
Steve
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Pavel Shamis (Pasha)
Sent: Thursday, February
Hey,
I may only add that XRC and RC have the same latency.
What is the command line that you use to run this benchmark?
What is the system configuration (one HCA, one active port)?
Any additional information about the system configuration, MPI command line,
etc. will help us analyze your issue.
Very strange. MPI tries to access the CQ context and gets an immediate error.
Please make sure that your limits configuration is OK; take a look at
this FAQ - http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
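A quick way to verify the limits the FAQ talks about (the paths and values below are the usual Linux/PAM convention, not a guarantee for every distribution):

$ ulimit -l    # max locked memory; should be "unlimited" or very large
# If it is small, raise it on every compute node, typically by adding
# these two lines to /etc/security/limits.conf:
#   * soft memlock unlimited
#   * hard memlock unlimited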
Pasha.
Charles Wright wrote:
Hello,
I just got some new cluster
You will not need the trick if you configure Open MPI with the
following flag:
--enable-mpirun-prefix-by-default
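For example (the installation prefix is just a placeholder):

$ ./configure --prefix=/opt/openmpi --enable-mpirun-prefix-by-default
$ make all install

With this flag, mpirun sets PATH and LD_LIBRARY_PATH on the remote nodes for you, so the manual LD_LIBRARY_PATH trick is no longer needed.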
Pasha.
Hodgess, Erin wrote:
the LD_LIBRARY_PATH did the trick;
thanks so much!
Sincerely,
Erin
Erin M. Hodgess, PhD
Associate Professor
Department of Computer and
Sangamesh,
The IB tunings that you added to your command line only delay the
problem; they don't resolve it.
The node node-0-2.local gets the asynchronous event "IBV_EVENT_PORT_ERROR";
as a result, the processes fail to deliver packets to some remote hosts,
and you see a bunch of IB errors.
You may try the ibdiagnet tool:
http://linux.die.net/man/1/ibdiagnet
The tool is part of OFED (http://www.openfabrics.org/)
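A typical first pass looks like this (run from any node with OFED installed; the -lw/-ls values assume a 4x DDR fabric and follow the man page above):

$ ibdiagnet               # discover the fabric and report bad links, etc.
$ ibdiagnet -lw 4x -ls 5  # also check that every link negotiated
                          # 4x width and 5 Gb/s (DDR) lane speed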
Pasha.
Prentice Bisbal wrote:
Several jobs on my cluster just died with the error below.
Are there any IB/Open MPI diagnostics I should use to diagnose, should I
If the above doesn't improve anything, the next question is: do you know
what the sizes of the messages are? For very small messages I believe
Scali shows 2x better performance than Intel and OMPI (I think this
is due to a fast-path optimization).
I remember that MVAPICH was faster that
the following MPI functions being used:
MPI_Init
MPI_Wtime
MPI_Comm_rank
MPI_Comm_size
MPI_Buffer_attach
MPI_Bsend
MPI_Pack
MPI_Unpack
MPI_Probe
MPI_Get_count
MPI_Recv
MPI_Iprobe
MPI_Finalize
where MPI_Iprobe is the clear winner in terms of number of calls.
/Torgny
Pavel Shamis (Pasha) wrote:
Do you know if the application uses some collective operations?
Thanks
Pasha
Torgny Faxen wrote:
Hello,
we are seeing a large difference in performance for some applications
depending on which MPI is being used.
Attached are performance numbers and oprofile output (first 30 lines)
from one
We have a computational cluster consisting of 8 HP ProLiant
ML370 G5 nodes with 32GB RAM.
Each node has a Mellanox single-port InfiniBand DDR HCA card (20Gbit/s)
and they are connected to each other through
a Voltaire ISR9024D-M DDR InfiniBand switch.
Now we want to increase the bandwidth to 40Gbit/s
Hi,
You can select the IB device used by the openib BTL with the following parameter:
MCA btl: parameter "btl_openib_if_include" (current value: , data
source: default value)
Comma-delimited list of devices/ports to be
used (e.g. "mthca0,mthca1:2"; empty value means to
Jim,
Can you please share with us your MCA conf file?
Pasha.
Jim Kress ORG wrote:
For the app I am using, ORCA (a quantum chemistry program), when it was
compiled using Open MPI 1.2.8 and run under 1.2.8 with the following in
the openmpi-mca-params.conf file:
btl=self,openib
the app ran fine
I tried to run with the first dynamic rules file that Pavel proposed
and it works; the time per MD step on 48 cores decreased from 2.8
s to 1.8 s, as expected.
Good news :-)
Pasha.
Thanks
Roman
On Wed, May 20, 2009 at 7:18 PM, Pavel Shamis (Pasha) <pash...@gmail.com>
Tomorrow I will add some printfs to the collective code and check what really
happens there...
Pasha
Peter Kjellstrom wrote:
On Wednesday 20 May 2009, Pavel Shamis (Pasha) wrote:
Disabling basic_linear seems like a good idea, but your config file sets the
cut-off at 128 bytes for 64 ranks (the field you set to 8192 seems to result
in a message size of that value divided by the number of ranks).
In my testing bruck seems to win clearly (at least for 64 ranks on my IB)
The correct MCA parameters are the following:
-mca coll_tuned_use_dynamic_rules 1
-mca coll_tuned_dynamic_rules_filename ./dyn_rules
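For reference, a dyn_rules file in this format looks roughly like the sketch below. This is only a sketch: the algorithm IDs and the exact grammar come from the coll_tuned component, so verify them (e.g. with the ompi_info command shown just below) before relying on this.

1            # number of collectives described in this file
3            # collective ID: 3 = Alltoall in coll_tuned
1            # number of communicator sizes that follow
64           # communicator size these rules apply to
2            # number of message-size rules
0    3 0 0   # from size 0: algorithm 3 (bruck), topo 0, segsize 0
8192 2 0 0   # from 8192 on: algorithm 2 (pairwise), topo 0, segsize 0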
Ohh.. it was my mistake.
You can also run the following command:
ompi_info -mca coll_tuned_use_dynamic_rules 1 -param coll tuned
This will give some insight
Default algorithm thresholds in MVAPICH are different from OMPI's.
Using tuned collectives in Open MPI, you may configure the Open MPI
Alltoall threshold to match the MVAPICH defaults.
The following MCA parameters configure Open MPI to use custom rules that
are defined in a (text) configuration file:
"--mca
Pasha.
Roman Martonak wrote:
I've been using --mca mpi_paffinity_alone 1 in all simulations. Concerning "-mca
mpi_leave_pinned 1", I tried it with openmpi 1.2.X versions and it
makes no difference.
Best regards
Roman
On Mon, May 18, 2009 at 4:57 PM, Pavel Shamis (Pasha) <pash
1) I was told to add "-mca mpi_leave_pinned 0" to avoid problems with
InfiniBand. This was with OpenMPI 1.3.1. Not
Actually, for the 1.2.X versions I would recommend enabling leave pinned:
"-mca mpi_leave_pinned 1"
sure if the problems were fixed in 1.3.2, but I am hanging on to that
setting
The (low level verbs) latency has AFAIR changed only a few times:
1) started at 5-6us with PCI-X Infinihost3
2) dropped to 3-4us with PCI-express Infinihost3
3) dropped to ~1us with PCI-express ConnectX
I would like to add that on PCI-E Gen2 platforms the latency is
sub-microsecond (~0.8-0.95 us)
I can't find a similar data set for InfiniBand. I would appreciate any
comments/links.
Here is the IB roadmap: http://www.infinibandta.org/itinfo/IB_roadmap
...But I do not see SDR there.
Pasha
Jan,
I guess that you have the OFED driver installed on your machines. You may do
basic network verification with the ibdiagnet utility
(http://linux.die.net/man/1/ibdiagnet), which is part of the OFED installation.
Regards,
Pasha
Jeff Squyres wrote:
On May 4, 2009, at 9:50 AM, jan wrote:
Thank you
You may try to use XRC; it should decrease the openib BTL memory footprint,
especially on a multi-core system like yours. The following option will
switch the default OMPI config to XRC:
" --mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32"
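In other words, a full launch would look something like this (application name hypothetical):

$ mpirun -np 64 \
    --mca btl_openib_receive_queues \
    X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
    ./app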
Do you have the same HCA adapter type on all of your machines?
In the error log I see an mlx4 error message, and mlx4 is the ConnectX driver,
but ibv_devinfo shows some older HCA.
Pasha
Jeff Layton wrote:
Pasha,
Here you go... :) Thanks for looking at this.
Jeff
hca_id: mthca0
fw_ver:
Thanks Pasha!
ibdiagnet reports the following:
-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Port localhost/P1 lid=0x00e2
Usually "retry exceeded error" points to some network issues, like bad
cable or some bad connector. You may use ibdiagnet tool for the network
debug - *http://linux.die.net/man/1/ibdiagnet. *This tool is part of OFED.
Pasha
Brett Pemberton wrote:
Hey,
I've had a couple of errors recently,
You may specify:
--mca btl openib,sm,self
Application sometimes runs fast, sometimes runs slow
When you specify the parameter above, Open MPI will use only three BTLs:
openib - for InfiniBand
sm - for shared-memory communication
self - for "self" communication
NO other BTL will be used.
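A complete invocation would then look like this (application name is a placeholder):

$ mpirun -np 8 --mca btl openib,sm,self ./a.out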
And
Another thing to try is a change that we made late in the Open MPI
v1.2 series with regards to IB:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
Thanks, this is something worth investigating. What would be the exact
syntax to use to turn off
If the basic test runs, the installation is OK. So what happens when you
try to run your application? What is the command line? What is the error
message? Do you run the application on the same set of machines, with
the same command line, as IMB?
Pasha
yes to both questions: the OMPI version
Teige, Scott W wrote:
Greetings,
I have observed strange behavior with an application running with
OpenMPI 1.2.8, OFED 1.2. The application runs in two "modes", fast
and slow. The execution time is either within one second of 108 sec.
or within one second of 67 sec. My cluster has 1 Gig
Biagio Lucini wrote:
Hello,
I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a long time.
I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster
with InfiniBand interconnects that I use and administer at the same
time. The
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
BTW, OMPI 1.2.8 will also be available as part of OFED 1.4, which will be
released at the end of
Hi,
Can you please provide more information about your setup:
- OpenMPI version
- Runtime tuning
- Platform
- IB vendor and driver version
Thanks,
Pasha
Åke Sandgren wrote:
Hi!
We have a code that (at least sometimes) gets the following error
message:
The "RETRY EXCEEDED ERROR" error is related to IB and not MTT.
The error says that IB failed to send IB packet from
machine 10.2.1.90 to 10.2.1.50
You need to run your IB network monitoring tool and found the issue.
Usually it is some bad cable in IB fabric that causes such errors.
Regards,
Amir Saad wrote:
I'll be starting some parallel programs in Open MPI and I would like
to find a guide or any docs for Open MPI; any suggestions, please? I
couldn't find any docs on the website. How do I know about the APIs or
the functions that I should use?
Here are videos about OpenMPI/MPI -
this issue ?
Regards.
Pasha
Ethan Mallove wrote:
On Wed, May/21/2008 09:53:11PM, Pavel Shamis (Pasha) wrote:
Oops, in the "MTT server side problem" thread we discussed another issue.
But anyway, I did not see the problem on my server after the upgrade :)
We took *some* steps to alle
with Intel and PGI compilers:
http://www.mellanox.com/products/ofed.php
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote:
On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote:
In trying to b
Is there a way to shut off early completion in 1.2.3?
Sure, just add "--mca pml_ob1_use_early_completion 0" to your command
line.
Or are the above known issues, and should I use 1.2.7-pre or grab a
1.3 snapshot?
1.2.6 should be ok.
Regards,
Pasha
On Jul 2, 2008, at 10:42 AM, Pa
Maybe this FAQ will help:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
Brock Palen wrote:
We have a code (arts) that locks up only when running on IB. It works
fine on tcp and sm.
When we ran it in a debugger, it locked up on an MPI_Comm_split().
That as far
/19/08, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:
From: Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>
Subject: Re: [OMPI users] Open MPI timeout problems.
To: pj...@cornell.edu, "Open MPI Users" <us...@open-mpi.org>
Date: Thur
Usually the retry exceeded points to some network issue on your cluster. I
see from the logs that you still
use MVAPI. If I remember correctly, MVAPI includes the IBADM application,
which should be able to check and debug the network.
BTW, I recommend you update your MVAPI driver to the latest OpenFabric
Scott Shaw wrote:
Hi, I hope this is the right forum for my questions. I am running into
a problem when scaling >512 cores on an InfiniBand cluster which has
14,336 cores. I am new to openmpi and trying to figure out the right
-mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:
Oops, in the "MTT server side problem" thread we discussed another issue.
But anyway, I did not see the problem on my server after the upgrade :)
Pasha
Pavel Shamis (Pasha) wrote:
I had a similar problem on my server. I upgraded the server to the latest
trunk and the problem disappeared.
(see "
or apache).
On May 21, 2008, at 2:36 PM, Ethan Mallove wrote:
On Wed, May/21/2008 06:46:06PM, Pavel Shamis (Pasha) wrote:
I sent it directly to your email. Please check.
Thanks,
Pasha
Got it. Thanks. It's a PHP memory overload issue.
(Apparently I didn't look far back enough in the httpd
error_
Jeff Squyres wrote:
Are we running into HTTP max memory problems or HTTP max upload size
problems again?
I guess it is some server-side issue; you need to check the
/var/log/httpd/* logs on the server.
On May 21, 2008, at 5:28 AM, Pavel Shamis (Pasha) wrote:
Hi,
Here is test result from
, Pavel Shamis (Pasha) wrote:
Hello,
Did you have chance to review this patch ?
Regards,
Pasha
Josh Hursey wrote:
Sorry for the delay on this. I probably will not have a chance to
look at it until later this week or early next. Thank you for the
work on the patch.
Cheers,
Josh
On May 12
that we should unify the functionality I cannot
recommend this patch since it will result in losing useful error
handling functionality. Maybe there is another way to clean this up
to preserve the error reporting.
-- Josh
On May 7, 2008, at 11:56 AM, Pavel Shamis (Pasha) wrote:
Hi Josh,
I had
:
On Tue, May/06/2008 06:29:33PM, Pavel Shamis (Pasha) wrote:
I'm not sure which cron jobs you're referring to. Do you
mean these?
https://svn.open-mpi.org/trac/mtt/browser/trunk/server/php/cron
I talked about this one:
https://svn.open-mpi.org/trac/mtt/wiki/ServerMaintenance
have the latest mtt/server scripts?
https://svn.open-mpi.org/trac/mtt/changeset/1119/trunk/server/php/submit
-Ethan
On Tue, May/06/2008 03:26:43PM, Pavel Shamis (Pasha) wrote:
About the issue:
1. On the client side I see "*** WARNING: MTTDatabase client did not get a
serial&
navailable
My memory limit in the php.ini file was set to 256MB!
Any ideas?
Thanks.
Pavel Shamis (Pasha) wrote:
SLIM H.A. wrote:
Is it possible to get information about the usage of HCA ports, similar
to the result of the mx_endpoint_info command for Myrinet boards?
The ibstat command gives information like this:
Port 1:
State: Active
Physical state: LinkUp
using an InfiniBand port or
communicates through plain Ethernet.
I would be grateful for any advice.
You have access to some counters in
/sys/class/infiniband/mlx4_0/ports/1/counters/ (counters for HCA
mlx4_0, port 1)
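For example (device name as above; if I remember the convention correctly, the port_*_data counters count 4-byte words, not bytes):

$ ls /sys/class/infiniband/mlx4_0/ports/1/counters/
$ cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data
$ cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data

If these grow while your job runs, the traffic is going over the IB port and not over plain Ethernet.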
second one will be reserved for backup.
On a network failure on the first port,
all connections will migrate to the second port. APM works only at the
HCA level - I mean that you cannot migrate between
different HCAs, you can migrate only between the 2 ports of the same HCA.
Ok, I will do.
Jeff Squyres wrote:
Sure, that would be fine.
Can you write it up in a little more FAQ-ish style? I can add it to
the web page. See this wiki item:
https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries
On Mar 12, 2008, at 5:33 AM, Pavel Shamis (Pasha) wrote:
Run
t I cannot seem to find anywhere that will tell me how to
change the GID to something else.
Thanks,
Jon
I found the problem; it was a typo in the name of a variable. I had
something like:
email_subject: MPI regression $broken_name
After fixing the name I started to get reports!
Thanks.
Pasha
Pavel Shamis (Pasha) wrote:
I might've misread your last email. Did the new
email_subject INI
Thanks.
:)
Regards,
Pavel Shamis (Pasha)
Adams Samuel D Contr AFRL/HEDR wrote:
I set bash to have unlimited size core files like this:
$ ulimit -c unlimited
But, it was not dropping core files for some reason when I was running with
mpirun. Just to make sure it would do what I expected, I wrot
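One common workaround (a sketch; the wrapper name is arbitrary) is to raise the limit inside a small wrapper that mpirun launches, since an interactive ulimit does not propagate to processes started on remote nodes:

$ cat wrapper.sh
#!/bin/sh
ulimit -c unlimited   # raise the limit in the shell that execs the app
exec "$@"

$ mpirun -np 4 ./wrapper.sh ./a.out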