Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-18 Thread Pavel Shamis (Pasha)
ed the OSU benchmarks and tried osu_latency. It reports ~40 microsecs for OpenMPI, and ~3 microsecs for MVAPICH. Still puzzled... Steve -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Pavel Shamis (Pasha) Sent: Thursday, February

Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-18 Thread Pavel Shamis (Pasha)
Hey, I would only add that XRC and RC have the same latency. What command line do you use to run this benchmark? What is the system configuration (one HCA, one active port)? Any additional information about the system configuration, MPI command line, etc. will help to analyze your issue.

Re: [OMPI users] [btl_openib_component.c:1373:btl_openib_component_progress] error polling HP CQ with -2 errno says Success

2009-09-26 Thread Pavel Shamis (Pasha)
Very strange. MPI tries to access the CQ context and gets an immediate error. Please make sure that your limits configuration is OK; take a look at this FAQ: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Pasha. Charles Wright wrote: Hello, I just got some new cluster
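
That FAQ entry concerns the per-process limit on locked (registered) memory. A minimal sketch, assuming a Linux host and Python's standard resource module, that reads the current RLIMIT_MEMLOCK limits and flags a low soft limit. The helper names and the 64 MiB threshold are hypothetical, chosen only for illustration; they are not an Open MPI requirement.

```python
import resource


def memlock_limit_ok(soft: int, required_bytes: int) -> bool:
    """Return True if a soft RLIMIT_MEMLOCK value allows pinning required_bytes.

    RLIM_INFINITY means "unlimited", which is the setting usually
    recommended for InfiniBand/OpenFabrics hosts.
    """
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= required_bytes


def describe(limit: int) -> str:
    """Human-readable form of a single rlimit value."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    # Hypothetical threshold for illustration only.
    required = 64 * 1024 * 1024
    print(f"memlock soft={describe(soft)} hard={describe(hard)}")
    if not memlock_limit_ok(soft, required):
        print("locked-memory limit looks too low for registered IB buffers")
```

Run it in the same environment the MPI processes get (for example, through the same batch system or ssh path), since limits set in an interactive shell may not be inherited by remotely launched ranks.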

Re: [OMPI users] running open mpi on ubuntu 9.04

2009-09-21 Thread Pavel Shamis (Pasha)
You will not need the trick if you configure Open MPI with the following flag: --enable-mpirun-prefix-by-default Pasha. Hodgess, Erin wrote: the LD_LIBRARY_PATH did the trick; thanks so much! Sincerely, Erin Erin M. Hodgess, PhD Associate Professor Department of Computer and

Re: [OMPI users] Job fails after hours of running on a specific node

2009-09-21 Thread Pavel Shamis (Pasha)
Sangamesh, The IB tunings that you added to your command line only delay the problem; they don't resolve it. The node node-0-2.local gets the asynchronous event "IBV_EVENT_PORT_ERROR", and as a result the processes fail to deliver packets to some remote hosts, so you see a bunch of IB errors.

Re: [OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Pavel Shamis (Pasha)
You may try the ibdiagnet tool: http://linux.die.net/man/1/ibdiagnet The tool is part of OFED (http://www.openfabrics.org/) Pasha. Prentice Bisbal wrote: Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose, should I

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
If the above doesn't improve anything, the next question is: do you know what the message sizes are? For very small messages I believe Scali shows 2x better performance than Intel and OMPI (I think this is due to a fastpath optimization). I remember that mvapich was faster than

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
the following MPI functions being used: MPI_Init, MPI_wtime, MPI_COMM_RANK, MPI_COMM_SIZE, MPI_BUFFER_ATTACH, MPI_BSEND, MPI_PACK, MPI_UNPACK, MPI_PROBE, MPI_GET_COUNT, MPI_RECV, MPI_IPROBE, MPI_FINALIZE, where MPI_IPROBE is the clear winner in terms of number of calls. /Torgny Pavel Shamis (Pasha) wrote

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
Do you know if the application uses any collective operations? Thanks Pasha Torgny Faxen wrote: Hello, we are seeing a large difference in performance for some applications depending on what MPI is being used. Attached are performance numbers and oprofile output (first 30 lines) from one

Re: [OMPI users] Using dual infiniband HCA cards

2009-07-30 Thread Pavel Shamis (Pasha)
We have a computational cluster which consists of 8 HP Proliant ML370G5 nodes with 32GB RAM. Each node has a Mellanox single-port InfiniBand DDR HCA card (20Gbit/s), and the nodes are connected to each other through a Voltaire ISR9024D-M DDR InfiniBand switch. Now we want to increase the bandwidth to 40Gbit/s

Re: [OMPI users] [OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-16 Thread Pavel Shamis (Pasha)
Hi, You can select the IB device used by the openib BTL with the following parameter: MCA btl: parameter "btl_openib_if_include" (current value: , data source: default value) Comma-delimited list of devices/ports to be used (e.g. "mthca0,mthca1:2"; empty value means to
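
The help text's example value "mthca0,mthca1:2" mixes a bare device name with a device:port pair. A minimal sketch of how such a value decomposes, assuming (as the example suggests) that a bare device name selects all of its ports; parse_if_include is a hypothetical helper, not part of Open MPI.

```python
from typing import List, Optional, Tuple


def parse_if_include(value: str) -> List[Tuple[str, Optional[int]]]:
    """Split a btl_openib_if_include-style string into (device, port) pairs.

    "mthca0"   -> ("mthca0", None): no port given, so all ports of the device
    "mthca1:2" -> ("mthca1", 2):    only port 2 of mthca1
    """
    entries: List[Tuple[str, Optional[int]]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # an empty value (or stray comma) selects nothing here
        device, sep, port = item.partition(":")
        entries.append((device, int(port) if sep else None))
    return entries


print(parse_if_include("mthca0,mthca1:2"))
```

On the command line the same string is passed as, for example, --mca btl_openib_if_include mthca1:2 to restrict the openib BTL to a single port.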

Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband

2009-06-23 Thread Pavel Shamis (Pasha)
Jim, Can you please share your MCA conf file with us? Pasha. Jim Kress ORG wrote: For the app I am using, ORCA (a Quantum Chemistry program), when it was compiled using openMPI 1.2.8 and run under 1.2.8 with the following in the openmpi-mca-params.conf file: btl=self,openib the app ran fine
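
When comparing MCA conf files between installations, it can help to read them mechanically. A minimal sketch, assuming the openmpi-mca-params.conf layout of one "name = value" per line with '#' comments; parse_mca_params is a hypothetical helper, and keeping the last assignment for a repeated name is this sketch's own choice.

```python
from typing import Dict


def parse_mca_params(text: str) -> Dict[str, str]:
    """Parse openmpi-mca-params.conf-style text into {param: value}.

    Assumed format: one "name = value" per line, '#' starts a comment,
    blank lines are ignored. A repeated name keeps its last value here.
    """
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


conf = """
# Only loopback and InfiniBand, as in the quoted config
btl=self,openib
"""
params = parse_mca_params(conf)
btls = params["btl"].split(",")
print(btls)  # ['self', 'openib']
```

With btl=self,openib the tcp BTL is not in the list, so a run that still shows Ethernet traffic suggests that this file is not the one being read.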

Re: [OMPI users] scaling problem with openmpi

2009-05-21 Thread Pavel Shamis (Pasha)
I tried to run with the first dynamic rules file that Pavel proposed and it works: the time per MD step on 48 cores decreased from 2.8 s to 1.8 s, as expected. Good news :-) Pasha. Thanks Roman On Wed, May 20, 2009 at 7:18 PM, Pavel Shamis (Pasha) <pash...@gmail.com>

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
Tomorrow I will add some printfs to the collective code and check what really happens there... Pasha Peter Kjellstrom wrote: On Wednesday 20 May 2009, Pavel Shamis (Pasha) wrote: Disabling basic_linear seems like a good idea, but your config file sets the cut-off at 128 bytes for 64 ranks

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
Disabling basic_linear seems like a good idea, but your config file sets the cut-off at 128 bytes for 64 ranks (the field you set to 8192 seems to result in a message size of that value divided by the number of ranks). In my testing, Bruck seems to win clearly (at least for 64 ranks on my IB)
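
The arithmetic behind that observation: 8192 / 64 ranks = 128 bytes. A minimal sketch, assuming the empirically observed "field divided by number of ranks" behaviour holds (it is Peter's reading, not documented semantics); both helper names are hypothetical.

```python
def effective_cutoff_bytes(rule_size: int, nranks: int) -> int:
    """Per-message cut-off implied by a dynamic-rules size field,
    assuming the field is divided by the number of ranks."""
    if nranks <= 0:
        raise ValueError("nranks must be positive")
    return rule_size // nranks


def rule_size_for_cutoff(cutoff_bytes: int, nranks: int) -> int:
    """Inverse: the field value needed to get a given per-message cut-off."""
    return cutoff_bytes * nranks


print(effective_cutoff_bytes(8192, 64))  # 128
print(rule_size_for_cutoff(8192, 64))    # 524288
```

Under that assumption, a rules file meant to switch algorithms at 8192-byte messages on 64 ranks would need 524288 in that field.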

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
The correct MCA parameters are the following: -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename ./dyn_rules Oh... it was my mistake. You can also run the following command: ompi_info -mca coll_tuned_use_dynamic_rules 1 -param coll tuned This will give some insight
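
When scripting many runs, the two parameters above can be kept in one place and expanded into mpirun arguments. A minimal sketch; mca_args is a hypothetical helper, and only the two parameter names and the ./dyn_rules path come from the message.

```python
from typing import Dict, List, Union


def mca_args(params: Dict[str, Union[str, int]]) -> List[str]:
    """Expand {name: value} into repeated '-mca name value' arguments."""
    args: List[str] = []
    for name, value in params.items():
        args += ["-mca", name, str(value)]
    return args


dynamic_rules = {
    "coll_tuned_use_dynamic_rules": 1,
    "coll_tuned_dynamic_rules_filename": "./dyn_rules",
}
print(" ".join(mca_args(dynamic_rules)))
# -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename ./dyn_rules
```

The resulting list can be spliced into a launch command, for example subprocess.run(["mpirun", *mca_args(dynamic_rules), "-np", "48", "./app"]), where ./app is a placeholder for the real executable.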

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
The default algorithm thresholds in MVAPICH differ from those in OMPI. Using tuned collectives in Open MPI, you may configure the Open MPI Alltoall thresholds to match the MVAPICH defaults. The following MCA parameters configure Open MPI to use custom rules that are defined in a configuration (txt) file. "--mca

Re: [OMPI users] scaling problem with openmpi

2009-05-18 Thread Pavel Shamis (Pasha)
. Pasha. Roman Martonak wrote: I've been using --mca mpi_paffinity_alone 1 in all simulations. Concerning "-mca mpi_leave_pinned 1", I tried it with openmpi 1.2.X versions and it makes no difference. Best regards Roman On Mon, May 18, 2009 at 4:57 PM, Pavel Shamis (Pasha) <pash

Re: [OMPI users] scaling problem with openmpi

2009-05-18 Thread Pavel Shamis (Pasha)
1) I was told to add "-mca mpi_leave_pinned 0" to avoid problems with Infiniband. This was with OpenMPI 1.3.1. Not sure if the problems were fixed on 1.3.2, but I am hanging on to that setting. Actually, for the 1.2.X versions I would recommend enabling leave pinned: "-mca mpi_leave_pinned 1"

Re: [OMPI users] Slightly off topic: Ethernet and InfiniBand speed evolution

2009-05-07 Thread Pavel Shamis (Pasha)
The (low-level verbs) latency has, AFAIR, changed only a few times: 1) started at 5-6us with PCI-X Infinihost3, 2) dropped to 3-4us with PCI-Express Infinihost3, 3) dropped to ~1us with PCI-Express ConnectX. I would like to add that on PCI-Express Gen2 platforms the latency is sub-microsecond (~0.8-0.95us).

Re: [OMPI users] Slightly off topic: Ethernet and InfiniBand speed evolution

2009-05-05 Thread Pavel Shamis (Pasha)
I can't find a similar data set for Infiniband. I would appreciate any comments/links. Here is the IB roadmap: http://www.infinibandta.org/itinfo/IB_roadmap ...But I do not see SDR there. Pasha

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-05 Thread Pavel Shamis (Pasha)
Jan, I guess that you have the OFED driver installed on your machines. You may do basic network verification with the ibdiagnet utility (http://linux.die.net/man/1/ibdiagnet), which is part of the OFED installation. Regards, Pasha Jeff Squyres wrote: On May 4, 2009, at 9:50 AM, jan wrote: Thank you

Re: [OMPI users] [Fwd: mpi alltoall memory requirement]

2009-04-26 Thread Pavel Shamis (Pasha)
You may try to use XRC; it should decrease the openib BTL memory footprint, especially on multi-core systems like yours. The following option will switch the default OMPI config to XRC: " --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32"
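
The btl_openib_receive_queues value is a colon-separated list of queues, each a comma-separated list starting with a type letter ('X' for XRC here). A minimal sketch that splits it apart, assuming the field after the letter is the buffer size in bytes; the remaining numbers are kept uninterpreted because their exact meanings are not stated in the message. QueueSpec and parse_receive_queues are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSpec:
    kind: str          # queue type letter, 'X' (XRC) in the value above
    size: int          # assumed: receive buffer size in bytes
    params: List[int]  # remaining per-queue numbers, left uninterpreted


def parse_receive_queues(spec: str) -> List[QueueSpec]:
    """Split a btl_openib_receive_queues-style string into queue specs."""
    queues: List[QueueSpec] = []
    for entry in spec.split(":"):
        kind, *fields = entry.split(",")
        numbers = [int(f) for f in fields]
        queues.append(QueueSpec(kind, numbers[0], numbers[1:]))
    return queues


xrc = "X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32"
queues = parse_receive_queues(xrc)
print([q.size for q in queues])  # [128, 2048, 12288, 65536]
```

Reading the value this way shows four XRC queues with buffers growing from 128 bytes to 64 KiB.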

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Pavel Shamis (Pasha)
Do you have the same HCA adapter type on all of your machines? In the error log I see an mlx4 error message, and mlx4 is the ConnectX driver, but ibv_devinfo shows an older HCA. Pasha Jeff Layton wrote: Pasha, Here you go... :) Thanks for looking at this. Jeff hca_id: mthca0 fw_ver:

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Pavel Shamis (Pasha)
Thanks Pasha! ibdiagnet reports the following:
-I- IPoIB Subnets Check
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Port localhost/P1 lid=0x00e2
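
On a large fabric the ibdiagnet report is long, so the warnings are easy to miss. A minimal sketch that buckets report lines by their prefix. The '-I-' (info) and '-W-' (warning) prefixes appear in the report quoted above; treating '-E-' as the error prefix is an assumption about the tool's convention. group_by_severity is a hypothetical helper.

```python
from typing import Dict, List


def group_by_severity(report: str) -> Dict[str, List[str]]:
    """Bucket ibdiagnet-style lines by their '-I-' / '-W-' / '-E-' prefix.

    '-E-' for errors is an assumption; separator lines made only of
    dashes (e.g. '-I-----') are skipped.
    """
    buckets: Dict[str, List[str]] = {"I": [], "W": [], "E": []}
    for raw in report.splitlines():
        line = raw.strip()
        if len(line) >= 3 and line[0] == "-" and line[2] == "-" and line[1] in buckets:
            body = line[3:].strip()
            if body and set(body) != {"-"}:
                buckets[line[1]].append(body)
    return buckets


sample = """-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Port localhost/P1 lid=0x00e2"""
result = group_by_severity(sample)
print(result["W"])  # ['Port localhost/P1 lid=0x00e2']
```

The sample input is the excerpt from the message, including its '-I---' separator lines.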

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Pavel Shamis (Pasha)
Usually a "retry exceeded error" points to network issues, such as a bad cable or a bad connector. You may use the ibdiagnet tool for network debugging: http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED. Pasha Brett Pemberton wrote: Hey, I've had a couple of errors recently,

Re: [OMPI users] BTL question

2008-12-29 Thread Pavel Shamis (Pasha)
You may specify: --mca btl openib,sm,self. Application sometimes runs fast, sometimes runs slow. When you specify the parameter above, Open MPI will use only three BTLs: openib for InfiniBand, sm for shared-memory communication, and self for "self" communication. No other BTL will be used. And

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-28 Thread Pavel Shamis (Pasha)
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to use to turn off

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
If the basic test runs, the installation is OK. So what happens when you try to run your application? What is the command line? What is the error message? Do you run the application on the same set of machines with the same command line as IMB? Pasha yes to both questions: the OMPI version

Re: [OMPI users] BTL question

2008-12-24 Thread Pavel Shamis (Pasha)
Teige, Scott W wrote: Greetings, I have observed strange behavior with an application running with OpenMPI 1.2.8, OFED 1.2. The application runs in two "modes", fast and slow. The execution time is either within one second of 108 sec. or within one second of 67 sec. My cluster has 1 Gig

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administer at the same time. The

Re: [OMPI users] infiniband problem

2008-11-23 Thread Pavel Shamis (Pasha)
recommend you upgrade your Open MPI installation. v1.2.8 has a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be available "next month"... so watch for an announcement on that front. BTW, OMPI 1.2.8 will also be available as part of OFED 1.4, which will be released at the end of

Re: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
rs mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- -- Pavel Shamis (Pasha) Mellanox Technologies LTD.

Re: [OMPI users] Problem with btl_openib_endpoint_post_rr

2008-08-26 Thread Pavel Shamis (Pasha)
Hi, Can you please provide more information about your setup: - OpenMPI version - Runtime tuning - Platform - IB vendor and driver version Thanks, Pasha Åke Sandgren wrote: Hi! We have a code that (at least sometimes) gets the following error message:

Re: [MTT users] RETRY EXCEEDED ERROR

2008-07-31 Thread Pavel Shamis (Pasha)
The "RETRY EXCEEDED ERROR" error is related to IB and not MTT. The error says that IB failed to send an IB packet from machine 10.2.1.90 to 10.2.1.50. You need to run your IB network monitoring tool and find the issue. Usually it is a bad cable in the IB fabric that causes such errors. Regards,

Re: [OMPI users] How can I start building apps in Open MPI? any docs?

2008-07-27 Thread Pavel Shamis (Pasha)
Amir Saad wrote: I'll be starting some parallel programs in Open MPI and I would like to find a guide or any docs of Open MPI, any suggestions please? I couldn't find any docs on the website, how do I know about the APIs or the functions that I should use? Here are videos about OpenMPI/MPI -

Re: [MTT users] Can not find my testing results in OMPI MTT DB

2008-07-08 Thread Pavel Shamis (Pasha)
this issue? Regards. Pasha Ethan Mallove wrote: On Wed, May/21/2008 09:53:11PM, Pavel Shamis (Pasha) wrote: Oops, in the "MTT server side problem" thread we discussed another issue. But anyway I did not see the problem on my server after the upgrade :) We took *some* steps to alle

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-03 Thread Pavel Shamis (Pasha)
with Intel and Pgi compilers: http://www.mellanox.com/products/ofed.php Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote: On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote: In trying to b

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-03 Thread Pavel Shamis (Pasha)
Is there a way to shut off early completion in 1.2.3? Sure, just add "--mca pml_ob1_use_early_completion 0" to your command line. Or are the above known issues and I should use 1.2.7-pre or grab a 1.3 snapshot? 1.2.6 should be OK. Regards, Pasha On Jul 2, 2008, at 10:42 AM, Pa

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-02 Thread Pavel Shamis (Pasha)
Maybe this FAQ will help: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Brock Palen wrote: We have a code (arts) that locks up only when running on IB. Works fine on tcp and sm. When we ran it in a debugger, it locked up on an MPI_Comm_split(). That as far

Re: [OMPI users] Fw: Re: Open MPI timeout problems.

2008-06-19 Thread Pavel Shamis (Pasha)
/19/08, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote: From: Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> Subject: Re: [OMPI users] Open MPI timeout problems. To: pj...@cornell.edu, "Open MPI Users" <us...@open-mpi.org> Date: Thur

Re: [OMPI users] Open MPI timeout problems.

2008-06-19 Thread Pavel Shamis (Pasha)
Usually the retry exceeded error points to some network issue on your cluster. I see from the logs that you still use MVAPI. If I remember correctly, MVAPI includes the IBADM application, which should be able to check and debug the network. BTW, I recommend that you update your MVAPI driver to the latest OpenFabric

Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Pavel Shamis (Pasha)
Scott Shaw wrote: Hi, I hope this is the right forum for my questions. I am running into a problem when scaling >512 cores on an InfiniBand cluster which has 14,336 cores. I am new to openmpi and trying to figure out the right -mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:

Re: [MTT users] Can not find my testing results in OMPI MTT DB

2008-05-21 Thread Pavel Shamis (Pasha)
Oops, in the "MTT server side problem" thread we discussed another issue. But anyway I did not see the problem on my server after the upgrade :) Pasha Pavel Shamis (Pasha) wrote: I had a similar problem on my server. I upgraded the server to the latest trunk and the problem disappeared. (see "

Re: [MTT users] Can not find my testing results in OMPI MTT DB

2008-05-21 Thread Pavel Shamis (Pasha)
/or apache). On May 21, 2008, at 2:36 PM, Ethan Mallove wrote: On Wed, May/21/2008 06:46:06PM, Pavel Shamis (Pasha) wrote: I sent it directly to your email. Please check. Thanks, Pasha Got it. Thanks. It's a PHP memory overload issue. (Apparently I didn't look far back enough in the httpd error_

Re: [MTT users] Can not find my testing results in OMPI MTT DB

2008-05-21 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Are we running into http max memory problems or http max upload size problems again? I guess it is some server-side issue; you need to check the /var/log/httpd/* logs on the server. On May 21, 2008, at 5:28 AM, Pavel Shamis (Pasha) wrote: Hi, Here is the test result from

Re: [MTT users] MTT server side problem

2008-05-20 Thread Pavel Shamis (Pasha)
, Pavel Shamis (Pasha) wrote: Hello, Did you have a chance to review this patch? Regards, Pasha Josh Hursey wrote: Sorry for the delay on this. I probably will not have a chance to look at it until later this week or early next. Thank you for the work on the patch. Cheers, Josh On May 12

Re: [MTT users] MTT server side problem

2008-05-19 Thread Pavel Shamis (Pasha)
that we should unify the functionality I cannot recommend this patch since it will result in losing useful error handling functionality. Maybe there is another way to clean this up to preserve the error reporting. -- Josh On May 7, 2008, at 11:56 AM, Pavel Shamis (Pasha) wrote: Hi Josh, I had

Re: [MTT users] MTT server side problem

2008-05-12 Thread Pavel Shamis (Pasha)
the functionality I cannot recommend this patch since it will result in losing useful error handling functionality. Maybe there is another way to clean this up to preserve the error reporting. -- Josh On May 7, 2008, at 11:56 AM, Pavel Shamis (Pasha) wrote: Hi Josh, I had the original

Re: [MTT users] MTT server side problem

2008-05-07 Thread Pavel Shamis (Pasha)
: On Tue, May/06/2008 06:29:33PM, Pavel Shamis (Pasha) wrote: I'm not sure which cron jobs you're referring to. Do you mean these? https://svn.open-mpi.org/trac/mtt/browser/trunk/server/php/cron I talked about this one: https://svn.open-mpi.org/trac/mtt/wiki/ServerMaintenance

Re: [MTT users] MTT server side problem

2008-05-06 Thread Pavel Shamis (Pasha)
have the latest mtt/server scripts? https://svn.open-mpi.org/trac/mtt/changeset/1119/trunk/server/php/submit -Ethan On Tue, May/06/2008 03:26:43PM, Pavel Shamis (Pasha) wrote: About the issue: 1. On client side I see "*** WARNING: MTTDatabase client did not get a serial

Re: [MTT users] MTT server side problem

2008-05-06 Thread Pavel Shamis (Pasha)
navailable My memory limit in the php.ini file was set to 256MB! Any ideas? Thanks.

[MTT users] MTT server side problem

2008-05-06 Thread Pavel Shamis (Pasha)
My memory limit in the php.ini file was set to 256MB! Any ideas? Thanks.

Re: [OMPI users] infiniband

2008-05-01 Thread Pavel Shamis (Pasha)
1 5 123391 Pavel Shamis (Pasha) wrote: SLIM H.A. wrote: Is it possible to get information about the usage of hca ports similar to the result of the mx_endpoint_info command for Myrinet boards? The ibstat command gives information like this: Port 1: State: Active Physical state: LinkUp
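
A quick way to check the State / Physical state fields quoted above across many nodes is to parse ibstat output. A minimal sketch, assuming each field sits on its own line under a "Port N:" header as ibstat prints it (the snippet above shows them flattened onto one line); parse_ibstat_ports and port_is_up are hypothetical helpers.

```python
import re
from typing import Dict, Optional


def parse_ibstat_ports(text: str) -> Dict[int, Dict[str, str]]:
    """Collect 'Key: value' fields under each 'Port N:' header of ibstat output."""
    ports: Dict[int, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Port (\d+):", line)
        if header:
            current = ports.setdefault(int(header.group(1)), {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return ports


def port_is_up(fields: Dict[str, str]) -> bool:
    """True when the logical state is Active and the physical link is up."""
    return fields.get("State") == "Active" and fields.get("Physical state") == "LinkUp"


sample = """Port 1:
    State: Active
    Physical state: LinkUp
"""
ports = parse_ibstat_ports(sample)
print(port_is_up(ports[1]))  # True
```

Port state only shows that the link is usable; whether MPI traffic actually uses it is better answered by the port counters.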

Re: [OMPI users] infiniband

2008-04-29 Thread Pavel Shamis (Pasha)
using an InfiniBand port or communicates through plain Ethernet. I would be grateful for any advice. You have access to some counters in /sys/class/infiniband/mlx4_0/ports/1/counters/ (counters for HCA mlx4_0, port 1).
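
One way to use those counters is to snapshot them before and after a job and compare: if the data counters grew, the job moved traffic over that IB port. A minimal sketch, assuming each file in the counters directory holds a single integer; the mlx4_0 and port 1 defaults mirror the path above, and read_port_counters and delta are hypothetical helpers.

```python
import os
from typing import Dict


def read_port_counters(device: str = "mlx4_0", port: int = 1,
                       root: str = "/sys/class/infiniband") -> Dict[str, int]:
    """Read every counter file under <root>/<device>/ports/<port>/counters/."""
    counters_dir = os.path.join(root, device, "ports", str(port), "counters")
    counters: Dict[str, int] = {}
    for name in sorted(os.listdir(counters_dir)):
        path = os.path.join(counters_dir, name)
        try:
            with open(path) as fh:
                counters[name] = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or non-numeric entries
    return counters


def delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-counter change between two snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}
```

Typical use is before = read_port_counters(), then run the MPI job, then print(delta(before, read_port_counters())). Growth in port_xmit_data and port_rcv_data indicates traffic on that port.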

Re: [OMPI users] multi-rail failover with IB

2008-04-03 Thread Pavel Shamis (Pasha)
second one will be reserved for backup. On a network failure on the first port, all connections will migrate to the second port. APM works only at the HCA level: you cannot migrate between different HCAs, only between two ports of the same HCA.

Re: [OMPI users] MPI-2 Supported on Open MPI 1.2.5?

2008-03-12 Thread Pavel Shamis (Pasha)

Re: [OMPI users] Set GID

2008-03-12 Thread Pavel Shamis (Pasha)
Ok, I will do. Jeff Squyres wrote: Sure, that would be fine. Can you write it up in a little more FAQ-ish style? I can add it to the web page. See this wiki item: https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries On Mar 12, 2008, at 5:33 AM, Pavel Shamis (Pasha) wrote: Run

Re: [OMPI users] Set GID

2008-03-12 Thread Pavel Shamis (Pasha)
t I cannot seem to find anywhere that will tell me how to change the GID to something else. Thanks, Jon

Re: [MTT users] mtt reports arrive without subject.

2008-01-14 Thread Pavel Shamis (Pasha)
I found the problem: it was a typo in the name of a variable. I had something like: email_subject: MPI regression $broken_name After fixing the name, I started to get reports! Thanks. Pasha Pavel Shamis (Pasha) wrote: I might've misread your last email. Did the new email_subject INI

Re: [MTT users] hostlist enhancement

2008-01-10 Thread Pavel Shamis (Pasha)

[MTT users] mtt reports arrive without subject.

2008-01-10 Thread Pavel Shamis (Pasha)
2008 Thanks.

Re: [OMPI users] job running question

2006-04-10 Thread Pavel Shamis (Pasha)
:) Regards, Pavel Shamis (Pasha) Adams Samuel D Contr AFRL/HEDR wrote: I set bash to have unlimited size core files like this: $ ulimit -c unlimited But, it was not dropping core files for some reason when I was running with mpirun. Just to make sure it would do what I expected, I wrot