If you use the rsh tree spawn mechanism, then yes, any node must be able
to SSH to any other node without a password.
This is only used to spawn one orted per node.
When the number of nodes is large, a tree spawn is faster and avoids
having all the SSH connections issued and maintained from the
node running mpirun.
Thanks for clarifying that, Gilles.
Now I have seen that omitting "-mca plm_rsh_no_tree_spawn 1" requires
establishing passwordless SSH among the machines, but this is not required
for setting "--mca coll_tuned_bcast_algo". Is this correct or am I missing
something?
Also, among all possible broadca
Konstantinos,
I am afraid there is some confusion here.
The plm_rsh_no_tree_spawn parameter is only used at startup time (i.e. when
remote-launching one orted daemon on every node except the one running mpirun).
It has zero impact on the performance of MPI communications such as
MPI_Bcast().
The coll/t
I have implemented some algorithms in C++ which are greatly affected by the
shuffling time among nodes, which is done by some broadcast calls. Up to
now, I have been testing them by running something like
mpirun -mca btl ^openib -mca plm_rsh_no_tree_spawn 1 ./my_test
which I think makes MPI_Bcast wo
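For reference, the broadcast algorithm itself is governed by the coll/tuned
MCA parameters quoted further down in this thread; a minimal sketch, reusing
the same hypothetical ./my_test binary, would be:
mpirun -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 2 -mca coll_base_verbose 1 ./my_test
(the numeric value selects one of the built-in algorithms; 2 is just an example).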
Dave,
You are absolutely right, the parameters are now 6-7 years old, gathered on
interconnects long gone. Moreover, several discussions in this mailing list
indicated that they do not match current network capabilities.
I have recently reshuffled the tuned module to move all the algorithms in
George Bosilca writes:
> Matthieu,
>
> If you are talking about how Open MPI selects between different broadcast
> algorithms you might want to read [1]. We have implemented a dozen
> different broadcast algorithms and have run a set of tests to measure their
> performance.
I'd been meaning to
Matthieu,
If you are talking about how Open MPI selects between different broadcast
algorithms you might want to read [1]. We have implemented a dozen
different broadcast algorithms and have run a set of tests to measure their
performance. We then used a quad tree classification algorithm to minim
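As an aside, one way to see which tuned broadcast parameters and algorithms a
given installation exposes is ompi_info; the exact flags vary between Open MPI
versions, but something along these lines is a reasonable sketch:
ompi_info --param coll tuned --level 9 | grep bcast
(on older releases the --level option does not exist and can be dropped).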
On Apr 15, 2016, at 9:18 AM, Dorier, Matthieu wrote:
>
> I'd like to know how OpenMPI implements MPI_Bcast. And if different
> implementations are provided, how one is selected.
This is a fairly complicated topic. This old paper is the foundation for how
Open MPI works (it's a bit different t
Hi,
I'd like to know how OpenMPI implements MPI_Bcast. And if different
implementations are provided, how one is selected.
Thanks,
Matthieu Dorier
Hello there,
In my fortran code, I used mpi_bcast to broadcast an array Q(21, 51,
14) (the size for it is 150,000,000) from the root to all the nodes. I
found that when I used this bcast subroutine, the code would be very slow and
sometimes it hangs there. Once I commented out this array, the code speed
1.4.3 is fairly ancient.
Can you upgrade to 1.6.5?
On Jul 26, 2013, at 3:15 AM, Dusan Zoric wrote:
>
> I am running application that performs some transformations of large matrices
> on 7-node cluster. Nodes are connected via QDR 40 Gbit Infiniband. Open MPI
> 1.4.3 is installed on the syste
I am running an application that performs some transformations of large
matrices on a 7-node cluster. Nodes are connected via QDR 40 Gbit/s InfiniBand.
Open MPI 1.4.3 is installed on the system.
The given matrix transformation requires a large data exchange between nodes in
such a way that at each algorithm st
A few points to add to this discussion...
1. In the new (proposed) MPI-3 Fortran bindings (i.e., the "use mpi_f08"
module), array subsections will be handled properly by MPI. However, we'll
have to wait for the Fortran compilers to support F08 features first (i.e.,
both the MPI Forum and the F
Thanks all for your converging points of view about my problem.
Portability is also an important point for this code, so there is only one
solution: using a user-defined data type.
In my mind, this was more for C or C++ code without the Fortran subarray
behavior, but I was in error.
The problem is
When it comes to intrinsic Fortran-90 functions, or to libraries provided by
the compiler vendor
[e.g. MKL in the case of Intel], I do agree that they *should* be able to parse
the array-section
notation and use the correct memory layout.
However, for libraries that are not part of Fortran-90, s
Actually, sub array passing is part of the F90 standard (at least
according to every document I can find), and not an Intel extension. So
if it doesn't work you should complain to the compiler company. One of
the reasons for using it is that the compiler should be optimized for
whatever method
Hi Patrick
From my mere MPI and Fortran-90 user point of view,
I think that the solution offered by the MPI standard [at least up to MPI-2]
to address the problem of non-contiguous memory layouts is to use MPI
user-defined types,
as I pointed out in my previous email.
I like this solution becaus
Thanks all for your answers. Yes, I understand well that it is a non-contiguous
memory access problem, as MPI_BCAST expects a pointer to a valid
memory zone. But I'm surprised that with the MPI module usage Fortran does not
hide this discontinuity in a contiguous temporary copy of the
The interface to MPI_Bcast does not specify an assumed-shape-array dummy
first argument. Consequently, as David points out, the compiler makes a
contiguous temporary copy of the array section to pass to the routine. If
using ifort, try the "-check arg_temp_created" compiler option to verify
creation
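For instance (hypothetical file name), the check is enabled at compile time
with something like:
ifort -check arg_temp_created -o test_bcast test_bcast.f90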
What FORTRAN compiler are you using? This should not really be an issue
with the MPI implementation, but with the FORTRAN. This is legitimate
usage in FORTRAN 90 and the compiler should deal with it. I do similar
things using ifort and it creates temporary arrays when necessary and it
all works
Hi Patrick
I think tab(i,:) is not contiguous in memory, but has a stride of nbcpus.
Since the MPI type you are passing is just the barebones MPI_INTEGER,
MPI_BCAST expects the four integers to be contiguous in memory, I guess.
The MPI calls don't have any idea of the Fortran90 memory layout,
and
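As a concrete sketch of the user-defined-type route suggested earlier in this
thread (sizes and names are invented for illustration), the strided row
tab(i,:) can be described with MPI_TYPE_VECTOR and broadcast without any
temporary copy:

PROGRAM bcast_strided_row
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: nbcpus = 4, ncols = 4
  INTEGER :: tab(nbcpus, ncols)
  INTEGER :: rowtype, ierr, rank, i

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  tab = rank
  i = 2                                ! the row to broadcast
  ! ncols blocks of 1 integer, separated by a stride of nbcpus integers
  CALL MPI_TYPE_VECTOR(ncols, 1, nbcpus, MPI_INTEGER, rowtype, ierr)
  CALL MPI_TYPE_COMMIT(rowtype, ierr)
  ! Pass the first element of the row; the datatype carries the stride
  CALL MPI_BCAST(tab(i,1), 1, rowtype, 0, MPI_COMM_WORLD, ierr)
  CALL MPI_TYPE_FREE(rowtype, ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM bcast_strided_row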
I've got a strange problem with Fortran90 and an MPI_BCAST call in a large
application. I've isolated the problem in this short program sample.
With Fortran we can use subarrays in function calls. Example, passing a
subarray to the "change" procedure:
MODULE mymod
IMPLICIT NONE
CONTAINS
S
David Mathog wrote:
For the receive I do not see how to use a collective. Each worker sends
back a data structure, and the structures are of varying size. This
is almost always the case in Bioinformatics, where what is usually
coming back from each worker is a count M of the number of signi
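For the variable-sized receive side, one standard sketch (not necessarily what
was recommended later in the thread; all sizes and names are invented) is to
gather the per-worker counts first and then collect the data with MPI_GATHERV:

PROGRAM gather_varsize
  USE mpi
  IMPLICIT NONE
  INTEGER :: rank, nprocs, ierr, i, mycount
  INTEGER, ALLOCATABLE :: counts(:), displs(:)
  DOUBLE PRECISION, ALLOCATABLE :: mydata(:), alldata(:)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  mycount = rank + 1                   ! each worker produces a different amount
  ALLOCATE(mydata(mycount))
  mydata = DBLE(rank)
  ALLOCATE(counts(nprocs), displs(nprocs))

  ! Step 1: the root learns how much each worker will send
  CALL MPI_GATHER(mycount, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, &
                  0, MPI_COMM_WORLD, ierr)

  IF (rank == 0) THEN
     displs(1) = 0
     DO i = 2, nprocs
        displs(i) = displs(i-1) + counts(i-1)
     END DO
     ALLOCATE(alldata(SUM(counts)))
  ELSE
     ALLOCATE(alldata(1))              ! dummy; only significant at the root
  END IF

  ! Step 2: one collective moves all the variable-sized pieces
  CALL MPI_GATHERV(mydata, mycount, MPI_DOUBLE_PRECISION, &
                   alldata, counts, displs, MPI_DOUBLE_PRECISION, &
                   0, MPI_COMM_WORLD, ierr)

  CALL MPI_FINALIZE(ierr)
END PROGRAM gather_varsize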
So the 2/2 consensus is to use the collective. That is straightforward
for the send part of this, since all workers are sent the same data.
For the receive I do not see how to use a collective. Each worker sends
back a data structure, and the structures are of varying size. This
is almost al
Unless your cluster has some weird connection topology and you're trying to
take advantage of that, collective is the best bet.
On Mon, Dec 13, 2010 at 4:26 PM, Eugene Loh wrote:
> David Mathog wrote:
>
> Is there a rule of thumb for when it is best to contact N workers with
>> MPI_Bcast vs. wh
David Mathog wrote:
Is there a rule of thumb for when it is best to contact N workers with
MPI_Bcast vs. when it is best to use a loop which cycles N times and
moves the same information with MPI_Send to one worker at a time?
The rule of thumb is to use a collective whenever you can. The
ra
Is there a rule of thumb for when it is best to contact N workers with
MPI_Bcast vs. when it is best to use a loop which cycles N times and
moves the same information with MPI_Send to one worker at a time?
For that matter, other than the coding semantics, is there any real
difference between the t
MPI send and recv are blocking, while you can exit bcast even if other
processes haven't received the bcast yet. A general rule of thumb is that
MPI calls are optimized and almost always perform better than if you
were to manage the communication yourself.
On 9/1/10, ananda.mu...@wipro.com wrote:
> Hi
Hi
If I replace MPI_Bcast() with paired MPI_Send() and MPI_Recv() calls,
what kind of impact does it have on the performance of the program? Are
there any benchmarks of MPI_Bcast() vs paired MPI_Send() and
MPI_Recv()?
Thanks
Ananda
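For illustration only (a hypothetical program), these are the two patterns
being compared; the collective lets the MPI library choose a tree or pipeline
schedule, while the loop forces the root to send the buffer N-1 times:

PROGRAM bcast_vs_sendrecv
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 1024
  DOUBLE PRECISION :: buf(n)
  INTEGER :: rank, nprocs, i, ierr
  INTEGER :: status(MPI_STATUS_SIZE)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  IF (rank == 0) buf = 1.0d0

  ! Pattern 1: the collective
  CALL MPI_BCAST(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  ! Pattern 2: the hand-rolled equivalent
  IF (rank == 0) THEN
     DO i = 1, nprocs - 1
        CALL MPI_SEND(buf, n, MPI_DOUBLE_PRECISION, i, 0, MPI_COMM_WORLD, ierr)
     END DO
  ELSE
     CALL MPI_RECV(buf, n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, &
                   status, ierr)
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM bcast_vs_sendrecv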
Jeff Squyres wrote:
From: Jeff Squyres
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
Received: Friday, 13 August, 2010, 3:03 AM
Dick / all --
I just had a phone call with Ralph Castain who has had some additional off-list
mails with Randolph. Apparently, none of us u
Interesting point.
--- On Thu, 12/8/10, Ashley Pittman wrote:
From: Ashley Pittman
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
Received: Thursday, 12 August, 2010, 12:22 AM
On 11 Aug 2010, at 05:10, Randolph Pullen wrote:
> Sure, but broadcasts are faster -
question is why.
--- On Wed, 11/8/10, Richard Treumann wrote:
From: Richard Treumann
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
Received: Wednesday, 11 August, 2010, 11:34 PM
Randolf
I am confused about using multiple,
concurrent mpirun operations. If there are
On 11 Aug 2010, at 05:10, Randolph Pullen wrote:
> Sure, but broadcasts are faster - less reliable apparently, but much faster
> for large clusters.
Going off-topic here but I think it's worth saying:
If you have a dataset that requires collective communication then use the
function call that
On Aug 11, 2010, at 12:10 AM, Randolph Pullen wrote:
> Sure, but broadcasts are faster - less reliable apparently, but much faster
> for large clusters.
Just to be totally clear: MPI_BCAST is defined to be "reliable", in the sense
that it will complete or invoke an error (vs. unreliable data
On Aug 11, 2010, at 9:54 AM, Jeff Squyres wrote:
> (I'll say that OMPI's ALLGATHER algorithm is probably not well optimized for
> massive data transfers like you describe)
Wrong wrong wrong -- I should have checked the code before sending. I made the
incorrect assumption that OMPI still only h
On Aug 10, 2010, at 10:09 PM, Randolph Pullen wrote:
> Jeff thanks for the clarification,
> What I am trying to do is run N concurrent copies of a 1 to N data movement
> program to affect an N to N solution. The actual mechanism I am using is to
> spawn N copies of mpirun from PVM across the cl
Randolf
I am confused about using multiple, concurrent mpirun operations. If
there are M uses of mpirun and each starts N tasks (carried out under pvm
or any other way) I would expect you to have M completely independent MPI
jobs with N tasks (processes) each. You could have some root in eac
On Tue, 2010-08-10 at 19:09 -0700, Randolph Pullen wrote:
> Jeff thanks for the clarification,
> What I am trying to do is run N concurrent copies of a 1 to N data
> movement program to affect an N to N solution.
I'm no MPI guru, nor do I completely understand what you are doing, but
isn't this an
From: Jeff Squyres
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
Received: Wednesday, 11 August, 2010, 6:24 AM
+1 on Eugene's comment that I don't fully understand what you are trying to do.
Can you send a short example code?
Some random points:
- Edgar alre
wrote:
> The install was completely vanilla - no extras, a plain ./configure command line
> (on FC10 x86_64 linux)
>
> Are you saying that all broadcast calls are actually implemented as serial
> point to point calls?
>
>
> --- On Tue, 10/8/10, Ralph Castain wrote:
>
st is implemented with multicast calls but does it use any
actual broadcast calls at all?
I
know I'm scraping the edges here looking for something but I just can't get my
head around why it should fail where it has.
--- On Mon, 9/8/10, Ralph Castain wrote:
From: Ralph Castain
Su
presume that bcast is implemented with multicast calls but does it use any
> actual broadcast calls at all?
> I know I'm scraping the edges here looking for something but I just can't get
> my head around why it should fail where it has.
>
> --- On Mon, 9/8/10, Ralph Castain
why it should fail where it has.
--- On Mon, 9/8/10, Ralph Castain wrote:
From: Ralph Castain
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
Received: Monday, 9 August, 2010, 1:32 PM
Hi Randolph
Unless your code is doing a connect/accept between the copies, there
ram waits on broadcast reception forever when two or
> more copies are run at [exactly] the same time.
>
> Has anyone else seen similar behavior in concurrently running programs that
> perform lots of broadcasts perhaps?
>
> Randolph
>
>
> --- On Sun, 8/8/10, David
copies are run at [exactly] the same time.
Has anyone else seen similar behavior in concurrently running programs that
perform lots of broadcasts perhaps?
Randolph
--- On Sun, 8/8/10, David Zhang wrote:
From: David Zhang
Subject: Re: [OMPI users] MPI_Bcast issue
To: "Open MPI Users"
In particular, intercommunicators
On 8/7/10, Aurélien Bouteiller wrote:
> You should consider reading about communicators in MPI.
>
> Aurelien
> --
> Aurelien Bouteiller, Ph.D.
> Innovative Computing Laboratory, The University of Tennessee.
>
> Envoyé de mon iPad
>
> Le Aug 7, 2010 à 1:05, Randol
You should consider reading about communicators in MPI.
Aurelien
--
Aurelien Bouteiller, Ph.D.
Innovative Computing Laboratory, The University of Tennessee.
Sent from my iPad
On Aug 7, 2010, at 1:05, Randolph Pullen wrote:
> I seem to be having a problem with MPI_Bcast.
> My massive I/O int
I seem to be having a problem with MPI_Bcast.
My massive I/O intensive data movement program must broadcast from n to n
nodes. My problem starts because I require 2 processes per node, a sender and a
receiver and I have implemented these using MPI processes rather than tackle
the complexities of
I have just created a small cluster consisting of three nodes:
bellhuey AMD 64 with 4 cores
wolf1 AMD 64 with 2 cores
wolf2 AMD 64 with 2 cores
The host file is:
bellhuey slots=4
wolf1 slots=2
wolf2 slots=2
bellhuey is the master and wolf1 and wolf2 share the /usr and /home file
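With a host file like that, a typical launch (hypothetical file and program
names) would look something like:
mpirun --hostfile myhosts -np 8 ./my_program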
Right. So, baseline performance seems reasonable, but there is an odd
spike that seems difficult to explain. This is annoying, but again:
how important is it to resolve that mystery? You can spend a few days
trying to
The second cluster has almost the same features as the previous one
From: Eugene Loh
To: Open MPI Users
Sent: Friday, April 24, 2009 1:26:14 AM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI
So, the remaining mystery is the 6x or so spike at 128 Mbyte. Dunno.
How important is it to resolve that mystery?
, centOS 4.6,
Second cluster: 2.8 GHz Intel Xeon, 3 GB memory, Fedora Core 5.
Open MPI 1.3 is used in both clusters.
From: Eugene Loh
To: Open MPI Users
Sent: Friday, April 24, 2009 1:26:14 AM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI
Okay. So, going back to
Sorry, I had a mistake in calculation.
Not 131072 (double) but 131072 KB.
It means around 128 MB.
From: Jeff Squyres
To: Open MPI Users
Sent: Thursday, April 23, 2009 8:23:52 PM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI
Very strange; 6 seconds for a
Very strange; 6 seconds for a 1MB broadcast over 64 processes is *way*
too long. Even 2.5 sec at 2MB seems too long -- what is your network
speed? I'm not entirely sure what you mean by "4 link" on your graph.
Without more information, I would first check your hardware setup to
see if the
Hi,
One more question:
I have executed MPI_Bcast() with 64 processes on a 16-node Ethernet cluster
with multiple links.
The result is shown in the file attached to this e-mail.
What is going on at the 131072-double message size?
I have executed it many times but the result is still the same.
Thank you!
Sorry I should have given the version number. I'm running
openmpi-1.2.4 on Fedora Core 6
Dave
Adrian Knoth wrote:
On Thu, Jul 31, 2008 at 03:26:09PM +0100, David Robson wrote:
It also works if I disable the private interface. Otherwise there
are no network problems. I can ping any host
On Thu, Jul 31, 2008 at 03:26:09PM +0100, David Robson wrote:
> It also works if I disable the private interface. Otherwise there
> are no network problems. I can ping any host from any other.
> openmpi programs without MPI_BCast work OK.
Weird.
> Has any seen anything like this, or have any i
Dear OpenMPI users
I have a problem with openmpi codes hanging in MPI_BCast ...
All our nodes are connected to one LAN. However, half of them
also have an interface to a second private LAN. If the first
openMPI process of a job starts on one of the dual-homed nodes, and
a second process fr
Dear Jeff
I want to send an integer vector of size 4000. It is a
very confusing problem.
--- Jeff Squyres wrote:
> If you're seeing the same error from 2 entirely
> different MPI
> implementations, it is possible that it is an error
> in your code.
>
> Ensure that all processes are calling MP
If you're seeing the same error from 2 entirely different MPI
implementations, it is possible that it is an error in your code.
Ensure that all processes are calling MPI_Bcast with the same
arguments (e.g., count, datatype, root, etc.), even on that 4000th
iteration.
How big are the block
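A small sketch of the point about matching arguments (sizes and names invented
for illustration): every rank must derive the same count for every call,
including the short last block; if the counts differ across ranks, some of
them can end up waiting in MPI_Bcast:

PROGRAM block_bcast
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4000, blk = 512
  INTEGER :: vec(n)
  INTEGER :: start, nb, ierr, rank

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  IF (rank == 0) vec = 1
  ! Every rank computes the same block boundaries, so the count passed to
  ! MPI_BCAST matches on all processes, including the short last block
  DO start = 1, n, blk
     nb = MIN(blk, n - start + 1)
     CALL MPI_BCAST(vec(start), nb, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  END DO
  CALL MPI_FINALIZE(ierr)
END PROGRAM block_bcast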
Dear Friends,
I am writing a matrix multiplication program with MPI. MPI_Bcast does
not broadcast to all processes in the last iteration for block sizes
greater than a specific size. I tested it with both MPICH and Open MPI. I
have 12 processes, of which 7 reach MPI_Bcast, but when the
master (ran
On Jun 29, 2006, at 11:16 PM, Graham E Fagg wrote:
On Thu, 29 Jun 2006, Doug Gregor wrote:
When I use algorithm 6, I get:
[odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast
[odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast
Broadcasting integers from root 0...[od
On Thu, 29 Jun 2006, Doug Gregor wrote:
When I use algorithm 6, I get:
[odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast
[odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast
Broadcasting integers from root 0...[odin004.cs.indiana.edu:11752]
*** An error occurred in
On Thu, 29 Jun 2006, Doug Gregor wrote:
Are there other settings I can tweak to try to find the algorithm
that it's deciding to use at run-time?
Yes, just: -mca coll_base_verbose 1
will show what's being decided at run time, i.e.
[reliant:25351] ompi_coll_tuned_bcast_intra_dec_fixed
[reliant:25
Hi Doug
Wow, looks like some messages are getting lost (or even delivered to the
wrong peer on the same node...). Could you also try with:
-mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca
coll_tuned_bcast_algorithm <1,2,3,4,5,6>
The values 1-6 control which topology/algorithm
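For example, forcing a single value (6 here is arbitrary, and the mapping of
numbers to algorithms depends on the Open MPI version) with a hypothetical
test binary:
mpirun -np 4 -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 6 ./bcast_test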
I am running into a problem with a simple program (which performs
several MPI_Bcast operations) hanging. Most processes hang in
MPI_Finalize, the others hang in MPI_Bcast. Interestingly enough,
this only happens when I oversubscribe the nodes. For instance, using
IU's Odin cluster, I take 4