Re: [OMPI users] Broadcast problem

2013-04-30 Thread Randolph Pullen
Oops, I think I meant gather, not scatter...

[OMPI users] Broadcast problem

2013-04-30 Thread Randolph Pullen
I have a number of processes split into senders and receivers. Senders read large quantities of randomly organised data into buffers for transmission to the receivers. When a buffer is full it needs to be transmitted to all receivers; this repeats until all the data is transmitted. The problem is that
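The post is truncated here; as a rough sketch of the pattern it describes - repeatedly filling a buffer and broadcasting it to every receiver until the data runs out - the following assumes a single sender at rank 0, a fixed buffer size, and a zero-length broadcast as the end-of-data marker (none of these details come from the original message):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define BUF_SIZE 65536

    /* Stand-in for the poster's data-reading step: produces three dummy
     * buffers and then reports that the data has run out. */
    static int fill_buffer(char *buf, int max)
    {
        static int calls = 0;
        if (calls++ >= 3)
            return 0;                    /* no more data */
        memset(buf, 'x', (size_t)max);
        return max;
    }

    int main(int argc, char **argv)
    {
        char buf[BUF_SIZE];
        int  count = 0, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        do {
            if (rank == 0)               /* the sender fills the buffer      */
                count = fill_buffer(buf, BUF_SIZE);
            /* everyone learns the payload size, then receives the payload   */
            MPI_Bcast(&count, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (count > 0)
                MPI_Bcast(buf, count, MPI_BYTE, 0, MPI_COMM_WORLD);
        } while (count > 0);             /* a count of 0 signals completion  */

        MPI_Finalize();
        return 0;
    }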

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-11 Thread Randolph Pullen
lit...@dev.mellanox.co.il> To: Randolph Pullen <randolph_pul...@yahoo.com.au> Cc: OpenMPI Users <us...@open-mpi.org> Sent: Monday, 10 September 2012 9:11 PM Subject: Re: [OMPI users] Infiniband performance Problem and stalling Randolph, So what you're saying in short, leaving all the numbers aside

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen
One system is actually an i5-2400 - maybe it's throttling back on 2 cores to save power? The other (i7) shows consistent CPU MHz on all cores From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il> To: Randolph Pullen <randolph_pul...@yahoo.com.au>; O

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen
   : 1 cpu MHz : 3301.000 processor   : 2 cpu MHz : 1600.000 processor   : 3 cpu MHz : 1600.000 Which seems oddly weird to me... From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il> To: Randolph Pullen <randolph_pul...@yah

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-02 Thread Randolph Pullen
No RoCE, just native IB with TCP over the top. No, I haven't used 1.6; I was trying to stick with the standards on the Mellanox disk. Is there a known problem with 1.4.3? From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il> To: Randolph Pullen <ran

[OMPI users] Infiniband performance Problem and stalling

2012-08-31 Thread Randolph Pullen
(reposted with consolidated information) I have a test rig comprising 2 i7 systems, 8 GB RAM, with Mellanox III HCA 10G cards, running CentOS 5.7, kernel 2.6.18-274, Open MPI 1.4.3, MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2), on a Cisco 24-port switch. Normal performance is: $ mpirun --mca btl

Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-30 Thread Randolph Pullen
. - On occasions it seems to stall indefinitely, waiting on a single receive.  Any ideas appreciated. Thanks in advance, Randolph From: Randolph Pullen <randolph_pul...@yahoo.com.au> To: Paul Kapinos <kapi...@rz.rwth-aachen.de>; Open MPI Users <us...@open

Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-29 Thread Randolph Pullen
to 64K and force short messages.  Then the openib times are the same as TCP and no faster. I'm still at a loss as to why... From: Paul Kapinos <kapi...@rz.rwth-aachen.de> To: Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users <us...@open

[OMPI users] Infiniband performance Problem and stalling

2012-08-27 Thread Randolph Pullen
I have a test rig comprising 2 i7 systems with Mellanox III HCA 10G cards, running CentOS 5.7, kernel 2.6.18-274, Open MPI 1.4.3, MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2), on a Cisco 24-port switch. Normal performance is: $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts  PingPong
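Later messages in this thread compare the openib numbers against TCP. For reference, the equivalent run with the TCP BTL selected would look something like the following; the benchmark binary is not shown in the truncated preview above, so a placeholder name is used here:

    $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts ./pingpong_benchmark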

Re: [OMPI users] Fw: system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
started manually? Doh! From: Randolph Pullen <randolph_pul...@yahoo.com.au> To: Ralph Castain <r...@open-mpi.org>; Open MPI Users <us...@open-mpi.org> Sent: Friday, 20 January 2012 2:17 PM Subject: Re: [OMPI users] Fw: system() call corr

Re: [OMPI users] Fw: system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
the Perl both methods are identical. BTW, the Perl is the server; the Open MPI program is the client. From: Ralph Castain <r...@open-mpi.org> To: Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users <us...@open-mpi.org> Sent: Friday, 20 Ja

[OMPI users] Fw: system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
FYI - Forwarded Message - From: Randolph Pullen <randolph_pul...@yahoo.com.au> To: Jeff Squyres <jsquy...@cisco.com> Sent: Friday, 20 January 2012 12:45 PM Subject: Re: [OMPI users] system() call corrupts MPI processes I'm using TCP on 1.4.1 (it's actually IPoIB) OpenIB

Re: [OMPI users] system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
can't think of a reason >> immediately as to why this wouldn't work, but there may be a subtle >> interaction in there somewhere that causes badness (e.g., memory corruption). >> >> >> On Jan 19, 2012, at 1:57 AM, Randolph Pullen wrote: >> >>> >>

Re: [OMPI users] system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
le > interaction in there somewhere that causes badness (e.g., memory corruption). > > > On Jan 19, 2012, at 1:57 AM, Randolph Pullen wrote: > >> >> I have a section in my code running in rank 0 that must start a perl program >> that it then connects to via a tcp socke

[OMPI users] system() call corrupts MPI processes

2012-01-19 Thread Randolph Pullen
I have a section in my code running in rank 0 that must start a Perl program that it then connects to via a TCP socket. The initialisation section is shown here:

    sprintf(buf, "%s/session_server.pl -p %d &", PATH, port);
    int i = system(buf);
    printf("system returned %d\n", i);

Some
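A self-contained version of that fragment, for anyone wanting to reproduce the system() behaviour discussed in this thread; PATH and port are given made-up values here, since the preview does not show how they are defined:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical values: the original post does not show these definitions. */
    #define PATH "/opt/myapp/bin"
    static int port = 5000;

    int main(void)
    {
        char buf[512];

        /* Launch the Perl session server in the background, as in the post. */
        snprintf(buf, sizeof buf, "%s/session_server.pl -p %d &", PATH, port);
        int i = system(buf);
        printf("system returned %d\n", i);

        /* ... the real code then connects to the server over a TCP socket ... */
        return 0;
    }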

Re: [OMPI users] High CPU usage with yield_when_idle =1 on CFS

2011-09-03 Thread Randolph Pullen
ux kernels (there was a big debate about this at kernel.org). IIRC, there's some kernel parameter that you can tweak to make it behave better, but I'm afraid I don't remember what it is. Some googling might find it...? On Sep 1, 2011, at 10:06 PM, Eugene Loh wrote: > On 8/31/2011 11:48 PM,

[OMPI users] High CPU usage with yield_when_idle =1 on CFS

2011-09-01 Thread Randolph Pullen
I recall a discussion some time ago about yield, the Completely F%’d Scheduler (CFS) and Open MPI. My system is currently suffering from massive CPU use while busy waiting. This gets worse as I try to bump up user concurrency. I am running with yield_when_idle but it's not enough. Is there
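For reference, the parameter named here is normally set on the mpirun command line as an MCA parameter; a typical invocation might look like the following (the process count and application name are made up for illustration):

    $ mpirun --mca mpi_yield_when_idle 1 -np 8 ./my_app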

Re: [OMPI users] Mpirun only works when n< 3

2011-07-13 Thread Randolph Pullen
/11, Jeff Squyres <jsquy...@cisco.com> wrote: From: Jeff Squyres <jsquy...@cisco.com> Subject: Re: [OMPI users] Mpirun only works when n< 3 To: randolph_pul...@yahoo.com.au Cc: "Open MPI Users" <us...@open-mpi.org> Received: Tuesday, 12 July, 2011, 10:29 PM On Jul

Re: [OMPI users] Mpirun only works when n< 3

2011-07-11 Thread Randolph Pullen
-- Running this on either node A or C produces the same result. Node C runs Open MPI 1.4.1 and is an ordinary dual core on FC10, not an i5-2400 like the others. All the binaries are compiled on FC10 with gcc 4.3.2. --- On Tue, 12/7/11, Randolph Pullen <randolph_pul...@yahoo.com.au> wrote:

Re: [OMPI users] Mpirun only works when n< 3

2011-07-11 Thread Randolph Pullen
> Subject: Re: [OMPI users] Mpirun only works when n< 3 To: randolph_pul...@yahoo.com.au, "Open MPI Users" <us...@open-mpi.org> Received: Tuesday, 12 July, 2011, 12:21 AM Have you disabled firewalls between your compute nodes? On Jul 11, 2011, at 9:34 AM, Randolph Pullen wrote

[OMPI users] Mpirun only works when n< 3

2011-07-11 Thread Randolph Pullen
This appears to be similar to the problem described in: https://svn.open-mpi.org/trac/ompi/ticket/2043 However, those fixes do not work for me. I am running on an i5 Sandy Bridge under Ubuntu 10.10 with 8 GB RAM, kernel 2.6.32.14 with OpenVZ tweaks, and Open MPI 1.4.1. I am

Re: [OMPI users] is there an equiv of iprove for bcast?

2011-05-10 Thread Randolph Pullen
From: Jeff Squyres <jsquy...@cisco.com> Subject: Re: [OMPI users] is there an equiv of iprove for bcast? To: randolph_pul...@yahoo.com.au Cc: "Open MPI Users" <us...@open-mpi.org> Received: Monday, 9 May, 2011, 11:27 PM On May 3, 2011, at 8:20 PM, Randolph Pullen wrote: &

Re: [OMPI users] is there an equiv of iprove for bcast?

2011-05-04 Thread Randolph Pullen
the receivers ahead of time.  If the receivers don't know who the root will be beforehand, that's unfortunately not a good match for the MPI_Bcast operation. On May 3, 2011, at 4:07 AM, Randolph Pullen wrote: > > From: Randolph Pullen <randolph_pul...@yahoo.com.au> > Subject: Re: R

Re: [OMPI users] is there an equiv of iprove for bcast?

2011-05-03 Thread Randolph Pullen
From: Randolph Pullen <randolph_pul...@yahoo.com.au> Subject: Re: Re: [OMPI users] is there an equiv of iprove for bcast? To: us...@open-mpi.or Received: Monday, 2 May, 2011, 12:53 PM Non-blocking Bcasts or tests would do it. I currently have the clearing-house solution w

[OMPI users] is there an equiv of iprove for bcast?

2011-04-29 Thread Randolph Pullen
I am having a design issue: my server application has 2 processes per node, 1 listener and 1 worker. Each listener monitors a specified port for incoming TCP connections, with the goal that on receipt of a request it will distribute it over the workers in a SIMD fashion. My

[OMPI users] MPI_Comm_create prevents external socket connections

2011-04-28 Thread Randolph Pullen
I have a problem with MPI_Comm_create. My server application has 2 processes per node, 1 listener and 1 worker. Each listener monitors a specified port for incoming TCP connections, with the goal that on receipt of a request it will distribute it over the workers in a SIMD
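A rough sketch of one way to set up the listener/worker split described above, using MPI_Comm_split rather than the poster's MPI_Comm_create call (which is not shown in the preview); the even-rank/odd-rank layout is an assumption made for illustration:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, role, role_rank;
        MPI_Comm role_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Assumed layout: with 2 processes per node, even ranks listen for
         * TCP requests and odd ranks do the SIMD work. */
        role = world_rank % 2;                    /* 0 = listener, 1 = worker */
        MPI_Comm_split(MPI_COMM_WORLD, role, world_rank, &role_comm);

        MPI_Comm_rank(role_comm, &role_rank);
        printf("world rank %d is %s rank %d\n",
               world_rank, role ? "worker" : "listener", role_rank);

        MPI_Comm_free(&role_comm);
        MPI_Finalize();
        return 0;
    }

MPI_Comm_split is collective over MPI_COMM_WORLD, so every process must call it; whether that interacts with an already-open listening socket in the same way as MPI_Comm_create is exactly the question this thread raises.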

Re: [OMPI users] Running on crashing nodes

2010-09-27 Thread Randolph Pullen
I have successfully used a Perl program to start mpirun and record its PID. The monitor can then watch the output from MPI and terminate the mpirun command with a series of kills or something if it is having trouble. One method of doing this is to prefix all legal output from your MPI
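A minimal C sketch of the same monitoring idea (the original uses Perl; the application name, the "OK:" output prefix, and the termination policy here are only illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) != 0) { perror("pipe"); return 1; }

        pid_t pid = fork();                        /* record the child PID      */
        if (pid == 0) {                            /* child: run mpirun         */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]); close(fd[1]);
            execlp("mpirun", "mpirun", "-np", "4", "./my_app", (char *)NULL);
            _exit(127);                            /* exec failed               */
        }
        close(fd[1]);

        FILE *out = fdopen(fd[0], "r");
        char line[1024];
        while (out && fgets(line, sizeof line, out)) {
            if (strncmp(line, "OK:", 3) != 0) {    /* unprefixed output = trouble */
                kill(pid, SIGTERM);                /* terminate the mpirun we own */
                break;
            }
            fputs(line, stdout);                   /* pass legal output through   */
        }
        waitpid(pid, NULL, 0);
        return 0;
    }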

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-24 Thread Randolph Pullen
, Rahul Nabar <rpna...@gmail.com> wrote: From: Rahul Nabar <rpna...@gmail.com> Subject: Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas? To: "Open MPI Users" <us...@open-mpi.org> Received: Wednesday, 25 August, 2010, 3:38 AM On Mon, Aug 2

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-23 Thread Randolph Pullen
; > Open MPI Users > > 08/23/2010 05:11 PM > > Sent by: > > users-boun...@open-mpi.org > > Please respond to Open MPI Users > > > On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <randolph_pul...@yahoo.com.au > > wrote: > >

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-22 Thread Randolph Pullen
It's a long shot, but could it be related to the total data volume? i.e. 524288 * 80 = 41943040 bytes active in the cluster. Can you exceed this 41943040-byte data volume with a smaller message repeated more often, or a larger one less often? --- On Fri, 20/8/10, Rahul Nabar wrote:

Re: [OMPI users] MPI_Bcast issue

2010-08-12 Thread Randolph Pullen
, NY 12601 > Tele (845) 433-7846         Fax (845) 433-8363 > > > users-boun...@open-mpi.org wrote on 08/11/2010 08:59:16 PM: > > > [image removed] > > > > Re: [OMPI users] MPI_Bcast issue > > > > Randolph Pullen > > > > to: > >

Re: [OMPI users] MPI_Bcast issue

2010-08-11 Thread Randolph Pullen
Interesting point. --- On Thu, 12/8/10, Ashley Pittman <ash...@pittman.co.uk> wrote: From: Ashley Pittman <ash...@pittman.co.uk> Subject: Re: [OMPI users] MPI_Bcast issue To: "Open MPI Users" <us...@open-mpi.org> Received: Thursday, 12 August, 2010, 12:22 AM On 11

Re: [OMPI users] MPI_Bcast issue

2010-08-11 Thread Randolph Pullen
I (a single user) am running N separate MPI applications doing 1-to-N broadcasts over PVM; each MPI application is started on each machine simultaneously by PVM - the reasons are back in the post history. The problem is that they somehow collide - yes, I know this should not happen, the

Re: [OMPI users] MPI_Bcast issue

2010-08-11 Thread Randolph Pullen
<te...@chem.gu.se> wrote: From: Terry Frankcombe <te...@chem.gu.se> Subject: Re: [OMPI users] MPI_Bcast issue To: "Open MPI Users" <us...@open-mpi.org> Received: Wednesday, 11 August, 2010, 1:57 PM On Tue, 2010-08-10 at 19:09 -0700, Randolph Pullen wrote: > Je

Re: [OMPI users] MPI_Bcast issue

2010-08-10 Thread Randolph Pullen
riority because not enough people have asked for it. I hope that helps... On Aug 9, 2010, at 10:43 PM, Randolph Pullen wrote: > The install was completly vanilla - no extras a plain .configure command line > (on FC10 x8x_64 linux) > > Are you saying that all broadcast calls are actua

Re: [OMPI users] MPI_Bcast issue

2010-08-09 Thread Randolph Pullen
rious algorithms. Best guess I can offer is that there is a race condition in your program that you are tripping when other procs that share the node change the timing. How did you configure OMPI when you built it? On Aug 8, 2010, at 11:02 PM, Randolph Pullen wrote: The only MPI calls I am usin

Re: [OMPI users] MPI_Bcast issue

2010-08-09 Thread Randolph Pullen
te with an MPI_Bcast between processes started by another mpirun. On Aug 8, 2010, at 7:13 PM, Randolph Pullen wrote: Thanks, although “An intercommunicator cannot be used for collective communication.”, i.e., bcast calls. I can see how the MPI_Group_xx calls can be used to produce a useful

Re: [OMPI users] MPI_Bcast issue

2010-08-08 Thread Randolph Pullen
t; > Aurelien > -- > Aurelien Bouteiller, Ph.D. > Innovative Computing Laboratory, The University of Tennessee. > > Sent from my iPad > > On Aug 7, 2010 at 1:05, Randolph Pullen <randolph_pul...@yahoo.com.au> > wrote: > >> I seem to be having a problem with

[OMPI users] MPI_Bcast issue

2010-08-07 Thread Randolph Pullen
I seem to be having a problem with MPI_Bcast. My massive, I/O-intensive data movement program must broadcast from n to n nodes. My problem starts because I require 2 processes per node, a sender and a receiver, and I have implemented these using MPI processes rather than tackle the complexities
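A hedged sketch of the n-to-n broadcast pattern being described, with every rank taking a turn as root inside a single MPI_COMM_WORLD (so no collectives cross separate mpirun instances); the buffer contents and sizes are invented for illustration:

    #include <mpi.h>
    #include <stdio.h>

    #define BUF_LEN 1024

    int main(int argc, char **argv)
    {
        int rank, size, root;
        int buf[BUF_LEN];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank in turn broadcasts its buffer to everyone else. */
        for (root = 0; root < size; root++) {
            if (rank == root) {
                for (int i = 0; i < BUF_LEN; i++)
                    buf[i] = rank;               /* dummy payload */
            }
            MPI_Bcast(buf, BUF_LEN, MPI_INT, root, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }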

Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun

2010-06-23 Thread Randolph Pullen
etter handle such situations, but isn't ready for release yet. I'm not sure if I'll have time to go back to the 1.4 series and resolve this behavior, but I'll put it on my list of things to look at if/when time permits. On Jun 23, 2010, at 6:53 AM, Randolph Pullen wrote: ok, Having confirmed that r

Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun

2010-06-23 Thread Randolph Pullen
ous research work is ongoing to improve fault tolerance in Open MPI, but I don't know the state of it in terms of surviving a failed process.  I *think* that this kind of stuff is not ready for prime time, but I admit that this is not an area that I pay close attention to. On Jun 23, 2010, at 3:

Re: [OMPI users] more Bugs in MPI_Abort() -- mpirun

2010-06-23 Thread Randolph Pullen
ines, and timeout if not all are present within some allotted time? On Tue, Jun 22, 2010 at 10:43 PM, Randolph Pullen <randolph_pul...@yahoo.com.au> wrote: I have an MPI program that aggregates data from multiple SQL systems. It all runs fine. To test fault tolerance I switch one of the

[OMPI users] more Bugs in MPI_Abort() -- mpirun

2010-06-23 Thread Randolph Pullen
I have an MPI program that aggregates data from multiple SQL systems. It all runs fine. To test fault tolerance I switch one of the machines off while it is running. The result is always a hang, i.e. mpirun never completes. To try and avoid this I have replaced the send and receive calls with