Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ralph Castain
Well, it would - except then -all- the procs would run real slow! :-) Still, might be a reasonable diagnostic step to try... will give it a shot. On Wed, Jun 10, 2009 at 1:12 PM, Bogdan Costescu <bogdan.coste...@iwr.uni-heidelberg.de> wrote: > On Wed, 10 Jun 2009, Ralph Castain wrote: > > I app

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Bogdan Costescu
On Wed, 10 Jun 2009, Ralph Castain wrote: > I appreciate the input and have captured it in the ticket. Since this appears to be a NUMA-related issue, the lack of NUMA support in your setup makes the test difficult to interpret. Based on this reasoning, disabling libnuma support in your OpenMPI

[OMPI devel] padb and orte

2009-06-10 Thread Ashley Pittman
All, As mentioned in another thread, I've recently ported padb, a command-line job inspection tool (kinda like a parallel debugger), to ORTE and Open MPI. Padb is an existing, stable product which has worked for a number of years on Slurm and RMS; ORTE support is new and not widely tested yet althou

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ralph Castain
Much appreciated! Per some of my other comments on this thread and on the referenced ticket, can you tell me what kernel you have on that machine? I assume you have NUMA support enabled, given that chipset? Thanks! Ralph On Wed, Jun 10, 2009 at 10:29 AM, Sylvain Jeaugey wrote: > Hum, very glad

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ralph Castain
I appreciate the input and have captured it in the ticket. Since this appears to be a NUMA-related issue, the lack of NUMA support in your setup makes the test difficult to interpret. I agree, though, that this is likely something peculiar to our particular setup. Of primary concern is that it mig

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Bogdan Costescu
On Wed, 10 Jun 2009, Ralph Castain wrote: > Meantime, I have filed a bunch of data on this in ticket #1944, so perhaps you might take a glance at that and offer some thoughts? > https://svn.open-mpi.org/trac/ompi/ticket/1944 I wasn't able to reproduce this. I have run with the following setup: -

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Sylvain Jeaugey
Hmm, very glad that padb works with Open MPI; I couldn't live without it. In my opinion, it's the best debugging tool for parallel applications and, more importantly, the only one that scales. About the issue: I couldn't reproduce it on my platform (tried 2 nodes with 2 to 8 processes each, nodes are t

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ashley Pittman
On Wed, 2009-06-10 at 09:07 -0600, Ralph Castain wrote: > Hi Ashley > > Thanks! I would definitely be interested and will look at the tool. Great. My plan was to introduce the tool to this list today or tomorrow anyway, but this problem falls right into its target area, so I brought it up early.

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ralph Castain
Hi Ashley Thanks! I would definitely be interested and will look at the tool. Meantime, I have filed a bunch of data on this in ticket #1944, so perhaps you might take a glance at that and offer some thoughts? https://svn.open-mpi.org/trac/ompi/ticket/1944 Will be back after I look at the tool.

Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-10 Thread Ashley Pittman
Ralph, if I may say so, this is exactly the type of problem the tool I have been working on recently aims to help with, and I'd be happy to help you through it. Firstly, I'd say that of the three collectives you mention (MPI_Allgather, MPI_Reduce and MPI_Bcast), one exhibits a many-to-many, one a many-to-one

[OMPI devel] Does open MPI support nodes behind NAT or Firewall

2009-06-10 Thread Anjin Pradhan
Hi everyone, I wanted to know whether Open MPI supports nodes that are behind a NAT or a firewall. If it doesn't do this by default, can anyone let me know how I should go about making Open MPI support NAT and firewalls?

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-10 Thread Sylvain Jeaugey
Hi Jeff, Thanks for jumping in. On Tue, 9 Jun 2009, Jeff Squyres wrote: 2. Note that your solution presupposes that one MPI process can detect that the entire job is deadlocked. This is not quite correct. What exactly do you want to detect -- that one process may be imbalanced on its receiv