Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Philippe
On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: > > On Jul 21, 2010, at 7:44 AM, Philippe wrote: > >> Ralph, >> >> Sorry for the late reply -- I was away on vacation. > > no problem at all! > >> >> regarding your earlier question about how many processes were >> involved when the memory wa

Re: [OMPI users] Do MPI calls ever sleep?

2010-07-22 Thread David Ronis
That did it. Thanks. David On Wed, 2010-07-21 at 15:29 -0500, Dave Goodell wrote: > On Jul 21, 2010, at 2:54 PM CDT, Jed Brown wrote: > > > On Wed, 21 Jul 2010 15:20:24 -0400, David Ronis > > wrote: > >> Hi Jed, > >> > >> Thanks for the reply and suggestion. I tried adding -mca > >> yield_w

Re: [OMPI users] Question on checkpoint overhead in Open MPI

2010-07-22 Thread Nguyen Toan
Dear Josh, Thank you very much for the reply. I am sorry if my question was unclear, so please let me organize my question again. Currently I am applying the staging technique with the mca-params.conf setting as follows: snapc_base_store_in_place=0 # enable remote file transfer to global storage c

Re: [OMPI users] How to checkpoint atomic function in OpenMPI

2010-07-22 Thread Nguyen Toan
Dear Josh, I hope to see this new API soon. Anyway, I will try these critical section functions in BLCR. Thank you for the support. Best Regards, Nguyen Toan On Sat, Jul 17, 2010 at 6:34 AM, Josh Hursey wrote: > > On Jun 14, 2010, at 5:26 AM, Nguyen Toan wrote: > > > Hi all, > > I have an MPI pr

[OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Hello, I am designing a solution to one of my programs, which mixes some tree generation, matrix operations, and eigenvalues, among other tasks. I have to parallelize all of this for a cluster of 4 nodes (32 cores), and my first thought was MPI as a blind choice, but after looking at this picture

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Gus Correa
Hi Cristobal You may want to take a look at PETSc, which has all the machinery for linear algebra that you need, can easily attach a variety of Linear Algebra packages, including those in the diagram you sent and more, builds on top of MPI, and can even build MPI for you, if you prefer. It has C

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Ralph Castain
It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process n

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Thanks, I'm looking at the manual; it seems good. I think the picture is clearer now. I have a very custom algorithm for a local research problem; it is parallelizable, and that's where OpenMPI enters. Then, at some point in the program, all the computation translates to numeric (double) matrix operations, eigenva

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Philippe
Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copying them back into a stable release? (if so, which one? 1.4? 1.5?) p. On Thu, Jul 22, 201

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread amjad ali
Hi Cristobal, Note what the picture at http://dl.dropbox.com/u/6380744/clusterLibs.png shows: it shows what ScaLAPACK is based on, that is, only the packages ScaLAPACK uses; hence OpenMP does not appear in it. Also be clear about the difference: "OpenMP" is for shared-memory parallel programming, while "OpenMPI"

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Yes, I was aware of the big difference, hehe. Now that OpenMP and OpenMPI have come up, I've always wondered if it's a good idea to model a solution in the following way, using both OpenMP and OpenMPI. Suppose you have n nodes, each node has a quad-core (so you have n*4 processors); launch n processes a
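The preview above is cut off before the scheme is fully described, so the following is only a sketch under an assumption: one process per node and one thread per core within each process. It is plain Python bookkeeping, not MPI or OpenMP code, and the names `hybrid_layout` and `split_range` are hypothetical. It shows how n nodes with 4 cores each give n*4 workers, and how a range of work items could be divided first across processes and then across the threads of each process.

```python
def hybrid_layout(n_nodes: int, cores_per_node: int = 4):
    """List (process rank, thread id) pairs, assuming one process per node
    and one thread per core (an assumption, not stated in the thread)."""
    return [(rank, thread)
            for rank in range(n_nodes)
            for thread in range(cores_per_node)]


def split_range(n_items: int, n_parts: int):
    """Split range(n_items) into n_parts contiguous half-open chunks
    whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    chunks, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


n_nodes, cores = 3, 4
layout = hybrid_layout(n_nodes, cores)

# First level: divide 100 work items among the processes (one per node).
per_rank = split_range(100, n_nodes)

# Second level: each process divides its own chunk among its threads.
# Thread chunks are offsets relative to the start of the process chunk.
per_thread = [split_range(hi - lo, cores) for lo, hi in per_rank]

print(len(layout), per_rank, per_thread[0])
```

In a real hybrid program the first level would correspond to MPI ranks and the second to OpenMP threads; whether that split pays off depends on the problem, as the replies in this thread point out.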

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Ralph Castain
Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am prett

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread amjad ali
It's possible, but not a novel idea, hehe. It's a form of HYBRID programming (distributed plus shared-memory programming). But you need to check whether it is beneficial for a given case/problem/code. On Thu, Jul 22, 2010 at 5:52 PM, Cristobal Navarro wrote: > yes, > I was aware of the big differen

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Gus Correa
Hi Cristobal Cristobal Navarro wrote: Yes, I was aware of the big difference, hehe. Now that OpenMP and OpenMPI have come up, I've always wondered if it's a good idea to model a solution in the following way, using both OpenMP and OpenMPI. Suppose you have n nodes, each node has a quad-core (so you

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Thanks, very clear. I was not aware that OpenMPI internally uses shared memory when two processes reside on the same node, which is perfect. Very complete explanations, thanks really. On Thu, Jul 22, 2010 at 7:11 PM, Gus Correa wrote: > Hi Cristobal > > Cristobal Navarro wrote: >> >> yes, >> i

[OMPI users] OpenMPI killed by signal 9

2010-07-22 Thread Jack Bryan
Dear All: I ran a parallel job on 6 nodes of an OpenMPI cluster, but I got this error: rank 0 in job 82 system.cluster_37948 caused collective abort of all ranks exit status of rank 0: killed by signal 9 It seems that there is a segmentation fault on node 0. But, if the program is run for a short

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Jeff Squyres
It's worth noting that this new component will likely get pulled into 1.5.1 (we're refreshing a bunch of stuff in 1.5.1 -- this new component will be included in that refresh). No specific timeline on 1.5.1 yet, though. On Jul 22, 2010, at 5:53 PM, Ralph Castain wrote: > Dev trunk looks okay

Re: [OMPI users] OpenMPI killed by signal 9

2010-07-22 Thread Jeff Squyres
Signal 9 more than likely means that some external entity killed your MPI job (e.g., a resource manager determined that your process took too much time / CPU / whatever and killed it). That also makes sense since you say that short jobs complete with no problem, but (presumably) longer jobs get
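To illustrate the point about an external entity, here is a minimal Python sketch, assuming a POSIX system. It has nothing to do with MPI itself: the parent process plays the role of the resource manager and sends SIGKILL to a child that did nothing wrong. The helper name `describe_exit` is hypothetical. The child's exit status then reports signal 9 even though the child never crashed on its own.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code; on POSIX a negative value -N
    means the child was terminated by signal N."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


# A child that would otherwise just sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# The parent stands in for the external entity (e.g. a resource manager).
child.send_signal(signal.SIGKILL)
child.wait()

result = describe_exit(child.returncode)
print(result)
```

A segmentation fault would instead show up as signal 11 (SIGSEGV), which is one way to tell the two cases apart.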