[OMPI users] Use of -mca pml csum

2010-12-14 Thread Gilbert Grosdidier
Hello, Since I'm very suspicious about the condition of the IB network on my cluster, I'm trying to use the csum pml feature of OMPI (1.4.3). But I have a question: what happens if the checksum is different on both ends? Is there a warning printed, a flag set by the MPI_(I)recv or

[OMPI users] Spawning with the ompi-server option

2010-12-14 Thread Suraj Prabhakaran
Hello, Is there any way to spawn processes with the ompi-server option? I need the child processes to open and publish ports, for which I require this option. Is there an alternative? Thanks, Suraj Prabhakaran

Re: [OMPI users] Spawning with the ompi-server option

2010-12-14 Thread Ralph Castain
Not sure I fully understand the question. If you provide the --ompi-server option to mpirun, that info will be passed along to all processes, including those launched via comm_spawn, so they can subsequently connect to the server. On Dec 14, 2010, at 6:50 AM, Suraj Prabhakaran wrote: >

Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread Rolf vandeVaart
Hi James: I can reproduce the problem on a single node with Open MPI 1.5 and the trunk. I have submitted a ticket with the information. https://svn.open-mpi.org/trac/ompi/ticket/2656 Rolf On 12/13/10 18:44, James Dinan wrote: Hi, I'm getting strange behavior using datatypes in a one-sided

[OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Lydia Heck
About 9 months ago we had a new installation with a system of 1800 cores and at the time we found that jobs with more than 1028 cores would not start. At the time a colleague found that setting OMPI_MCA_plm_rsh_num_concurrent=256 helped with the problem. We have now increased our processor

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-14 Thread Eugene Loh
David Mathog wrote: Is there a tool in openmpi that will reveal how much "spin time" the processes are using? I don't know what sort of answer is helpful for you, but I'll describe one option. With Oracle Message Passing Toolkit (formerly Sun ClusterTools, anyhow, an OMPI distribution

Re: [OMPI users] MPI_Bcast vs. per worker MPI_Send?

2010-12-14 Thread David Mathog
So the 2/2 consensus is to use the collective. That is straightforward for the send part of this, since all workers are sent the same data. For the receive I do not see how to use a collective. Each worker sends back a data structure, and the structures are of varying size. This is almost

Re: [OMPI users] MPI_Bcast vs. per worker MPI_Send?

2010-12-14 Thread Eugene Loh
David Mathog wrote: For the receive I do not see how to use a collective. Each worker sends back a data structure, and the structures are of varying size. This is almost always the case in Bioinformatics, where what is usually coming back from each worker is a count M of the number of
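
The snippet above is cut off before the answer, so the following is only a sketch of one common approach, not necessarily what the reply proposed: the root first collects each worker's element count (for example with MPI_Gather) and then calls MPI_Gatherv, which takes a per-worker count array and a per-worker displacement array. The helper name `build_displacements` is hypothetical and is not part of MPI; it only shows how the displacement array can be derived from the counts.

```c
#include <assert.h>

/* Hypothetical helper (not an MPI function): given how many elements each
 * worker will send, fill displs[] with the offset at which each worker's
 * block starts in the root's receive buffer, and return the total number
 * of elements. counts[] and displs[] have the shape that MPI_Gatherv
 * expects for its recvcounts and displs arguments. */
int build_displacements(const int *counts, int nworkers, int *displs)
{
    int total = 0;
    for (int i = 0; i < nworkers; i++) {
        assert(counts[i] >= 0);  /* a worker cannot send a negative count */
        displs[i] = total;       /* block i starts where block i-1 ended */
        total += counts[i];
    }
    return total;
}
```

The returned total is the size the root's receive buffer needs, in elements, before the variable-sized gather is issued.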

Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread James Dinan
Hi Rolf, Thanks for your help. I also noticed trouble with subarray data types. I attached the same test again, but with subarray rather than indexed types. It works correctly with MVAPICH on IB, but fails with OpenMPI 1.5 with the following message: $ mpiexec -n 2 ./a.out MPI RMA

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Lydia Heck
I have experimented a bit more and found that if I set OMPI_MCA_plm_rsh_num_concurrent=1024 a job with more than 2,500 processes will start and run. However when I searched the open-mpi web site for the variable I could not find any indication. Best wishes, Lydia Heck 15. jobs

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread John Hearns
On 14 December 2010 17:32, Lydia Heck wrote: > > I have experimented a bit more and found that if I set > > OMPI_MCA_plm_rsh_num_concurrent=1024 > > a job with more than 2,500 processes will start and run. > > However when I searched the open-mpi web site for the

Re: [OMPI users] Use of -mca pml csum

2010-12-14 Thread George Bosilca
If the checksum on both peers doesn't match, your MPI call will return with an error. This is in addition to Open MPI printing a warning message on the output (which can be silenced with the right mca parameter). So, you're supposed to check the return values, and abort if something fishy is
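
To make the idea of a sender/receiver checksum comparison concrete, here is a minimal plain-C sketch. It is an illustration only: the function names `simple_checksum` and `verify_payload` are hypothetical, the byte-sum algorithm is chosen for brevity, and none of this is claimed to match what the csum PML does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative checksum only: a plain 32-bit sum of the bytes.
 * This is NOT claimed to be the algorithm used by Open MPI's csum PML. */
uint32_t simple_checksum(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical receiver-side check: recompute the checksum over the
 * received bytes and compare it with the value the sender attached.
 * Returns 0 on a match and -1 on a mismatch, so the caller can test
 * the return value and abort if the data was corrupted in transit. */
int verify_payload(const unsigned char *buf, size_t len, uint32_t sent_sum)
{
    return simple_checksum(buf, len) == sent_sum ? 0 : -1;
}
```

In a real MPI program the equivalent check is the return code of the MPI call itself. Since the default error handler on communicators is MPI_ERRORS_ARE_FATAL, an application that wants to inspect return codes would normally first install MPI_ERRORS_RETURN with MPI_Comm_set_errhandler.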

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Gilbert Grosdidier
Hello Ralph, I wonder: does this plm_rsh_num_concurrent parameter apply ONLY to rsh use, or to ssh OR rsh, depending on plm_rsh_agent, please? Thanks, Best, G. On 14/12/2010 18:30, Ralph Castain wrote: That's a big cluster to be starting with rsh! :-) When you say it won't

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Ralph Castain
It applies to both. In the rsh/ssh launcher, there is a limit on how many concurrent ssh/rsh sessions we have open at any one time. This is required due to OS limitations. As each daemon completes its launch, it "daemonizes" and closes the ssh/rsh session, thus enabling another daemon to be
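
The behaviour described above, where a new session is started only once an earlier one has closed, can be pictured with a toy model. This is a simplified sketch under stated assumptions, not the launcher's actual code: the function name `peak_open_sessions` is hypothetical, and the model assumes that whenever the limit is reached, one already-started daemon closes its session before the next one is opened.

```c
#include <assert.h>

/* Toy model of a launcher that keeps at most max_concurrent ssh/rsh
 * sessions open at once. Returns the peak number of sessions that were
 * open simultaneously while starting ndaemons daemons. */
int peak_open_sessions(int ndaemons, int max_concurrent)
{
    assert(ndaemons >= 0 && max_concurrent > 0);
    int open_now = 0;
    int peak = 0;
    for (int started = 0; started < ndaemons; started++) {
        if (open_now == max_concurrent)
            open_now--;          /* an earlier daemon daemonizes and closes its session */
        open_now++;              /* start the next ssh/rsh session */
        if (open_now > peak)
            peak = open_now;
    }
    return peak;
}
```

In this model the peak never exceeds the configured limit, which is the property a setting such as plm_rsh_num_concurrent is described as enforcing.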

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-14 Thread Jeff Squyres
On Dec 10, 2010, at 11:00 AM, Prentice Bisbal wrote: >> Would it make sense to implement this as an MPI extension, and then >> perhaps propose something to the Forum for this purpose? > > I think that makes sense. As core and socket counts go up, I imagine the need > for this information will

Re: [OMPI users] meaning of MPI_THREAD_*

2010-12-14 Thread Jeff Squyres
On Dec 6, 2010, at 9:26 AM, Hicham Mouline wrote: > Thanks, it is now clarified that a call to MPI_INIT has the same effect as a > call to MPI_INIT_THREAD with > a required = MPI_THREAD_SINGLE. Perhaps it should be added here: > http://www.open-mpi.org/doc/v1.4/man3/MPI_Init_thread.3.php > as