Truly am sorry about that - we were just talking today about the need to update
and improve our FAQ on running on large clusters. Did you by any chance look at
it? Would appreciate any thoughts on how it should be improved from a user's
perspective.
On Sep 20, 2011, at 3:28 PM, Henderson, Brent wrote:
Nope, but if I didn't that would have saved me about an hour of coding time!
I'm still curious if it would be beneficial to inject some barriers at certain
locations so that if you had a slow node, not everyone would end up connecting
to it all at once. Anyway, if I get access to another large
Hmmm... perhaps you didn't notice the mpi_preconnect_all option? It does
precisely what you described - it pushes zero-byte messages around a ring to
force all the connections open at MPI_Init.
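For anyone following along, a sketch of how that option is typically switched on (parameter name as it appeared in the 1.4/1.5 series; the rank count and program name are placeholders):

```shell
# Open a connection to every peer during MPI_Init, rather than lazily on
# first send, by pushing zero-byte messages around a ring:
mpirun --mca mpi_preconnect_all 1 -np 4800 ./hello_world

# The same thing expressed as an environment variable:
export OMPI_MCA_mpi_preconnect_all=1
```

This trades a slower, all-at-once MPI_Init for predictable communication timing afterwards, which is usually what you want when benchmarking a large cluster.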
On Sep 20, 2011, at 3:06 PM, Henderson, Brent wrote:
> I recently had access to a 200+ node Magny Cours
I recently had access to a 200+ node Magny Cours (24 ranks/host) 10G Linux
cluster. I was able to use OpenMPI v1.5.4 with hello world, IMB and HPCC, but
there were a couple of issues along the way. After setting some system
tunables up a little bit on all of the nodes, a hello_world program worked.
Follow-up #1: I tried using the autogen.sh script referenced here
https://svn.open-mpi.org/trac/ompi/changeset/22274
but that did not resolve the build problem.
Follow-up #2: configuring with --disable-mpi-cxx does allow the compilation to
succeed. Perhaps that's obvious, but I had to check.
I'm having trouble building 1.4.3 using PGI 10.9. I searched the list archives
briefly but I didn't stumble across anything that looked like the same problem,
so I thought I'd ask if an expert might recognize the nature of the problem
here.
The configure command:
./configure --prefix=/release
Here is a diff -y output of the compilation of one of the program's files. The
one on the left is OpenMPI mpif90, the one on the right is MVAPICH mpif90.
Does that suggest perhaps I should try adding -fPIC to the OpenMPI-linked
compilation?
/appserv/intel/Compiler/11.1/072/bin/intel64/fortcom
Thank you for this explanation. I will assume that my problem here is some
kind of memory corruption.
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject:
On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:
It appears to be a side effect of linkage that can change a compute-only
routine's answers.
I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of
corruption may be going on.
Those intrinsics have direct instruction-set counterparts.
Am 20.09.2011 um 16:50 schrieb Blosch, Edwin L:
> Thank you all for the replies.
>
> Certainly optimization flags can be useful to address differences between
> compilers, etc. And differences in MPI_ALLREDUCE are acknowledged as a possibility.
> But I don't think either is quite relevant because:
>
Ole Nielsen wrote:
Thanks for your suggestion Gus, we need a way of debugging what is going
on. I am pretty sure the problem lies with our cluster configuration. I
know MPI simply relies on the underlying network. However, we can ping
and ssh to all nodes (and between any pair as well), so it
I've not been following closely. How do you know you're using the
identical compilation flags? Are you saying you specify the same flags
to "mpicc" (or whatever) or are you confirming that the back-end
compiler is seeing the same flags? The MPI compiler wrapper (mpicc, et
al.) can add flags.
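One quick way to settle that question, assuming the standard wrapper options are available in the versions involved: ask each wrapper to print what it would actually hand to the back-end compiler.

```shell
# Show what the Open MPI wrappers really pass to the back-end compiler;
# the MPICH-family wrappers (including MVAPICH) use "-show" instead.
mpicc --showme
mpif90 --showme:compile   # compile-time flags only
mpif90 --showme:link      # link-time flags only
```

Diffing that output between the two stacks makes it obvious whether the back-end compiler is really seeing identical flags.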
Thank you all for the replies.
Certainly optimization flags can be useful to address differences between
compilers, etc. And differences in MPI_ALLREDUCE are acknowledged as a possibility.
But I don't think either is quite relevant because:
- It was the exact same compiler, with identical compilation flags.
Hi,
Maybe you can leverage some of the techniques outlined in:
Robert W. Robey, Jonathan M. Robey, and Rob Aulwes. 2011. In search of
numerical consistency in parallel programming. Parallel Comput. 37, 4-5 (April
2011), 217-229. DOI=10.1016/j.parco.2011.02.009
http://dx.doi.org/10.1016/j.parco.2011.02.009
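The core issue the paper addresses can be seen without any MPI at all. Floating-point addition is not associative, so two allreduce implementations that combine partial sums in different orders can legitimately return different results from identical inputs. A tiny self-contained demonstration:

```python
# Why two MPI stacks can give different MPI_ALLREDUCE answers on the same
# data: the shape of the reduction tree changes the rounding.
values = [1e16, 1.0, -1e16, 1.0]

# "Linear" reduction, the order a simple rank-by-rank chain would use:
linear = ((values[0] + values[1]) + values[2]) + values[3]

# Pairwise (tree) reduction, as a binomial-tree allreduce might combine:
pairwise = (values[0] + values[1]) + (values[2] + values[3])

print(linear)    # 1.0  (one of the 1.0 terms survives the cancellation)
print(pairwise)  # 0.0  (both 1.0 terms are absorbed by the 1e16 partials)
```

Both answers are "correct" under IEEE 754 rounding; neither implementation is buggy. This is why bitwise-identical results across MPI implementations are not something the standard promises.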
The problem you're running into is not due to Open MPI. The Objective C and C
compilers on OS X (and most platforms) are the same binary, so you should be
able to use mpicc without any problems. It will see the .m extension and
switch to Objective C mode. However, NSLog is in the Foundation framework.
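So the likely fix is just to link Foundation explicitly when compiling through the wrapper (file and output names below are placeholders):

```shell
# Compile an Objective-C MPI program with the Open MPI wrapper on OS X;
# -framework Foundation resolves NSLog and friends at link time.
mpicc hello.m -framework Foundation -o hello
```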
>> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
>> happen until the third iteration. I take that to mean that the basic
>> communication works, but that something is saturating. Is there some notion
>> of buffer size somewhere in the MPI system that could explain this?
Am 20.09.2011 um 13:52 schrieb Tim Prince:
> On 9/20/2011 7:25 AM, Reuti wrote:
>> Hi,
>>
>> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
>>
>>> I am observing differences in floating-point results from an application
>>> program that appear to be related to whether I link with OpenMPI 1.4.3
Hi Sébastien,
If I understand you correctly, you are running your application on two
different MPIs on two different clusters with two different IB vendors.
Could you make a comparison more "apples to apples"-ish?
For instance:
- run the same version of Open MPI on both clusters
- run the same
On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:
> Hi all - and sorry for the multiple postings, but I have more information.
+1 on Eugene's comments. The test program looks fine to me.
FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler
allows you to just:
mpicc m
On Sep 20, 2011, at 7:52 AM, Tim Prince wrote:
> Quoted comment from OP seem to show a somewhat different question: Does
> OpenMPI implement any operations in a different way from MVAPICH? I would
> think it probable that the answer could be affirmative for operations such as
> allreduce, but
On 9/20/2011 7:25 AM, Reuti wrote:
Hi,
Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
I am observing differences in floating-point results from an application
program that appear to be related to whether I link with OpenMPI 1.4.3 or
MVAPICH 1.2.0. Both packages were built with the same installation of Intel 11.1.
Hi,
Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
> I am observing differences in floating-point results from an application
> program that appear to be related to whether I link with OpenMPI 1.4.3 or
> MVAPICH 1.2.0. Both packages were built with the same installation of Intel
> 11.1, as w