Checking your original post, you had this:

call MPI_Send(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD
call MPI_RECV(tonode, 4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)

There was already discussion of the 3 vs. 4 counts, and we're assuming that toroot and tonode are large enough (note that I think you mean "array of size 3" and "array of size 4", not "3d" and "4d", right?).

I notice that the ierr argument is missing from the MPI_Send above. Is that a typo? It could be, since the closing ) is missing as well. But a missing ierr can be a common cause of a segv in Fortran -- we typically don't assign to ierr until after the MPI_Send completes, so it *could* explain the behavior you're seeing...?
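
For reference, here is a sketch of what I'd expect the corrected pair to look like -- assuming status is declared as an integer array of size MPI_STATUS_SIZE (I'm guessing at your actual declarations, of course):

   ! buffer, count, datatype, dest, tag, communicator, ierror
   call MPI_SEND(toroot, 3, MPI_DOUBLE_PRECISION, root, n, MPI_COMM_WORLD, ierr)
   ! buffer, count, datatype, source, tag, communicator, status, ierror
   call MPI_RECV(tonode, 4, MPI_DOUBLE_PRECISION, root, n, MPI_COMM_WORLD, status, ierr)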


On Sep 15, 2008, at 12:33 PM, Enrico Barausse wrote:

sorry, I should pay more attention when I edit the subject of the daily digest

Dear Eric, Aurelien and Eugene

thanks a lot for helping. What Eugene said summarizes exactly the
situation. I agree it's an issue with the full code, since the problem
doesn't arise in simple examples, like the one I posted. I was just
hoping I was doing something trivially wrong and that someone would
shout at me :-). I could post the full code, but it's quite a long
one. At the moment I am still going through it searching for the
problem, so I'll wait a bit before spamming the other users.

cheers

Enrico


On Mon, Sep 15, 2008 at 6:00 PM,  <users-requ...@open-mpi.org> wrote:

Today's Topics:

 1. Re: Problem using VampirTrace (Thomas Ropars)
 2. Re: Why compiling in global paths (only) for configuration files? (Paul Kapinos)
 3. Re: MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)


----------------------------------------------------------------------

Message: 1
Date: Mon, 15 Sep 2008 15:04:07 +0200
From: Thomas Ropars <trop...@irisa.fr>
Subject: Re: [OMPI users] Problem using VampirTrace
To: Andreas Knüpfer <andreas.knuep...@tu-dresden.de>
Cc: us...@open-mpi.org
Message-ID: <48ce5d47.50...@irisa.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello,

I don't have a common file system for all cluster nodes.

I've tried to run the application again with VT_UNIFY=no and to call
vtunify manually. It works well. I managed to get the .otf file.

Thank you.

Thomas Ropars


Andreas Knüpfer wrote:
Hello Thomas,

sorry for the delay. My first assumption about the cause of your problem is the so-called "unify" process. This is a post-processing step which is performed automatically after the trace run. This step needs read access to all trace files, though. So, do you have a common file system for all cluster nodes?

If yes, set the env variable VT_PFORM_GDIR to point there. Then the traces will be copied there from the location VT_PFORM_LDIR, which can still be a node-local directory, and everything will be handled automatically.

If not, please set VT_UNIFY=no in order to disable automatic unification. Then you need to call vtunify manually. Please copy all files from the run directory that start with your OTF file prefix to a common directory and call

%> vtunify <number of processes> <file prefix>

there. This should give you the <prefix>.otf file.
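
For example, for a 4-process run with OTF file prefix "ring-vt" it might look like this (just a sketch; the host names and directories are placeholders, adjust them to your setup):

%> export VT_UNIFY=no                               # disable automatic unification
%> mpirun -x VT_UNIFY -np 4 ./ring
%> mkdir /tmp/unified
%> scp node1:/path/to/run/ring-vt.* /tmp/unified/   # repeat for each node
%> cd /tmp/unified
%> vtunify 4 ring-vt                                # should produce ring-vt.otf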

Please give this a try. If it does not work, please send me the output of 'ls -alh' from your trace directory/directories.

Best regards, Andreas


P.S.: Please keep my email on CC; I'm not on the users@open-mpi.org list.




From: Thomas Ropars <trop...@irisa.fr>
Date: August 11, 2008 3:47:54 PM IST
To: us...@open-mpi.org
Subject: [OMPI users] Problem using VampirTrace
Reply-To: Open MPI Users <us...@open-mpi.org>

Hi all,

I'm trying to use VampirTrace.
I'm working with r19234 of svn trunk.

When I try to run a simple application with 4 processes on the same
computer, it works well.
But if I try to run the same application with the 4 processes executed
on 4 different computers, I never get the .otf file.

I've tried to run with VT_VERBOSE=yes, and I get the following trace:

VampirTrace: Thread object #0 created, total number is 1
VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
VampirTrace: Thread object #0 created, total number is 1
VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
VampirTrace: Thread object #0 created, total number is 1
VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
VampirTrace: Thread object #0 created, total number is 1
VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
Ring : Start
Ring : End
[1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
[2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
[1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
[3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
[2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
[0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
[1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
[2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
[3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
[0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
[0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
[0]VampirTrace: Checking for ./ring-vt.1.uctl ...
[0]VampirTrace: Checking for ./ring-vt.2.uctl ...
[1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.def
[2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.def
[3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
[1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.events
[2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.events
[3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.def
[1]VampirTrace: Thread object #0 deleted, leaving 0
[2]VampirTrace: Thread object #0 deleted, leaving 0
[3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.events
[3]VampirTrace: Thread object #0 deleted, leaving 0


Regards

Thomas








------------------------------

Message: 2
Date: Mon, 15 Sep 2008 17:22:03 +0200
From: Paul Kapinos <kapi...@rz.rwth-aachen.de>
Subject: Re: [OMPI users] Why compiling in global paths (only) for configuration files?
To: Open MPI Users <us...@open-mpi.org>,        Samuel Sarholz
      <sarh...@rz.rwth-aachen.de>
Message-ID: <48ce7d9b.8070...@rz.rwth-aachen.de>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi Jeff, hi all!

Jeff Squyres wrote:
Short answer: yes, we do compile the prefix path into OMPI. Check
out this FAQ entry; I think it'll solve your problem:

   http://www.open-mpi.org/faq/?category=building#installdirs


Yes, reading man pages helps!
Thank you for providing useful help.

But setting the environment variable OPAL_PREFIX to an appropriate value
(assuming PATH and LD_LIBRARY_PATH are set accordingly too) is not enough
to let Open MPI rock & roll from the new location.

That is because all the files containing settings for opal_wrapper, which
are located in share/openmpi/ and called e.g. mpif77-wrapper-data.txt,
also contain hard-coded paths (defined at installation time by --prefix).

I have fixed the problem by going through all the share/openmpi/*.txt files
and replacing the old path with the new one. This nasty solution seems
to work.
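
For the record, my workaround looks roughly like this (a sketch only -- the old and new paths are the example paths from my original mail quoted below, and sed -i assumes GNU sed):

$ export OPAL_PREFIX=/my/love/path/for/openmpi/blupp
$ export PATH=$OPAL_PREFIX/bin:$PATH
$ export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
$ # patch the hard-coded --prefix paths in the wrapper data files
$ cd $OPAL_PREFIX/share/openmpi
$ sed -i 's|/my/love/path/for/openmpi/tmp1|/my/love/path/for/openmpi/blupp|g' *.txt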

But is there a more elegant way to do this correctly, maybe by
regenerating the config files in share/openmpi/?

And last but not least, the FAQ entry on the web site you provided (see link
above) does not contain any info on the need to modify the wrapper
configuration files. Maybe this section should be updated?

Best regards Paul Kapinos











On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:

Hi all!

We are using Open MPI on a variety of machines (running Linux,
Solaris/Sparc and Solaris/Opteron) with a couple of compilers (GCC, Sun
Studio, Intel, PGI, 32- and 64-bit...), so we have at least 15 builds
of each release of Open MPI (Sun Cluster Tools not included).

In other words, we have to support a complete petting zoo of
Open MPI installations, and sometimes we need to move things around.


When Open MPI is configured, the install path can be provided with the
--prefix option, like so:

./configure --prefix=/my/love/path/for/openmpi/tmp1

After "gmake all install" in ...tmp1 an installation of OpenMPI may be
found.

Then, say, we need to *move* this version to another path, say
/my/love/path/for/openmpi/blupp

Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
can do that ;-)

But if we try to use Open MPI from the new location, we get an error message
like

$ ./mpicc
Cannot open configuration file
/my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found

(note that the old installation path is still used)

It looks to me as if the install path provided with --prefix at configure
time is compiled into the opal_wrapper executable, and opal_wrapper only works
if the set of configuration files is in that path. But after moving the
Open MPI installation directory, the configuration files aren't there any more...

A side effect of this behaviour is that binary distributions of Open MPI
(RPMs) are not relocatable, which is inconvenient. (Actually, this mail
was triggered by the fact that the Sun ClusterTools RPMs are not
relocatable.)


So, does this behavior have a deeper sense that I cannot recognise, or
is compiling in global paths perhaps not needed at all?

What I mean is that the paths to the configuration files that
opal_wrapper needs could be resolved relative to the installation, like
../share/openmpi/***, without affecting the integrity of Open MPI. There
may be more places where relative paths would be needed to allow a
movable (relocatable) Open MPI.

What do you think about this?

Best regards
Paul Kapinos







------------------------------

Message: 3
Date: Mon, 15 Sep 2008 08:46:11 -0700
From: Eugene Loh <eugene....@sun.com>
Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <48ce8343.7060...@sun.com>
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Aurélien Bouteiller wrote:

You can't assume that MPI_Send does buffering.

Yes, but I think this is what Eric meant by misinterpreting Enrico's
problem.  The communication pattern is to send a message, which is
received remotely. There is remote computation, and then data is sent
back.  No buffering is needed for such a pattern.  The code is
"apparently" legal. There is apparently something else going on in the
"real" code that is not captured in the example Enrico sent.

Further, if I understand correctly, the remote process actually receives
the data!  If  this is true, the example is as simple as:

process 1:
  MPI_Send()     // this call blocks

process 0:
  MPI_Recv()    // this call actually receives the data sent by MPI_Send!!!

Enrico originally explained that process 0 actually receives the data.
So, MPI's internal buffering is presumably not a problem at all!  An
MPI_Send effectively sends data to a remote process, but simply never
returns control to the user program.

Without buffering, you are in a possible deadlock situation. This
pathological case is the exact motivation for the existence of
MPI_Sendrecv. You can also consider Isend/Recv/Wait; then the Send
will never block, even if the destination is not ready to receive. Or
use MPI_Bsend, which adds explicit buffering and therefore returns
control to you before the message transmission has actually begun.

Aurelien
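
To make that concrete, here is a rough sketch using the buffer names from Enrico's snippet below (declarations omitted; request would be an integer and status an integer array of size MPI_STATUS_SIZE):

  ! combined send+receive: the MPI library orders the two operations, so no deadlock
  call MPI_SENDRECV(toroot, 3, MPI_DOUBLE_PRECISION, root, n, &
                    tonode, 4, MPI_DOUBLE_PRECISION, root, n, &
                    MPI_COMM_WORLD, status, ierr)

  ! or: non-blocking send, then the receive, then wait for the send to complete
  call MPI_ISEND(toroot, 3, MPI_DOUBLE_PRECISION, root, n, MPI_COMM_WORLD, request, ierr)
  call MPI_RECV(tonode, 4, MPI_DOUBLE_PRECISION, root, n, MPI_COMM_WORLD, status, ierr)
  call MPI_WAIT(request, status, ierr)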


On Sep 15, 2008, at 01:08, Eric Thibodeau wrote:

Sorry about that, I had misinterpreted your original post as being
the pair of send-receive calls. The example you give below does indeed
seem correct, which means you might have to show us the code that
doesn't work. Note that I am in no way a Fortran expert; I'm more
versed in C. The only hint I'd give a C programmer in this case is:
make sure your receiving structures are indeed large enough (i.e.,
you send 3d but eventually receive 4d... did you allocate for 3d or
4d when receiving the converted array?).

Eric

Enrico Barausse wrote:

sorry, I hadn't changed the subject. I'm reposting:

Hi

I think it's correct. What I want to do is to send a 3d array from
process 1 to process 0 (=root):
call MPI_Send(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD

in some other part of the code process 0 acts on the 3d array and
turns it into a 4d one and sends it back to process 1, which receives
it with

call MPI_RECV(tonode, 4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)

in practice, what I do is basically given by this simple code (which,
unfortunately, doesn't reproduce the segmentation fault):



      program example
      implicit none
      include 'mpif.h'

      integer :: a(5), b(4)
      integer :: id, numprocs, k, ierr
      integer :: status(MPI_STATUS_SIZE)

      a = (/1, 2, 3, 4, 5/)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

      if (numprocs /= 2) stop

      if (id == 0) then
              do k = 1, 5
                      a = a + 1
                      call MPI_SEND(a, 5, MPI_INTEGER, 1, k, MPI_COMM_WORLD, ierr)
                      call MPI_RECV(b, 4, MPI_INTEGER, 1, k, MPI_COMM_WORLD, status, ierr)
              end do
      else
              do k = 1, 5
                      call MPI_RECV(a, 5, MPI_INTEGER, 0, k, MPI_COMM_WORLD, status, ierr)
                      b = a(1:4)
                      call MPI_SEND(b, 4, MPI_INTEGER, 0, k, MPI_COMM_WORLD, ierr)
              end do
      end if

      call MPI_FINALIZE(ierr)
      end program example




--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321











------------------------------


End of users Digest, Vol 1006, Issue 2
**************************************




--
Jeff Squyres
Cisco Systems
