[OMPI users] Unable to find the following executable

2010-11-17 Thread Tushar Andriyas
Hi there, I am new to using MPI commands and got stuck on a problem running a code. When I submit my job through a batch file, the job exits with the message that the executable could not be found on the machines. I have tried a lot of options such as PBS -V and so on, but the problem

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
More than OGE uses external bindings. We have tested it using some tricks, and in environments where binding is available from the RM (e.g., slurm). So we know the basic code works. Whether or not it works with OGE is another matter. On Wed, Nov 17, 2010 at 9:09 AM, Terry Dontje

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Gus Correa
Ricardo Reis wrote: On Wed, 17 Nov 2010, Gus Correa wrote: For what it's worth, the MPI addresses (a.k.a. pointers) in the Fortran bindings are integers, of standard size 4 bytes, IIRR. Take a look at mpif.h, mpi.h and their cousins to make sure. Unlike the Fortran FFTW "plans", you don't

Re: [OMPI users] Problem with sending messages from one of the machines

2010-11-17 Thread Grzegorz Maj
2010/11/11 Jeff Squyres : > On Nov 11, 2010, at 3:23 PM, Krzysztof Zarzycki wrote: > >> No, unfortunately specification of interfaces is a little more >> complicated...  eth0/1/2 is not common for both machines. > > Can you define "common"?  Do you mean that eth0 on one

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/17/2010 10:48 AM, Ralph Castain wrote: No problem at all. I confess that I am lost in all the sometimes disjointed emails in this thread. Frankly, now that I search, I can't find it either! :-( I see one email that clearly shows the external binding report from mpirun, but not from any

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
No problem at all. I confess that I am lost in all the sometimes disjointed emails in this thread. Frankly, now that I search, I can't find it either! :-( I see one email that clearly shows the external binding report from mpirun, but not from any daemons. I see another email (after you asked if

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Ricardo Reis
On Wed, 17 Nov 2010, Gus Correa wrote: For what it's worth, the MPI addresses (a.k.a. pointers) in the Fortran bindings are integers, of standard size 4 bytes, IIRR. Take a look at mpif.h, mpi.h and their cousins to make sure. Unlike the Fortran FFTW "plans", you don't declare MPI addresses as

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/17/2010 10:00 AM, Ralph Castain wrote: --leave-session-attached is always required if you want to see output from the daemons. Otherwise, the launcher closes the ssh session (or qrsh session, in this case) as part of its normal operating procedure, thus terminating the stdout/err

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
--leave-session-attached is always required if you want to see output from the daemons. Otherwise, the launcher closes the ssh session (or qrsh session, in this case) as part of its normal operating procedure, thus terminating the stdout/err channel. On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Gus Correa
Ricardo Reis wrote: On Wed, 17 Nov 2010, Pascal Deveze wrote: I think the limit for a write (and also for a read) is 2^31-1 (2G-1). In a C program, after this value, an integer becomes negative. I suppose this is also true in Fortran. The solution is to make a loop of writes (reads) of no
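
The "loop of writes" suggested above can be sketched as a chunking plan. This is only an illustration, not code from the thread: the names `MAX_COUNT` and `chunk_ranges` are hypothetical, and the sketch only computes the (offset, count) pairs; in a real program each pair would be handed to one write (or read) call whose count argument is a 32-bit signed integer.

```python
# Hypothetical helper: split a large transfer into pieces whose element
# count always fits in a signed 32-bit integer (at most 2^31 - 1).
MAX_COUNT = 2**31 - 1


def chunk_ranges(total, max_count=MAX_COUNT):
    """Yield (offset, count) pairs that cover `total` elements in order.

    Every count is <= max_count, so each piece can be passed to a call
    that takes its length as a 32-bit signed integer.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    offset = 0
    while offset < total:
        count = min(max_count, total - offset)
        yield offset, count
        offset += count


if __name__ == "__main__":
    # A transfer just past the 2^31 - 1 limit needs two pieces.
    for offset, count in chunk_ranges(2**31 + 5):
        print(offset, count)
```

The offsets here are plain Python integers, which do not overflow; in C or Fortran the offset would need a 64-bit type even though each count stays within 32 bits.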

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/17/2010 09:32 AM, Ralph Castain wrote: Cris' output is coming solely from the HNP, which is correct given the way things were executed. My comment was from another email where he did what I asked, which was to include the flags: --report-bindings --leave-session-attached so we could

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
Cris' output is coming solely from the HNP, which is correct given the way things were executed. My comment was from another email where he did what I asked, which was to include the flags: --report-bindings --leave-session-attached so we could see the output from each orted. In that email, it

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/17/2010 07:41 AM, Chris Jewell wrote: On 17 Nov 2010, at 11:56, Terry Dontje wrote: You are absolutely correct, Terry, and the 1.4 release series does include the proper code. The point here, though, is that SGE binds the orted to a single core, even though other cores are also

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Chris Jewell
On 17 Nov 2010, at 11:56, Terry Dontje wrote: >> >> You are absolutely correct, Terry, and the 1.4 release series does include >> the proper code. The point here, though, is that SGE binds the orted to a >> single core, even though other cores are also allocated. So the orted >> detects an

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Ricardo Reis
On Wed, 17 Nov 2010, Pascal Deveze wrote: This is due to the interface defined for MPI_File_write that specifies an integer for the length. The positive values of an integer are coded in hexadecimal from 0000 0000 to 7FFF FFFF and negative values are coded from 8000 0000 to FFFF FFFF. (7FFF

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/16/2010 08:24 PM, Ralph Castain wrote: On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje wrote: On 11/16/2010 01:31 PM, Reuti wrote: Hi Ralph, On 16.11.2010 at 15:40, Ralph Castain wrote: 2. have SGE bind

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Pascal Deveze
This is due to the interface defined for MPI_File_write that specifies an integer for the length. The positive values of an integer are coded in hexadecimal from 0000 0000 to 7FFF FFFF and negative values are coded from 8000 0000 to FFFF FFFF. (7FFF FFFF is exactly 2^31-1). Pascal Ricardo Reis
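
The encoding described above can be checked with a small sketch. It is an illustration under the assumption of a 32-bit two's-complement integer; the function name `as_int32` is hypothetical and not part of any MPI interface.

```python
def as_int32(value):
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer
    (two's complement), the way a 32-bit length argument would see it."""
    value &= 0xFFFFFFFF
    if value >= 0x80000000:
        return value - 0x100000000
    return value


if __name__ == "__main__":
    # 7FFF FFFF is the largest positive value; one more wraps negative.
    for raw in (0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
        print(hex(raw), as_int32(raw))
```

A length of 2^31 or more therefore cannot be represented in such an argument, which is consistent with the "wall" between 2.0 and 2.1 GB discussed in this thread.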

[OMPI users] out of memory in io_romio_ad_nfs_read.c

2010-11-17 Thread Zak
Dear all, I'm getting the following error during I/O: "out of memory in io_romio_ad_nfs_read.c, line 156". Does anyone know how I can solve this issue when reading a file? Best regards, zak

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Daniel Gruber
Hi, I'm interested in what is expected from OGE/SGE in order to support most of your scenarios. First of all the "-binding pe" request is not flexible and makes only sense in scenarios when having the same architecture on each host, each involved host is used exclusively for the job (SGE

Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-17 Thread Ricardo Reis
On Tue, 16 Nov 2010, Gus Correa wrote: Ricardo Reis wrote: and sorry to be such a nuisance... but any motive for an MPI-IO "wall" between the 2.0 and 2.1 Gb? Salve Ricardo Reis! Is this "wall" perhaps the 2GB Linux file size limit on 32-bit systems? No. This is a 64bit machine and if