Re: [OMPI users] Connection to HNP lost

2007-07-10 Thread Ralph Castain
On 7/10/07 3:56 PM, "Glenn Carver" wrote: > Brian, Ralph, > > I neglected to mention in my first email that the application hasn't > completed when I see the "HNP lost" messages. All processes of the > application are still running on the nodes (well consuming cpu cycles > really). I should c

Re: [OMPI users] Connection to HNP lost

2007-07-10 Thread Glenn Carver
Brian, Ralph, I neglected to mention in my first email that the application hasn't completed when I see the "HNP lost" messages. All processes of the application are still running on the nodes (well consuming cpu cycles really). I should check to see if mpirun is still there. Further invest

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread George Bosilca
I did see these warnings a while ago, but it didn't happen in the last few weeks. I prefer not to set it automatically, or have to add a new MCA parameter. That way, we cannot be blamed for anything. The first thing the users will notice is a degradation of performance. Then they co

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 3:24 PM, Tim Prins wrote: On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote: On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: Tim, starting with the recently released 1.2.1, it is the default. To clarify, MX_RCACHE=1 is the default. It would be good for the defau

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Tim Prins
On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote: > On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: > > Tim, starting with the recently released 1.2.1, it is the default. > > To clarify, MX_RCACHE=1 is the default. It would be good for the default to be something where there is no warning

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: Tim, starting with the recently released 1.2.1, it is the default. To clarify, MX_RCACHE=1 is the default. Scott

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
Tim, starting with the recently released 1.2.1, it is the default. George, do you see the warning or no? Scott On Jul 10, 2007, at 2:52 PM, Tim Prins wrote: Is this something that Open MPI should be setting automatically? Tim On Tuesday 10 July 2007 02:44:04 pm George Bosilca wrote: I alwa

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote: Scott - I'm having trouble getting the warning to go away with Open MPI. I've disabled our copy of ptmalloc2, so we're not providing a malloc anymore. I'm wondering if there's also something with the use of DSOs to load libmyriexpress? Is your

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Tim Prins
Is this something that Open MPI should be setting automatically? Tim On Tuesday 10 July 2007 02:44:04 pm George Bosilca wrote: > I always use MX_RCACHE=2 for both MTL and BTL. So far I didn't have > any problems with it. > >george. > > On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote: > > On J

Re: [OMPI users] Connection to HNP lost

2007-07-10 Thread Brian Barrett
What Ralph said is generally true. If your application completed, this is nothing to worry about. It means that an error occurred on the socket between mpirun and some other process. However, combined with the travor0 errors in the log files, it could mean that your IPoIB network is acting

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread George Bosilca
I always use MX_RCACHE=2 for both MTL and BTL. So far I didn't have any problems with it. george. On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote: On Jul 10, 2007, at 11:40 AM, Scott Atchley wrote: On Jul 10, 2007, at 1:14 PM, Christopher D. Maestas wrote: Has anyone seen the following

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Brian Barrett
On Jul 10, 2007, at 11:40 AM, Scott Atchley wrote: On Jul 10, 2007, at 1:14 PM, Christopher D. Maestas wrote: Has anyone seen the following message with Open MPI: --- warning:regcache incompatible with malloc --- --- We don't see this message with mpich-mx-1.2.7..4 MX has an internal reg

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 1:14 PM, Christopher D. Maestas wrote: Has anyone seen the following message with Open MPI: --- warning:regcache incompatible with malloc --- --- We don't see this message with mpich-mx-1.2.7..4 Hi Chris, MX has an internal registration cache that can be enabled with
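Scott's reply explains that MX's registration cache is controlled by the MX_RCACHE environment variable. A minimal sketch of how such a variable reaches the launched ranks (the value 2 follows George's suggestion elsewhere in the thread; the host file and program names are placeholders, and the mpirun line is shown rather than executed):

```shell
# Export MX_RCACHE so child processes inherit it; mpirun's -x flag
# forwards an environment variable to the remote ranks the same way.
export MX_RCACHE=2

# The actual launch (shown, not executed here) would be:
#   mpirun -x MX_RCACHE -np 4 -hostfile ompi_machinefile ./cpi

# Verify the variable is visible to a child process, as -x ensures
# on the remote nodes:
child_val=$(sh -c 'printf %s "$MX_RCACHE"')
echo "$child_val"
```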

Re: [OMPI users] Connection to HNP lost

2007-07-10 Thread Ralph H Castain
On 7/10/07 11:08 AM, "Glenn Carver" wrote: > Hi, > > I'd be grateful if someone could explain the meaning of this error > message to me and whether it indicates a hardware problem or > application software issue: > > [node2:11881] OOB: Connection to HNP lost > [node1:09876] OOB: Connection t

[OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Christopher D. Maestas
Has anyone seen the following message with Open MPI: --- warning:regcache incompatible with malloc --- We see this with 1.2.3 on a 32 bit mx build: --- $ cat BUILD_ENV # Build Environment: USE="doc icc modules mx torque" COMPILER="intel-9.1-f040-c046" CC="icc" CXX="icpc" CLINKER="icc" FC="ifort"

[OMPI users] Connection to HNP lost

2007-07-10 Thread Glenn Carver
Hi, I'd be grateful if someone could explain the meaning of this error message to me and whether it indicates a hardware problem or application software issue: [node2:11881] OOB: Connection to HNP lost [node1:09876] OOB: Connection to HNP lost I have a small cluster which until last week was

Re: [OMPI users] mpi with icc, icpc and ifort :: segfault (Jeff Squyres)

2007-07-10 Thread Jeff Squyres
Whoa -- if you are failing here, something is definitely wrong: this is failing when accessing stack memory! Are you able to compile/run other trivial and non-trivial C++ applications using your Intel compiler installation? On Jul 10, 2007, at 12:10 PM, Ricardo Reis wrote: On Mon, 9 Jul

Re: [OMPI users] DataTypes with "holes" for writing files

2007-07-10 Thread jody
I think there is still some problem. I create different datatypes by resizing MPI_SHORT with different negative lower bounds (depending on the rank) and the same extent (only depending on the number of processes). However, I get an error as soon as MPI_File_set_view is called with my new datatyp

Re: [OMPI users] Open MPI 1.2.3 spec file

2007-07-10 Thread Alex Tumanov
On 7/9/07, Jeff Squyres wrote: On Jul 6, 2007, at 12:05 PM, Alex Tumanov wrote: > Eureka! I managed to get it working despite the incorrect _initial_ > ./configure invocation. For those interested, here are my compilation > options: > # cat ompi_build.sh > #!/bin/sh > > rpmbuild --rebuild -D "

Re: [OMPI users] mpi with icc, icpc and ifort :: segfault (Jeff Squyres)

2007-07-10 Thread Ricardo Reis
On Mon, 9 Jul 2007, Jeff Squyres wrote: Ok, that unfortunately doesn't make much sense -- I don't know what opal_event_set() inside opal_event_init() would cause a segv. Can you recompile OMPI with -g and re-run this test? The "where" information from gdb will then give us more information.

Re: [OMPI users] DataTypes with "holes" for writing files

2007-07-10 Thread George Bosilca
MPI_LB and MPI_UB is what you're looking for. Or better, for MPI-2 compliant libraries such as Open MPI and MPICH2, you can use MPI_Type_create_resized. This will allow you to create the gap at the beginning and/or the end of a data-type description. george. On Jul 10, 2007, at 10:53 AM,

[OMPI users] DataTypes with "holes" for writing files

2007-07-10 Thread jody
Hi, I want to create datatypes of the form XX00... 00XX... XX00... etc. I tried MPI_Type_indexed(1, ais, ait, MPI_SHORT, &dtNewType) where ais = {2} and ait = {2} but this only gives me a datatype of the form 00XX, i.e. no holes at the end. I guess MPI_Type_vector won't work, because t

Re: [OMPI users] openmpi fails on mx endpoint busy

2007-07-10 Thread Tim Prins
SLIM H.A. wrote: Dear Tim So, you should just be able to run: mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile ompi_machinefile ./cpi I tried node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile ompi_machinefile ./cpi I put in a sleep call to keep it running for some

Re: [OMPI users] openmpi fails on mx endpoint busy

2007-07-10 Thread SLIM H.A.
Dear Tim > So, you should just be able to run: > mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile > ompi_machinefile ./cpi I tried node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile ompi_machinefile ./cpi I put in a sleep call to keep it running for some time and to mo
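The command in this thread relies on Open MPI's component-selection syntax: `-mca btl <list>` names the byte-transfer-layer components to allow, and a `^` prefix excludes components, so `-mca mtl ^mx` disables the MX MTL while the BTL list still includes mx (avoiding two MX paths opening endpoints at once). A hedged sketch, with the launch line shown rather than executed and the same selection expressed through Open MPI's `OMPI_MCA_*` environment variables:

```shell
# Command-line form (shown, not executed here):
#   mpirun --mca btl mx,sm,self --mca mtl ^mx \
#          -np 4 -hostfile ompi_machinefile ./cpi

# Equivalent selection via environment variables, which Open MPI
# reads as MCA parameters:
export OMPI_MCA_btl=mx,sm,self
export OMPI_MCA_mtl=^mx
echo "$OMPI_MCA_btl"
```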