>>
>>>>>>>> Does it mean that the process with rank 0 should be bound to
>>>>>>>> core 0, 1, or 2 of socket 1?
>>>>>>>>
>>>>>>>> I tried to use a rankfile and have a problem. My rankfile contai
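For reference, the Open MPI rankfile format uses slot=socket:core, so binding rank 0 to cores 0-2 of socket 1 would look like the sketch below (node names are placeholders, not from the original rankfile):

```
rank 0=node01 slot=1:0-2
rank 1=node01 slot=0:*
rank 2=node02 slot=1:0
```

Here slot=1:0-2 means socket 1, cores 0 through 2, and slot=0:* means any core on socket 0; see mpirun(1) for the exact semantics in your Open MPI version.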
On 09/07/2012 08:02 AM, Jeff Squyres wrote:
On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote:
Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is it
always the same node that crashes? And so on.
Another thought on hardware errors... I actually have seen bad RAM
On 09/03/2012 04:39 PM, Andrea Negri wrote:
max locked memory     (kbytes, -l) 32
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
POSIX message queues   (bytes, -q) 819200
On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote:
> Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is
> it always the same node that crashes? And so on.
Another thought on hardware errors... I actually have seen bad RAM cause
spontaneous reboots with no Linux
On Sep 5, 2012, at 3:59 AM, Andrea Negri wrote:
> I have tried with these flags (I use gcc 4.7 and Open MPI 1.6), but
> the program doesn't crash; a node goes down and the rest of them remain
> waiting for a signal (there is an ALLREDUCE in the code).
>
> Anyway, yesterday some processes died (without
George,
I have made some modifications to the code; however, this is the first
part of my zmp_list:
!ZEUSMP2 CONFIGURATION FILE
 LGEOM  = 2,
 LDIMEN = 2 /
 LRAD   = 0,
 XHYDRO = .TRUE.,
 XFORCE = .TRUE.,
 XMHD   = .FALSE.,
Andrea,
As suggested by the previous answers, I guess the size of your problem is too
large for the memory available on the nodes. I can run ZeusMP without any
issues up to 64 processes, both over Ethernet and InfiniBand. I tried both the
1.6 release and the current trunk, and both perform as expected.
I have tried with these flags (I use gcc 4.7 and Open MPI 1.6), but
the program doesn't crash; a node goes down and the rest of them remain
waiting for a signal (there is an ALLREDUCE in the code).
Anyway, yesterday some processes died (without a log) on node 10,
I logged almost immediately in the
----
Message: 1
Date: Fri, 31 Aug 2012 20:11:41 -0400
From: Gus Correa <g...@ldeo.columbia.edu>
Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster of servers
To: Open MPI Users <us...@open
------------
>>
>> all the time.
>>
>> ==
>>
>> I configured with:
>>
>> ./configure --prefix=$HOME/local/... --enable-static --disable-shared
>> --with-sge
>>
>> and adjusted my PATHs accordingly
users-ow...@open-mpi.org
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of users digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>> 1. Re: some mpi processes "disappear" on a cluster of servers
>> 2. Re: users Digest, Vol 2339, Issue 5 (Andrea Negri)
>>
>>
>> ------------------
>>
>
> Message: 1
> Date: Sat, 1 Sep 2012 08:48:56 +0100
> From: John Hearns <hear...@googlemail.com>
> Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster
> of servers
>
Apologies, I have not taken the time to read your comprehensive diagnostics!
As Gus says, this sounds like a memory problem.
My suspicion would be the kernel Out Of Memory (OOM) killer.
Log into those nodes (or ask your systems manager to do this). Look
closely at /var/log/messages where there
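A quick way to spot the OOM killer is to grep the kernel log for its messages. The sketch below runs the grep against a fabricated sample excerpt so it is self-contained; the log lines, PID, and process name are illustrative, and on the real nodes you would point the same pattern at /var/log/messages or the output of `dmesg`:

```shell
# Fabricated kernel-log excerpt standing in for /var/log/messages (illustrative only)
cat > /tmp/messages.sample <<'EOF'
Sep  5 03:58:12 node10 kernel: Out of memory: Kill process 4321 (zeusmp.x) score 912 or sacrifice child
Sep  5 03:58:12 node10 kernel: Killed process 4321 (zeusmp.x) total-vm:3145728kB, anon-rss:3072000kB
EOF

# Same pattern on a live node: grep -i 'out of memory\|killed process' /var/log/messages
grep -i 'out of memory\|killed process' /tmp/messages.sample
```

If lines like these show up with your executable's name, the kernel killed the process for lack of memory, which matches the "disappearing without a log" symptom.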
Hi Andrea
I would guess this is a memory problem.
Do you know how much memory each node has?
Do you know how much memory each MPI process in the CFD code requires?
If the program starts swapping/paging to disk because of
low memory, the interesting things that you described can happen.
I
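A back-of-the-envelope estimate of the per-process footprint can answer the second question: multiply the local grid size by the number of double-precision arrays the code keeps resident. The dimensions and array count below are made up for illustration; plug in the code's actual values:

```shell
# Hypothetical per-process footprint: 512x512 local cells, ~20 double-precision arrays
nx=512; ny=512; narrays=20
bytes=$(( nx * ny * narrays * 8 ))   # 8 bytes per double
echo "approx $(( bytes / 1024 / 1024 )) MiB per MPI process"
# prints: approx 40 MiB per MPI process
```

Compare the result, times the number of ranks per node, against the physical RAM reported by `free -m` on each node.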
Hi, I have been in trouble for a year.
I run a pure MPI (no OpenMP) Fortran fluid dynamics code on a cluster
of servers, and I get strange behaviour when running the code on
multiple nodes.
The cluster consists of 16 PCs (each PC is one node) with a dual-core processor.
Basically, I'm able to run