On Feb 8, 2011, at 8:21 PM, Ralph Castain wrote:
I would personally suggest not reconfiguring your system simply to support a
particular version of OMPI. The only difference between the 1.4 and 1.5 series
wrt slurm is that we changed a few things to support a more recent version of
slurm. It is relatively easy to backport that code to the 1.4 series.
On 09/02/2011, at 9:16 AM, Ralph Castain wrote:
See below
On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote:
>
> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
>
>> Hi Michael,
>>
>> You may have tried to send some debug information to the list, but it
>> appears to have been blocked. Compressed text output of the backtrace text
>
On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
> Hi Michael,
>
> You may have tried to send some debug information to the list, but it appears
> to have been blocked. Compressed text output of the backtrace text is
> sufficient.
Odd, I thought I sent it to you directly. In any case,
On 09/02/2011, at 2:38 AM, Ralph Castain wrote:
Another possibility to check - are you sure you are getting the same OMPI
version on the backend nodes? When I see it work on local node, but fail
multi-node, the most common problem is that you are picking up a different OMPI
version due to path differences on the backend nodes.
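A quick way to sanity-check that is to ask each allocated node what it resolves on its PATH; a rough sketch, with the node count purely illustrative:

# Compare the Open MPI each backend node picks up (one task per node).
salloc -N2 srun --ntasks-per-node=1 sh -c 'hostname; which mpirun; ompi_info | head -2'

If the paths or versions differ between nodes, that mismatch is the first thing to fix.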
Hi Michael,
You may have tried to send some debug information to the list, but it
appears to have been blocked. Compressed text output of the backtrace
text is sufficient.
Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Feb 7, 2011, at 8:38 AM, Samuel K. Gutierrez wrote:
Hi,
A detailed backtrace from a core dump may help us debug this. Would
you be willing to provide that information for us?
Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Feb 6, 2011, at 6:36 PM, Michael Curtis wrote:
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote
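For reference, one way the requested core-dump backtrace could be captured looks roughly like this; the core file name, output file, and task count are illustrative, and core naming depends on the local core_pattern setting:

# Allow core files, reproduce the mpirun crash, then dump a full backtrace.
ulimit -c unlimited
salloc -n8 mpirun --display-map ./mpi_app
gdb -batch -ex 'thread apply all bt full' "$(which mpirun)" core > mpirun-backtrace.txt
gzip mpirun-backtrace.txt    # compressed text output, as asked for above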
The 1.4 series is regularly tested on slurm machines after every modification,
and has been running at LANL (and other slurm installations) for quite some
time, so I doubt that's the core issue. Likewise, nothing in the system depends
upon the FQDN (or anything regarding hostname) - it's just us
On 07/02/2011, at 12:36 PM, Michael Curtis wrote:
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote:
Hi,
> I just tried to reproduce the problem that you are experiencing and was
> unable to.
>
> SLURM 2.1.15
> Open MPI 1.4.3 configured with:
> --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas
I compiled OpenMPI 1.4.3 (vanilla
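For comparison, a 1.4.3 build along the lines Samuel quotes above might look roughly like this; the install prefix is just an example:

# Build Open MPI 1.4.3 with the LANL platform file mentioned above.
./configure --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas \
            --prefix=$HOME/openmpi-1.4.3
make -j4 && make install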
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote:
> I'll dig a bit further.
Intere
Hi,
I just tried to reproduce the problem that you are experiencing and
was unable to.
[samuel@lo1-fe ~]$ salloc -n32 mpirun --display-map ./mpi_app
salloc: Job is in held state, pending scheduler release
salloc: Pending job allocation 138319
salloc: job 138319 queued and waiting for resource
Hi,
We'll try to reproduce the problem.
Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Feb 2, 2011, at 2:55 AM, Michael Curtis wrote:
On 28/01/2011, at 8:16 PM, Michael Curtis wrote:
>
> On 27/01/2011, at 4:51 PM, Michael Curtis wrote:
>
> Some more debugging information:
Is anyone able to help with this problem? As far as I can tell it's a
stock-standard recently installed SLURM installation.
I can try 1.5.1 but hesitant
On 27/01/2011, at 4:51 PM, Michael Curtis wrote:
Some more debugging information:
> Failing case:
> michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi
> JOB MAP
Backtrace with debugging symbols
#0 0x77bb5c1e in ?? () from /usr/li
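If frames in a trace like that still resolve to ?? (), one option is to rebuild Open MPI with its debugging options enabled so gdb can name the frames; a rough sketch, install prefix illustrative:

# Rebuild with debug symbols and put the debug install first on the paths.
./configure --enable-debug --prefix=$HOME/openmpi-1.4.3-debug
make -j4 && make install
export PATH=$HOME/openmpi-1.4.3-debug/bin:$PATH
export LD_LIBRARY_PATH=$HOME/openmpi-1.4.3-debug/lib:$LD_LIBRARY_PATH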
Hi,
I'm not sure whether this problem is with SLURM or OpenMPI, but the stack
traces (below) point to an issue within OpenMPI.
Whenever I try to launch an MPI job within SLURM, mpirun immediately
segmentation faults -- but only if the machine that SLURM allocated to MPI is
different to the one
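One way to reproduce that situation deliberately is to exclude the submission host from the allocation, so the allocated node is guaranteed to differ from the machine mpirun starts on; a hedged sketch, assuming the short hostname matches the SLURM node name:

# Force the allocation onto nodes other than the machine running salloc.
salloc -n8 --exclude=$(hostname -s) mpirun --display-map ./mpi_app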