Hi,
Angel de Vicente via users writes:
> I have tried:
> + /etc/pmix-mca-params.conf
> + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf
> but no luck.
Never mind, /etc/openmpi/pmix-mca-params.conf was the right one.
Cheers,
--
Ángel de Vicente
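For reference, that file takes plain "key = value" MCA parameter lines. A minimal sketch; the specific value below is the workaround suggested in the linked GitHub issue (open-mpi/ompi#7516), not something confirmed in this thread, so verify it against your PMIx version:

```
# /etc/openmpi/pmix-mca-params.conf
# Force the hash gds component instead of ds12 (assumed workaround
# from open-mpi/ompi#7516; verify for your PMIx version).
gds = hash
```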
Hello,
with our current setup of OpenMPI and Slurm on an Ubuntu 22.04 server,
when we submit MPI jobs we get the message:
PMIX ERROR: ERROR in file
../../../../../../src/mca/gds/ds12/gds_ds12_lock_pthread.c at line 169
Following https://github.com/open-mpi/ompi/issues/7516, I tried setting
PMIX_
Hello,
thanks for your help and suggestions.
In the end it was not an issue with OpenMPI or any other system
component, but rather a single line in our code. I thought I was
running the tests with the -fbounds-check option, but it turns out I
was not, arrrghh!! At some point I was writing outside one
Hello,
"Keller, Rainer" writes:
> You’re using MPI_Probe() with Threads; that’s not safe.
> Please consider using MPI_Mprobe() together with MPI_Mrecv().
many thanks for the suggestion. I will try with the M variants, though I
was under the impression that mpi_probe() was OK as long as one made
Hello Jeff,
"Jeff Squyres (jsquyres)" writes:
> With THREAD_FUNNELED, it means that there can only be one thread in
> MPI at a time -- and it needs to be the same thread as the one that
> called MPI_INIT_THREAD.
>
> Is that the case in your app?
the master rank (i.e. 0) never creates threads,
Thanks Gilles,
Gilles Gouaillardet via users writes:
> You can first double check you
> MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...)
my code uses "mpi_thread_funneled" and OpenMPI was compiled with
MPI_THREAD_MULTIPLE support:
,----
| ompi_info | grep -i thread
| Thread support: p
Hello,
I'm running out of ideas, and wonder if someone here might have some
tips on how to debug a segmentation fault I'm having with my
application [due to the nature of the problem I suspect it could be
with OpenMPI itself rather than my app, though at this point I'm not
leaning strong
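Typical first steps for hunting a segfault in an MPI run, as a sketch (flags and core-file naming vary by system):

```
# Allow core dumps, reproduce, then inspect the backtrace:
ulimit -c unlimited
mpirun -np 4 ./app          # afterwards: gdb ./app core.<pid>, then "bt"
# Or run every rank under valgrind to catch out-of-bounds accesses:
mpirun -np 4 valgrind --error-exitcode=1 ./app
```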
Hello,
Joshua Ladd writes:
> These are very, very old versions of UCX and HCOLL installed in your
> environment. Also, MXM was deprecated years ago in favor of UCX. What
> version of MOFED is installed (run ofed_info -s)? What HCA generation
> is present (run ibstat).
MOFED is: MLNX_OFED_LINUX
Hello,
John Hearns via users writes:
> Stupid answer from me. If latency/bandwidth numbers are bad then check
> that you are really running over the interface that you think you
> should be. You could be falling back to running over Ethernet.
I'm quite out of my depth here, so all answers are h
Hello,
"Jeff Squyres (jsquyres)" writes:
> I'd recommend against using Open MPI v3.1.0 -- it's quite old. If you
> have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which
> has all the rolled-up bug fixes on the v3.1.x series.
>
> That being said, Open MPI v4.1.2 is the most curre
Hello,
Gilles Gouaillardet via users writes:
> Infiniband detection likely fails before checking expanded verbs.
thanks for this. In the end, after playing a bit with different options,
I managed to install OpenMPI 3.1.0 OK in our cluster using UCX (I wanted
4.1.1, but that would not compile cl
Hi,
I'm trying to compile the latest OpenMPI version with Infiniband support
on our local cluster, but didn't get very far (since I'm installing this
via Spack, I also asked in their support group).
Spack is issuing the following configure step (see the opti
Hi,
Joshua Ladd writes:
> This is an ancient version of HCOLL. Please upgrade to the latest
> version (you can do this by installing HPC-X
> https://www.mellanox.com/products/hpc-x-toolkit)
Just to close the circle and inform that all seems OK now.
I don't have root permission in this machine
Hi,
Joshua Ladd writes:
> We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it
> takes exactly the same 19 secs (80 ranks).
>
> What version of HCOLL are you using? Command line?
Thanks for having a look at this.
According to ompi_info, our OpenMPI (version 3.0.1) was config
Hi,
George Bosilca writes:
> If I'm not mistaken, hcoll is playing with the opal_progress in a way
> that conflicts with the blessed usage of progress in OMPI and prevents
> other components from advancing and timely completing requests. The
> impact is minimal for sequential applications using
Hi,
in one of our codes, we want to create a log of events that happen in
the MPI processes, where the number of these events and their timing is
unpredictable.
So I implemented a simple test code, where process 0
creates a thread that is just busy-waiting for messages from any
process, and which
Brice Goglin writes:
> Ok, that's a very old kernel on a very old POWER processor, it's
> expected that hwloc doesn't get much topology information, and it's
> then expected that OpenMPI cannot apply most binding policies.
Just in case it can add anything, I tried with an older OpenMPI version
(
Brice Goglin wrote:
> What's this machine made of? (processor, etc)
> What kernel are you running ?
>
> Getting no "socket" or "package" at all is quite rare these days.
>
> Brice
>
> On 09/03/2017 15:28, Angel de Vicente wrote:
Hi again,
thanks for your help. I installed the latest OpenMPI (2.0.2).
lstopo output:
,----
| lstopo --version
| lstopo 1.11.2
|
| lstopo
| Machine (7861MB)
| L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0
| (P#0)
| L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (64KB
Hi,
Gilles Gouaillardet writes:
> Can you run
> lstopo
> in your machine, and post the output ?
no lstopo in my machine. This is part of hwloc, right?
> can you also try
> mpirun --map-by socket --bind-to socket ...
> and see if it helps ?
same issue.
Perhaps I need to compile hwloc as well?
Hi,
Gilles Gouaillardet writes:
> which version of ompi are you running ?
2.0.1
> this error can occur on systems with no NUMA object (e.g. single
> socket with hwloc < 2)
> as a workaround, you can
> mpirun --map-by socket ...
with --map-by socket I get exactly the same issue (both in the log
Hi,
I'm trying to get OpenMPI running on a new machine, and I came across
an error message that I hadn't seen before.
,----
| can@login1:> mpirun -np 1 ./code config.txt
| --
| No objects of the specified type were found on
Hi,
Reuti writes:
> At first I thought you want to run a queuing system inside a queuing
> system, but this looks like you want to replace the resource manager.
yes, if this could work reasonably well, we could do without the
resource manager.
> Under which user account the DVM daemons will run
Hi,
"r...@open-mpi.org" writes:
>> With the DVM, is it possible to keep these jobs in some sort of queue,
>> so that they will be executed when the cores get free?
>
> It wouldn’t be hard to do so - as long as it was just a simple FIFO
> scheduler. I wouldn’t want it to get too complex.
a simpl
Hi,
"r...@open-mpi.org" writes:
> You might want to try using the DVM (distributed virtual machine)
> mode in ORTE. You can start it on an allocation using the “orte-dvm”
> cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo
> is either the contact info printed out by orte-dvm, or the
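The DVM workflow described above can be sketched as follows (flag names from ORTE-era Open MPI; the URI file name is illustrative):

```
# Start the DVM once on the allocation, writing its contact info:
orte-dvm --report-uri dvm-uri.txt &
# Then submit any number of jobs against the running DVM:
mpirun --hnp file:dvm-uri.txt -np 16 ./job1
mpirun --hnp file:dvm-uri.txt -np 16 ./job2
```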
Hi,
"Jeff Squyres (jsquyres)" writes:
> The list of names in the hostfile specifies the servers that will be used,
> not the network interfaces. Have a look at the TCP portion of the FAQ:
>
> http://www.open-mpi.org/faq/?category=tcp
Thanks a lot for this.
Now it works OK if I run it lik
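The relevant knob from that FAQ section is the TCP BTL interface list; a sketch, with interface names as examples only:

```
# Restrict Open MPI's TCP BTL to a specific interface:
mpirun --mca btl_tcp_if_include eth1 -np 4 ./code
# ...or exclude interfaces it should never use:
mpirun --mca btl_tcp_if_exclude lo,virbr0 -np 4 ./code
```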
Hi again,
Angel de Vicente writes:
> yes, that's just what I did with orted. I saw the port that it was
> trying to connect and telnet to it, and I got "No route to host", so
> that's why I was going the firewall path. Hopefully the sysadmins can
> disable the f
Hi,
"Jeff Squyres (jsquyres)" writes:
>>> I'm starting to think that perhaps is a firewall issue? I don't have
>>> root access in these machines but I'll try to investigate.
> A simple test is to try any socket-based server app between the two
> machines that opens a random listening socket. Tr
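Such a socket test can be done with netcat; a sketch, with an arbitrary port number:

```
# On machine A, listen on an unused port:
nc -l 12345
# On machine B, try to connect; a quick "No route to host" or a
# silent timeout suggests a firewall rather than an MPI problem:
nc machineA 12345
```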
Hi,
Ralph Castain writes:
> On May 4, 2013, at 4:54 PM, Angel de Vicente wrote:
>>
>> Is there any way to dump details of what OpenMPI is trying to do in each
>> node, so I can see if it is looking for different libraries in each
>> node, or something similar?
Hi,
I have used OpenMPI before without any troubles, and have configured
MPICH, MPICH2 and OpenMPI on many different machines before, but recently we
upgraded the OS to Fedora 17, and now I'm having trouble running an MPI
code in two of our machines connected via a switch.
I thought perhaps the old in