Out of curiosity, why do you specify both --hostfile and -H ?
Do you observe the same behavior without --hostfile ~/.mpihosts ?
Also, do you have at least 4 cores on both A.lan and B.lan ?
Cheers,
Gilles
On Sunday, October 16, 2016, MM wrote:
> Hi,
>
> openmpi 1.10.3
>
n-mpi.org
> <users-boun...@lists.open-mpi.org>] *On
> Behalf Of *Gilles Gouaillardet
> *Sent:* Monday, October 17, 2016 9:30 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] communications groups
>
>
>
> Rick,
>
>
>
> I r
Rick,
In my understanding, sensorgroup is a group with only task 1
Consequently, sensorComm is
- similar to MPI_COMM_SELF on task 1
- MPI_COMM_NULL on other tasks, and hence the barrier fails
I suggest you double check sensorgroup is never MPI_GROUP_EMPTY
and add a test so that MPI_Barrier is not called on MPI_COMM_NULL
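a minimal sketch of that guard in C (assuming sensorComm is the communicator
built from sensorgroup):

if (sensorComm != MPI_COMM_NULL) {
    MPI_Barrier(sensorComm);
}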
Rick,
I re-read the MPI standard and was unable to figure out whether sensorgroup is
MPI_GROUP_EMPTY or a group containing only task 1 on the tasks other than task 1
(A group that does not contain the current task makes little sense to me,
but I do not see any reason why this group has to be MPI_GROUP_EMPTY)
20x/signals
>
> Unfortunately it changes nothing. The root rank stops and all other
> ranks (and mpirun) just stay, the remaining ranks at 100 % CPU waiting
> apparently in that allreduce. The stack trace looks a bit more
> interesting (git is always debug build ?), so I include it at t
Folks,
the problem is indeed pretty trivial to reproduce
i opened https://github.com/open-mpi/ompi/issues/2550 (and included a
reproducer)
Cheers,
Gilles
On Fri, Dec 9, 2016 at 5:15 AM, Noam Bernstein
<noam.bernst...@nrl.navy.mil> wrote:
> On Dec 8, 2016, at 6:05 AM, Gilles Gou
Christoph,
can you please try again with
mpirun --mca btl tcp,self --mca pml ob1 ...
that will help figure out whether pml/cm and/or mtl/psm2 is involved or not.
if that causes a crash, then can you please try
mpirun --mca btl tcp,self --mca pml ob1 --mca coll ^tuned ...
that will help
Dave,
thanks for the info
for what it's worth, it is generally a bad idea to configure with --with-xxx=/usr
since you might inadvertently use some other external components.
in your case, --with-libevent=external is what you need if you want to
use an external libevent library installed in /usr
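for example (the install prefix below is only a placeholder):

./configure --prefix=/opt/openmpi --with-libevent=external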
i guess the
estigating this !
Cheers
Christof
On Thu, Dec 08, 2016 at 03:15:47PM -0500, Noam Bernstein wrote:
On Dec 8, 2016, at 6:05 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
wrote:
Christof,
There is something really odd with this stack trace.
count is zero, and some pointers do not point to
Siegmar,
I was able to reproduce the issue on my vm
(No need for a real heterogeneous cluster here)
I will keep digging tomorrow.
Note that if you specify an incorrect slot list, MPI_Comm_spawn fails with a
very unfriendly error message.
Right now, the 4th spawn'ed task crashes, so this is a
Juan,
Open MPI has its own implementation of OpenSHMEM.
The Mellanox software is very likely yet another implementation of
OpenSHMEM.
So you can consider these as independent libraries
Cheers,
Gilles
On Wednesday, January 11, 2017, Juan A. Cordero Varelaq <
bioinformatica-i...@us.es> wrote:
s?
>
>
> Kind regards
>
> Siegmar
>
> Am 11.01.2017 um 10:04 schrieb Gilles Gouaillardet:
>
>> Siegmar,
>>
>> I was able to reproduce the issue on my vm
>> (No need for a real heterogeneous cluster here)
>>
>> I will keep digging tomorrow.
>
Hi,
it looks like you installed Open MPI 2.0.1 at the same location as the
previous Open MPI 1.10, but you did not uninstall v1.10.
the faulty modules have very likely been removed from 2.0.1, hence the error.
you can simply remove the openmpi plugins directory and reinstall openmpi
rm -rf
Adam,
there are several things here
with an up-to-date master, you can specify an alternate ssh port via a
hostfile
see https://github.com/open-mpi/ompi/issues/2224
Open MPI requires more than just ssh.
- remote nodes (orted) need to call back mpirun (oob/tcp)
- nodes (MPI tasks) need
Roland,
the easiest way is to use an external hwloc that is configured with
--disable-nvml
another option is to hack the embedded hwloc configure.m4 and pass
--disable-nvml to the embedded hwloc configure. note this requires you to run
autogen.sh and hence needs recent autotools.
i guess Open
Can you please try
mpirun --mca btl tcp,self ...
And if it works
mpirun --mca btl openib,self ...
Then can you try
mpirun --mca coll ^tuned --mca btl tcp,self ...
That will help figure out whether the error is in the pml or the coll
framework/module
Cheers,
Gilles
On Thursday, March 23,
Matt,
a C++ compiler is required to configure Open MPI.
That being said, the C++ compiler is only used if you build the C++ bindings
(which were removed in MPI-3).
And unless you plan to use the mpic++ wrapper (with or without the C++
bindings),
a valid C++ compiler is not required at all.
/*
Tom,
what if you use
type(mpi_datatype) :: mpiint
Cheers,
Gilles
On Thursday, March 23, 2017, Tom Rosmond wrote:
>
> Hello;
>
> I am converting some fortran 90/95 programs from the 'mpif.h' include file
> to the 'mpi_f08' model and have encountered a problem. Here is a
Hi,
yes, please open an issue on github, and post your configure and mpirun
command lines.
ideally, could you try the latest v1.10.6 and v2.1.0 ?
if you can reproduce the issue with a smaller number of MPI tasks, that
would be great too
Cheers,
Gilles
On 3/28/2017 11:19 PM, Götz
Iirc, there used to be a bug in Open MPI leading to such a false positive,
but I cannot remember the details.
I recommend you use at least the latest 1.10 (which is really a 1.8 + a few
more features and several bug fixes)
Another option is to simply +1 a mtt parameter and see if it helps
Joshua,
George previously explained you are limited by the size of your level X
cache.
that means that you might get optimal performance for a given message
size, let's say
when everything fits in the L2 cache.
when you increase the message size, L2 cache is too small, and you have
to
hosts_eth ... (With IB
interfaces down)
mpirun --mca btl openib,self,sm -hostfile hosts_ib0 ...
Regards,
Rodrigo
On Mon, Mar 20, 2017 at 8:29 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com <mailto:gilles.gouaillar...@gmail.com>>
wrote:
You will get similar results with hosts_ib and hosts_eth
If you want to use tcp over ethernet, you have to
mpirun --mca btl tcp,self,sm --mca btl_tcp_if_include eth0 ...
If you want to use tcp over ib, then
mpirun --mca btl tcp,self,sm --mca btl_tcp_if_include ib0 ...
Keep in mind that IMB calls
Note that might not be enough if hwloc detects nvml.
unfortunately, there are only workarounds available for this:
1) edit opal/mca/hwloc/hwloc*/configure.m4 and add
enable_nvml=no
for example after enable_xml=yes
note you need recent autotools, and re-run autogen.pl --force
2) build Open
Hi,
The -pthread flag is likely pulled by libtool from the slurm libmpi.la
and/or libslurm.la
Workarounds are
- rebuild slurm with PGI
- remove the .la files (*.so and/or *.a are enough)
- wrap the PGI compiler to ignore the -pthread option
Hope this helps
Gilles
On Monday, April 3, 2017,
That should be a two-step tango
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds OpenMP threads to cores (or hyper-threads) inside
the socket assigned by Open MPI
which compiler are you using ?
do you set some environment variables to direct OpenMP to bind threads ?
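for reference, a typical combination looks like the line below (a sketch only:
OMP_PROC_BIND/OMP_PLACES assume an OpenMP 4.0 runtime, and the thread count and
binary name are placeholders):

mpirun --map-by socket --bind-to socket -x OMP_NUM_THREADS=8 -x OMP_PROC_BIND=close -x OMP_PLACES=cores ./a.out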
Also, how do
What happens is that, under the hood, mpirun uses your remote_exec to launch
orted on the remote nodes,
and your remote_exec does not propagate LD_LIBRARY_PATH
one option is to configure your remote_exec to do so, but I'd rather suggest
you re-configure ompi with --enable-orterun-prefix-by-default
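for example (the prefix is only a placeholder):

./configure --prefix=/opt/openmpi --enable-orterun-prefix-by-default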
If your remote_exec is ssh (if you are not running
So it seems
OPAL_HAVE_POSIX_THREADS
is not defined, and that should never happen !
Can you please compress and post (or upload into gist or similar) your
- config.log
- opal/include/opal_config.h
Cheers,
Gilles
On Sunday, April 9, 2017, Travis W. Drayna wrote:
> Gilles,
John,
can you run
free
before the first command and make sure you have all the physical and
available memory you expect ?
then, after a failed
mpirun -np 1 ./helloWorld
can you run
dmesg
and look for messages from the OOM killer ?
that would indicate you are running out of memory.
maybe some
The PR simply disables nvml in hwloc if CUDA is disabled in Open MPI.
it also adds the cuda directory to CPPFLAGS, so there should be no need to
manually add -I/usr/local/cuda/include to CPPFLAGS.
Siegmar,
could you please post your config.log
also, is there a nvml.h file in
there any hints how to cleanly transfer the OpenMPI binding to the OpenMP
tasks?
Thanks and kind regards,
Ado
On 12.04.2017 15:40, Gilles Gouaillardet wrote:
That should be a two-step tango
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds OpenMP threads to cores (or hyper-thread
Vincent,
Can you try a small program such as examples/ring_c.c ?
Does your app do MPI_Comm_spawn and friends ?
Can you post your mpirun command line ? Are you using a batch manager ?
This error message is typical of unresolved libraries.
(E.g. "ssh host ldd orted" fails to resolve some libs
Jim,
can you please post your configure command line and test output on both
systems ?
fwiw, Open MPI strictly sticks to the (current) MPI standard regarding
MPI_DATATYPE_NULL
(see
http://lists.mpi-forum.org/pipermail/mpi-forum/2016-January/006417.html)
there have been some attempts
Angel,
i suggest you get an xml topo with
lstopo --of xml
on both your "exotic" POWER platform and a more standard and recent one.
then you can manually edit the xml topology and add the missing objects.
finally, you can pass this to Open MPI like this
mpirun --mca hwloc_base_topo_file
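spelled out (topo.xml and ./a.out are placeholders; lstopo is the hwloc tool
that emits the XML):

lstopo --of xml > topo.xml
mpirun --mca hwloc_base_topo_file topo.xml ./a.out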
which version of ompi are you running ?
this error can occur on systems with no NUMA object (e.g. single socket
with hwloc < 2)
as a workaround, you can
mpirun --map-by socket ...
iirc, this has been fixed
Cheers,
Gilles
On Thursday, March 9, 2017, Angel de Vicente wrote:
>
Can you run
lstopo
in your machine, and post the output ?
can you also try
mpirun --map-by socket --bind-to socket ...
and see if it helps ?
Cheers,
Gilles
On Thursday, March 9, 2017, Angel de Vicente <ang...@iac.es> wrote:
> Hi,
>
> Gilles Gouaillardet <gilles.goua
PSM is the infinipath driver, so unless you have some infinipath hardware,
you can safely disable it
Cheers,
Gilles
On Sunday, March 12, 2017, Saliya Ekanayake wrote:
> Hi,
>
> I've been trying to resolve a segfault that kept occurring with OpenMPI
> Java binding. I found
Hi,
there is likely something wrong in Open MPI (i will follow up in the
devel ML)
meanwhile, you can
mpirun --mca opal_set_max_sys_limits core:unlimited ...
Cheers,
Gilles
On 3/3/2017 1:01 PM, gzzh...@buaa.edu.cn wrote:
Hi Jeff:
Thanks for your suggestions.
1. I have
Graham,
you can configure Open MPI with '--enable-script-wrapper-compilers'
that will build the wrapper compilers as scripts instead of binaries.
Cheers,
Gilles
On 3/3/2017 10:23 AM, Graham Holland wrote:
Hello,
I am using OpenMPI version 1.10.2 on an arm development board and have
successfully
Dave,
unless you are doing direct launch (for example, using 'srun' instead of
'mpirun' under SLURM),
this is the way Open MPI works: mpirun will use whatever the
resource manager provides
in order to spawn the remote orted (tm with PBS, qrsh with SGE, srun
with SLURM, ...).
then
Mahmood,
you might want to have a look at OpenHPC (which comes with a recent Open MPI)
Cheers,
Gilles
On Thu, Aug 3, 2017 at 9:48 PM, Mahmood Naderan wrote:
> Well, it seems that the default Rocks-openmpi dominates the systems. So, at
> the moment, I stick with that
addr show eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen
> 1000
> link/ether 08:00:38:3c:4e:65 brd ff:ff:ff:ff:ff:ff
> inet 172.24.44.190/23 brd 172.24.45.255 scope global eth0
> inet6 fe80::a00:38ff:fe3c:4e65/64 scope link
>
Keith,
MPI is running on both shared memory (e.g. one single node) and
distributed memory (e.g. several independent nodes).
here is what happens when you
mpirun -np <n> a.out
1. an orted process is remotely spawned on each node
2. mpirun and orted fork a.out
unless a batch manager is used, remote
Boris,
Open MPI should automatically detect the infiniband hardware, and use
openib (and *not* tcp) for inter-node communications
and a shared memory optimized btl (e.g. sm or vader) for intra-node
communications.
note that if you use "-mca btl openib,self", you tell Open MPI to use the openib
Fabricio,
the fortran runtime might (or might not) use buffering for I/O.
as a consequence, data might be written immediately to disk, or at a later time
(e.g. the file is closed, the buffer is full or the buffer is flushed)
you might want to manually flush the file, or there might be an option
not to
fffe5abb31
> Link layer: Ethernet
> -bash-4.1$
> %%%%%%
>
> On Fri, Jul 14, 2017 at 12:37 AM, John Hearns via users
> <users@lists.open-mpi.org> wrote:
>>
>> ABoris, as Gil
Adam,
at first, you need to change the default send and receive socket buffers :
mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
/* note this will be the default from Open MPI 2.1.2 */
hopefully, that will be enough to greatly improve the bandwidth for
large messages.
generally
Hi Petr,
thanks for the report.
could you please configure Open MPI with the previously working command line
and compress and post the generated config.log ?
Cheers,
Gilles
On 7/11/2017 12:52 AM, Petr Hanousek wrote:
Dear developers,
I am using for a long time the proved configure
on the command line. :o)
-Adam
On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com <mailto:gilles.gouaillar...@gmail.com>>
wrote:
Adam,
at first, you need to change the default send and receive socket
buffers :
mpirun --mca btl_tcp_sn
Adam,
keep in mind that by default, recent Open MPI binds MPI tasks
- to cores if -np 2
- to a NUMA domain otherwise (which is a socket in most cases, unless
you are running on a Xeon Phi)
so unless you specifically asked mpirun to do a binding consistent
with your needs, you might simply try to
Hi,
i cannot comment on the openib-specific part.
the coll/tuned collective module is very likely to split messages in
order to use a more efficient
algorithm. another way to put it is that you probably do not want to use
large messages.
but if this is really what you want, then one option
Sam,
this example is using 8 MB size messages
if you are fine with using more memory, and your application does not
generate too many unexpected messages, then you can bump the eager_limit
for example
mpirun --mca btl_tcp_eager_limit $((8*1024*1024+128)) ...
worked for me
George,
in
thought the progress thread would have helped here.
just to be 100% sure, could you please confirm this is the intended
behavior and not a bug ?
Cheers,
Gilles
On Sat, Jul 22, 2017 at 5:00 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>
> On Thu, Jul 20, 2017 at 8:57 PM, Gill
posted.
Best wishes
Volker
On Jul 27, 2017, at 7:50 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Thanks Jeff for your offer, i will contact you off-list later
i tried a gcc+gfortran and gcc+ifort on both linux and OS X
so far, only gcc+ifort on OS X is failing
i will t
did not trigger on a more standard
> platform - that would have simplified things.
>
> Best wishes
> Volker
>
>> On Jul 27, 2017, at 3:56 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>> Folks,
>>
>>
>> I am able to reproduce the issue on
Dave,
On 7/28/2017 12:54 AM, Dave Love wrote:
Gilles Gouaillardet <gilles.gouaillar...@gmail.com> writes:
Adam,
keep in mind that by default, recent Open MPI binds MPI tasks
- to cores if -np 2
- to a NUMA domain otherwise
Not according to ompi_info from the latest release; it says
ars to work.
>
> Hopefully, no trivial mistakes in the testcase. I just spent a few days
> tracing this issue through a fairly large code, which is where the issue
> originally arose (and leads to wrong numbers).
>
> Best wishes
> Volker
>
>
>
>
>> On Jul 26,
Volker,
i was unable to reproduce this issue on linux
can you please post your full configure command line, your gnu
compiler version and the full test program ?
also, how many mpi tasks are you running ?
Cheers,
Gilles
On Wed, Jul 26, 2017 at 4:25 PM, Volker Blum
tching specific
>subroutine for this generic subroutine call. [MPI_ALLREDUCE]
> call MPI_ALLREDUCE(check_conventional_mpi, aux_check_success, 1,
> MPI_LOGICAL, &
>--------^
>compilation aborted for check_mpi_in_place_08.f90 (code 1)
>
>This is an interesti
ame result as with ‘include mpif.h', in that the output is
> >
> > * MPI_IN_PLACE does not appear to work as intended.
> > * Checking whether MPI_ALLREDUCE works at all.
> > * Without MPI_IN_PLACE, MPI_ALLREDUCE appears to work.
> >
> > Hm
Ludovic,
what happens here is that by default, an MPI task will only use the
closest IB device.
since tasks are bound to a socket, that means that tasks on socket 0
will only use mlx4_0, and tasks on socket 1 will only use mlx4_1.
because these are on independent subnets, that also means that
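for illustration only (my own assumption, not necessarily the right fix for
this particular setup), the openib btl can be restricted to, or allowed, a
given set of HCAs with the btl_openib_if_include parameter, e.g.

mpirun --mca btl_openib_if_include mlx4_0,mlx4_1 ...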
Charles,
did you build Open MPI with the external PMIx ?
iirc, Open MPI 2.0.x does not support cross version PMIx
Cheers,
Gilles
On Sun, Aug 6, 2017 at 7:59 PM, Charles A Taylor wrote:
>
>> On Aug 6, 2017, at 6:53 AM, Charles A Taylor wrote:
>>
>>
>> Anyone
Siegmar,
a noticeable difference is hello_1 does *not* sleep, whereas
hello_2_slave *does*
simply comment out the sleep(...) line, and performance will be identical
Cheers,
Gilles
On 7/31/2017 9:16 PM, Siegmar Gross wrote:
Hi,
I have two versions of a small program. In the first one
2, 2017 at 11:55 AM, Jackson, Gary L.
> > <gary.jack...@jhuapl.edu> wrote:
> >> I’m using a build of OpenMPI provided by a third party.
> >>
> >> --
> >> Gary Jackson, Ph.D.
> >> Johns Hopkins University Applied Physics
Folks,
for the record, this was investigated off-list
- the root cause was bad permissions on the /.../lib/openmpi directory
(no components could be found)
- then it was found tm support was not built-in, so mpirun did not
behave as expected under torque/pbs
Cheers,
Gilles
On
Thanks for all the information,
what i meant by
mpirun --mca shmem_base_verbose 100 ...
is really that you modify your mpirun command line (or your torque script if
applicable) and add
--mca shmem_base_verbose 100
right after mpirun
Cheers,
Gilles
On 5/16/2017 3:59 AM, Ioannis Botsis
Hi,
if you run this under a debugger and look at how MPI_Scatterv is
invoked, you will find that
- sendcounts = {1, 1, 1}
- resizedtype has size 32
- recvcount*sizeof(MPI_INTEGER) = 32 on task 0, but 16 on tasks 1 and 2
=> too much data is sent to tasks 1 and 2, hence the error.
in this
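to make the mismatch concrete, here is a minimal sketch in C (my own
illustration with hypothetical buffer names; the original code is Fortran):
with a resized send type spanning 8 integers (32 bytes) and sendcounts =
{1, 1, 1}, every task has to post a matching 32-byte receive

int sendcounts[3] = {1, 1, 1};   /* one resized element per task */
int displs[3]     = {0, 1, 2};
int recvbuf[8];
/* resizedtype spans 8 integers (32 bytes), so recvcount must be 8 on every
   task -- posting only 4 on tasks 1 and 2 is what triggers the error */
MPI_Scatterv(sendbuf, sendcounts, displs, resizedtype,
             recvbuf, 8, MPI_INT, 0, MPI_COMM_WORLD);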
found no active IB device ports
Hello world from rank 0 out of 1 processors
So it seems to work apart the error message.
2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp
<mailto:gil...@rist.or.jp>>:
Gabriele,
so it seems
date choice
On 19 May 2017 at 09:10, Gilles Gouaillardet <gil...@rist.or.jp
<mailto:gil...@rist.or.jp>> wrote:
Gabriele,
so it seems pml/pami assumes there is an infiniband card
available (!)
i guess IBM folks will comment on that shortly.
Gabriele,
can you
ompi_info --all | grep pml
also, make sure there is nothing in your environment pointing to another
Open MPI install
for example
ldd a.out
should only point to IBM libraries
Cheers,
Gilles
On Thursday, May 18, 2017, Gabriele Fatigati wrote:
> Dear
Tim,
On 5/18/2017 2:44 PM, Tim Jim wrote:
In summary, I have attempted to install OpenMPI on Ubuntu 16.04 to the
following prefix: /opt/openmpi-openmpi-2.1.0. I have also manually
added the following to my .bashrc:
export PATH="/opt/openmpi/openmpi-2.1.0/bin:$PATH"
Andy,
it looks like some MPI libraries are being mixed in your environment
from the test/datatype directory, what if you
ldd .libs/lt-external32
does it resolve to the libmpi.so you expect ?
Cheers,
Gilles
On 5/25/2017 11:02 AM, Andy Riebs wrote:
Hi,
I'm trying to build OMPI on
Hi,
what if you
mpirun -np 4 ./test
Cheers,
Gilles
On Monday, May 22, 2017, Pranav Sumanth wrote:
> Hello All,
>
> I'm able to successfully compile my code when I execute the make command.
> However, when I run the code as:
>
> mpirun -np 4 test
>
> The error
Allan,
- on which node is mpirun invoked ?
- are you running from a batch manager ?
- is there any firewall running on your nodes ?
- how many interfaces are part of bond0 ?
the error is likely occurring when wiring up mpirun/orted
what if you
mpirun -np 2 --hostfile nodes --mca
Allan,
the "No route to host" error indicates there is something going wrong
with IPoIB on your cluster
(and Open MPI is not involved whatsoever in that)
on sm3 and sm4, you can run
/sbin/ifconfig
brctl show
iptables -L
iptables -t nat -L
we might be able to figure out what is going
host_file
sm3-ib slots=2
sm4-ib slots=2
Will cause the command to hang.
I ran your netcat test again on sm3 and sm4,
[allan@sm3 proj]$ echo hello | nc 10.1.0.5 1234
[allan@sm4 ~]$ nc -l 1234
hello
[allan@sm4 ~]$
Thanks,
Allan
On 05/29/2017 02:14 AM, Gilles Gouaillardet wrote:
Allan,
Ralph,
the issue Siegmar initially reported was
loki hello_1 111 mpiexec -np 3 --host loki:2,exin hello_1_mpi
per what you wrote, this should be equivalent to
loki hello_1 111 mpiexec -np 3 --host loki:2,exin:1 hello_1_mpi
and this is what i initially wanted to double check (but i made a
RTE update PR
are committed? Perhaps Ralph has a point suggesting not to spend
time with the problem if it may already be resolved. Nevertheless,
I added the requested information after the commands below.
Am 31.05.2017 um 04:43 schrieb Gilles Gouaillardet:
Ralph,
the issue Siegmar initially
with Ralph.
Best regards,
Gilles
On 5/31/2017 4:20 PM, Siegmar Gross wrote:
Hi Gilles,
Am 31.05.2017 um 08:38 schrieb Gilles Gouaillardet:
Siegmar,
the "big ORTE update" is a bunch of backports from master to v3.x
btw, does the same error occurs with master ?
Yes, it does, but t
MPI_Comm_create_group was not available in Open MPI v1.6.
so unless you are willing to create your own subroutine in your
application, you'd rather upgrade to Open MPI v2
i recommend you configure Open MPI with
--disable-dlopen --prefix=
unless you plan to scale on thousands of nodes, you should
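if staying on v1.6, here is a minimal sketch of the classic alternative (my
illustration, and the member ranks are hypothetical); note MPI_Comm_create is
collective, so every rank of the parent communicator must call it:

MPI_Group world_group, sub_group;
MPI_Comm  sub_comm;
int ranks[2] = {0, 1};                       /* hypothetical members */
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
MPI_Group_incl(world_group, 2, ranks, &sub_group);
MPI_Comm_create(MPI_COMM_WORLD, sub_group, &sub_comm);
/* sub_comm is MPI_COMM_NULL on ranks that are not in sub_group */
MPI_Group_free(&sub_group);
MPI_Group_free(&world_group);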
Ted,
fwiw, the 'master' branch has the behavior you expect.
meanwhile, you can simply edit your 'dum.sh' script and replace
/home/buildadina/src/aborttest02/aborttest02.exe
with
exec /home/buildadina/src/aborttest02/aborttest02.exe
Cheers,
Gilles
On 6/15/2017 3:01 AM, Ted Sussman
Ashwin,
did you try to run your app with a MPICH-based library (mvapich,
IntelMPI or even stock mpich) ?
or did you try with Open MPI v1.10 ?
the stacktrace does not indicate the double free occurs in MPI...
it seems you ran valgrind on a shell and not on your binary.
assuming your mpirun command
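for what it is worth, a sketch of running valgrind on the MPI processes
themselves (./a.out and the process count are placeholders):

mpirun -np 4 valgrind --log-file=vg.%p.log ./a.out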
n MPI 1.4.3, the output is
>>
>> After aborttest: OMPI_COMM_WORLD_RANK=0
>>
>> which shows that the shell script for the process with rank 0
>> continues after the
>> abort,
>> but that the shell script for the process with rank
Ashwin,
the valgrind logs clearly indicate you are trying to access some memory
that was already free'd
for example
[1,0]:==4683== Invalid read of size 4
[1,0]:==4683==at 0x795DC2: __src_input_MOD_organize_input
(src_input.f90:2318)
[1,0]:==4683== Address 0xb4001d0 is 0 bytes inside
Hi
Can you please post your configure command line for 2.1.1 ?
On which architecture are you running? x86_64 ?
Cheers,
Gilles
"ashwin .D" wrote:
>Also when I try to build and run a make check I get these errors - Am I clear
>to proceed or is my installation broken ? This
Alberto,
Are you saying the program hangs even without jumbo frames (i.e. a 1500 MTU) ?
At first, make sure there is no firewall running, and then you can try
mpirun --mca btl tcp,vader,self --mca oob_tcp_if_include eth0 --mca
btl_tcp_if_include eth0 ...
(Replace eth0 with the interface name you want
Host: openpower
Framework: pml
--
2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
<gil...@rist.or.jp <mailto:gil...@rist.or.jp>>:
Gabriele,
pml/pami is h
.1.0/24
--mca oob_base_verbose 100 ring &> cmd3
If I increase the number of processors in the ring program, mpirun
will not succeed.
mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
--mca oob_base_verbose 100 ring &> cmd4
On 05/19/2017 02:18 AM, Gille
Peter and all,
an easier option is to configure Open MPI with --enable-mpirun-prefix-by-default
this will automagically add rpath to the libs.
Cheers,
Gilles
On Thu, Sep 14, 2017 at 6:43 PM, Peter Kjellström wrote:
> On Wed, 13 Sep 2017 20:13:54 +0430
> Mahmood Naderan
Mahmood,
there is a typo, it should be
-Wl,-rpath,/.../
(note the minus before rpath)
Cheers,
Gilles
On Thu, Sep 14, 2017 at 6:58 PM, Mahmood Naderan wrote:
>>In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
>>compile a dynamic binary "hello" with
> On 09/21/2017 12:32 AM, Tim Jim wrote:
>>
>> Hi,
>>
>> I tried as you suggested: export nvml_enable=no, then reconfigured and
>> ran make all install again, but mpicc is still producing the same error.
>> What should I try next?
>>
>> Many tha
nable=no" and "export enable_opencl=no"? What effects do these
declarations have on the normal functioning of mpi?
Many thanks.
On 22 September 2017 at 15:55, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com <mailto:gilles.gouaillar...@gmail.com>>
wrote:
Was
Unless you are using mxm, you can disable tcp with
mpirun --mca pml ob1 --mca btl ^tcp ...
coll/tuned selects an algorithm based on communicator size and message size. The
spike could occur because a suboptimal (on your cluster and with your job
topology) algo is selected.
Note you can force
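for illustration (my own example; the algorithm number is arbitrary and the
collective is just a sample), a specific algorithm can be forced along these
lines:

mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_allreduce_algorithm 3 ...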
Thanks for the report,
is this related to https://github.com/open-mpi/ompi/issues/4211 ?
there is a known issue when libnl-3 is installed but libnl-route-3 is not
Cheers,
Gilles
On 9/21/2017 8:53 AM, Stephen Guzik wrote:
When compiling (on Debian stretch), I see:
In file included from
ent,
and the upcoming real fix will be able to build this component.
Cheers,
Gilles
On 9/21/2017 9:22 AM, Gilles Gouaillardet wrote:
Thanks for the report,
is this related to https://github.com/open-mpi/ompi/issues/4211 ?
there is a known issue when libnl-3 is installed but libnl-rout
Tim,
i am not familiar with CUDA, but that might help
can you please
export nvml_enable=no
and then re-configure and rebuild Open MPI ?
i hope this will help you
Cheers,
Gilles
On 9/21/2017 3:04 PM, Tim Jim wrote:
Hello,
Apologies to bring up this old thread - I finally had a
Stephen,
a simpler option is to install the libnl-route-3-dev package.
note you will not be able to build the reachable/netlink component
without this package.
Cheers,
Gilles
On 9/21/2017 1:04 PM, Gilles Gouaillardet wrote:
Stephen,
this is very likely related to the issue already reported
- where should I set export nvml_enable=no? Should
I reconfigure with default cuda support or keep the --without-cuda flag?
Kind regards,
Tim
On 21 September 2017 at 15:22, Gilles Gouaillardet <gil...@rist.or.jp
<mailto:gil...@rist.or.jp>> wrote:
Tim,
i am no
On Tue, Sep 19, 2017 at 11:58 AM, Jeff Hammond wrote:
> Fortran is a legit problem, although if somebody builds a standalone Fortran
> 2015 implementation of the MPI interface, it would be decoupled from the MPI
> library compilation.
Is this even doable without making
Anthony,
in your script, can you
set -x
env
pbsdsh hostname
mpirun --display-map --display-allocation --mca ess_base_verbose 10
--mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname
and then compress and send the output ?
Cheers,
Gilles
On 10/3/2017 1:19 PM, Anthony Thyssen