t here on the mailing list.
--
Jeff Squyres
jsquy...@cisco.com
From: users on behalf of Jeff Squyres
(jsquyres) via users
Sent: Thursday, May 5, 2022 3:31 PM
To: George Bosilca; Open MPI Users
Cc: Jeff Squyres (jsquyres)
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
Sent: Thursday, May 5, 2022 3:19 PM
To: Open MPI Users
Cc: Jeff Squyres (jsquyres); Scott Sayres
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
That is weird, but maybe it is not a deadlock but very slow progress. In the
child, can you print fdmax and i in the do_child frame?
George.
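For context, the loop George is asking about typically looks like this (a
sketch of the usual launcher pattern, not Open MPI's actual do_child; fdmax
and i match the variables he names):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* fdmax is the upper bound a do_child-style loop iterates to. */
    long fdmax = sysconf(_SC_OPEN_MAX);
    printf("fdmax = %ld\n", fdmax);

    /* The forked child typically closes every inherited descriptor
     * above stderr before calling execve().  If fdmax is huge, this
     * loop is very slow progress rather than a deadlock -- which would
     * match a backtrace stuck in close(). */
    for (long i = 3; i < fdmax; i++)
        close((int)i);
    return 0;
}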
On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users <
users@lists.open-mpi.org> wrote:
Jeff, thanks.
from 1:
(lldb) process attach --pid 95083
Process 95083 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0001bde25628 libsystem_kernel.dylib`close + 8
libsystem_kernel.dylib`close:
-> 0x1bde25628 <+8>: b.lo 0x1bde25648
You can use "lldb -p PID" to attach to a running process.
--
Jeff Squyres
jsquy...@cisco.com
From: Scott Sayres
Sent: Thursday, May 5, 2022 11:22 AM
To: Jeff Squyres (jsquyres)
Cc: Open MPI Users
Subject: Re: [OMPI users] mpirun hangs on m1 mac
Jeff,
It does launch two mpirun processes (when hung from another terminal window)
scottsayres 95083 99.0 0.0 408918416 1472 s002 R 8:20AM 0:04.48 mpirun -np 4 foo.sh
scottsayres 95085 0.0 0.0 408628368 1632 s006 S+ 8:20AM 0:00.00 egrep mpirun|foo.sh
scottsayres
happens immediately after forking the child process... which is weird).
--
Jeff Squyres
jsquy...@cisco.com
From: Scott Sayres
Sent: Wednesday, May 4, 2022 4:02 PM
To: Jeff Squyres (jsquyres)
Cc: Open MPI Users
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
foo.sh is executable.
>> Run it via:
>>
>> mpirun -np 1 foo.sh
>>
>> If you start seeing output, good! If it completes, better!
>>
>> If it hangs, and/or if you don't see any output at all, do this:
>>
>> ps auxwww | egrep 'mpirun|foo.sh'
>>
>> It should show mpirun and 2 copies of foo.sh (and probably a grep). Does it?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ________________
From: Scott Sayres
Sent: Wednesday, May 4, 2022 2:47 PM
To: Open MPI Users
Cc: Jeff Squyres (jsquyres)
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
Following Jeff's advice, I have rebuilt open-mpi by hand using the -g
option. This shows more information, as below. I am attempting George's
advice on how to track the child, but notice that gdb does not support
arm64. Attempting to update lldb.
scottsayres@scotts-mbp openmpi-4.1.3 % lldb mpir
Sent: Wednesday, May 4, 2022 12:35 PM
To: Open MPI Users
Cc: George Bosilca
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
Scott,
This shows the deadlock arrives during the local spawn. Here is how things
are supposed to work: the mpirun process (parent) will fork (the child),
and these 2 processes are connected through a pipe. The child will then
execve the desired command (hostname in your case), and this will close
the pipe, signaling the parent that the child has successfully started.
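A minimal sketch of that handshake (illustrative only, not Open MPI's actual
code), assuming the usual close-on-exec trick: a successful execve closes the
pipe and the parent reads EOF; a failed exec writes the errno back instead:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); return 1; }
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);   /* write end vanishes on exec */

    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        close(fds[0]);
        execlp("hostname", "hostname", (char *)NULL);
        int err = errno;                  /* only reached if exec failed */
        write(fds[1], &err, sizeof(err));
        _exit(1);
    }

    close(fds[1]);                        /* parent */
    int err;
    if (read(fds[0], &err, sizeof(err)) == 0)
        printf("child exec'd fine: pipe closed on exec\n");
    else
        printf("child exec failed: errno %d\n", err);
    close(fds[0]);
    return 0;
}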
Hi George, Thanks! You have just taught me a new trick. Although I do not
yet understand the output, it is below:
scottsayres@scotts-mbp ~ % lldb mpirun -- -np 1 hostname
(lldb) target create "mpirun"
Current executable set to 'mpirun' (arm64).
(lldb) settings set -- target.run-args "-np" "1" "hostname"
I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run
both MPI and non-MPI apps without any issues.
Try running `lldb mpirun -- -np 1 hostname` and once it deadlocks, do a
CTRL+C to get back on the debugger and then `backtrace` to see where it is
waiting.
George.
On Wed, Ma
Thanks for looking at this Jeff.
No, I cannot use mpirun to launch a non-MPI application. The command
"mpirun -np 2 hostname" also hangs.
I get the following output if I add the -d option (I've replaced the
server name with hashtags):
[scotts-mbp.3500.dhcp.###:05469] procdir:
/var/fol
Are you able to use mpirun to launch a non-MPI application? E.g.:
mpirun -np 2 hostname
And if that works, can you run the simple example MPI apps in the "examples"
directory of the MPI source tarball (the "hello world" and "ring" programs)?
E.g.:
cd examples
make
mpirun -np 4 hello_c
mpirun -np 4 ring_c
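If you do not have the examples tree handy, a minimal stand-in for hello_c is
the classic (a sketch, not the exact program shipped in the tarball):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compile it with "mpicc hello_c.c -o hello_c" and launch it with the mpirun
lines above.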
There can be lots of reasons that this happens. Can you send all the
information listed here?
https://www.open-mpi.org/community/help/
> On Aug 15, 2018, at 10:55 AM, Mota, Thyago wrote:
>
> Hello.
>
> I have openmpi 2.0.4 installed on CentOS 7. When I try to run "mpirun" it
> hangs
Thank you for your help. Now it works.
Klara Hornisova
On Thu, Jan 15, 2015 at 5:54 PM, Marco Atzeri
wrote:
>
>
> On 1/15/2015 5:39 PM, Klara Hornisova wrote:
>
>> I have installed OpenMPI 1.6.5 under cygwin. When trying the test example
>>
>> $mpirun hello
>>
>
> current cygwin package is 1.8.4-1, could you test it?
or, e.g., more complex examples from scalapack, such as
$mpirun -np 4 xslu
everything works fine when t
I solved the issue by accepting incoming TCP traffic as long as the
packets are sent "from" and "to" the local machine. Here is the line I
added to iptables:
/sbin/iptables -A INPUT --source --destination
--protocol tcp -j ACCEPT
Just an observation, I
FWIW: I'm working on a rewrite of our out-of-band comm system (it does the
wireup that is hanging on your system) that will include a shared memory
module. Once that is in place, this problem will go away when running on a
single node (still need sockets for multi-node, of course).
On Apr 11,
You were right, Ralph. I made a short test turning off the firewall and
MPI ran as predicted. I am taking a look at the firewall rules to
figure out how to set them up properly, so that they do not interfere with
OpenMPI's functionality. I will post the required changes to those
settings as so
In fact we should have restrictive firewall settings, as far as I
remember. I will check the rules again tomorrow morning. That's very
interesting; I would expect this kind of problem if I were working with
a cluster, but I hadn't thought that it might also lead to problems for
the internal communication on a single machine.
Best guess is that there is some issue with getting TCP sockets on the system -
once the procs are launched, they need to open a TCP socket and communicate
back to mpirun. If the socket is "stuck" waiting to complete the open, things
will hang.
You might check to ensure there isn't some security setting blocking local TCP
connections.
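One way to test that theory outside of Open MPI is a bare TCP connect, the
same kind of open a launched proc performs back to mpirun (a sketch; the
127.0.0.1 address and port 12345 are placeholders, not anything mpirun
actually uses):

#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "127.0.0.1";
    int port = argc > 2 ? atoi(argv[2]) : 12345;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    /* A firewall that silently drops local packets shows up here as a
     * long stall followed by ETIMEDOUT or EHOSTUNREACH. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        printf("connect to %s:%d failed: %s\n", host, port, strerror(errno));
    else
        printf("connect to %s:%d succeeded\n", host, port);
    close(fd);
    return 0;
}

Run it against a port something is listening on (e.g. sshd's port 22) and
against an unused one; both should return promptly if local TCP is healthy.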
On Jan 18, 2012, at 4:15 AM, Theiner, Andre wrote:
> I also have requested the user to run the following adaptation of his original
> command "mpirun -np 9 interFoam -parallel". I hoped to get a kind of debug
> output which points me in the right direction. The new command did not work
> and I am a
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Jeff Squyres
Sent: Tuesday, 17 January 2012 22:53
To: Open MPI Users
Subject: Re: [OMPI users] mpirun hangs when used on more than 2 CPUs
You should probably also run the ompi_info command; it tells you details about
your installation and how it was configured.
multiple processors?
> Is there a special flag which tells the compiler to care for multiple CPUs?
>
> Andre
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of devendra rai
Sent: Monday, 16 January 2012 13:25
To: Open MPI Users
Subject: Re: [OMPI users] mpirun hangs when used on more than 2 CPUs
Hello Andre,
It may be possible that your openmpi does not support threaded MPI calls (if
these are happening). I had a similar problem, and it was traced to this cause.
If you installed your openmpi from available repositories, chances are that you
do not have thread support.
Here's a small check:
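A check of that kind can be written with MPI_Init_thread, which reports the
thread level the library actually grants (a minimal sketch; the behavior,
not the exact snippet from this thread):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for the highest level and see what we really get. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("requested MPI_THREAD_MULTIPLE, provided %s\n",
           provided == MPI_THREAD_MULTIPLE   ? "MPI_THREAD_MULTIPLE" :
           provided == MPI_THREAD_SERIALIZED ? "MPI_THREAD_SERIALIZED" :
           provided == MPI_THREAD_FUNNELED   ? "MPI_THREAD_FUNNELED" :
                                               "MPI_THREAD_SINGLE");
    MPI_Finalize();
    return 0;
}

If it prints anything below MPI_THREAD_MULTIPLE, threaded MPI calls are not
fully supported by that build.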
Cryptic enough :-)
Best I can tell, your TCP comm isn't working. All your procs are failing
because they can't talk to each other.
I'm also seeing something I don't understand:
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
You
There is a bug in that tarball which was fixed as of yesterday. However, the
patch that you need was the cause of the bug, so the fix for your problem is no
longer in the 1.4 branch.
As you probably recall, I had cautioned that the fix might not make it to the
1.4 series. At the time, I was con
Hi Bogdan,
Thanks for the information and looking forward to the new OpenMPI feature of
port restriction...
About Debian, I was wondering about that... I've had no problems with it and I
was thinking everything was just done for me; of course, another possibility is
that there was no firewall
On Wed, 18 Mar 2009, Raymond Wan wrote:
Perhaps it has something to do with RH's defaults for the firewall settings?
If your sysadmin uses kickstart to configure the systems, (s)he has to
add 'firewall --disabled'; similar for SELinux which seems to have
caused problems to another person on
Hi Ray,
Thanks for your response. I had noticed your thread, which is why I'm
embarrassed (but happy) to say that it looks like my problem was the same
as yours. I mentioned in my original email that there was no firewall
running, which it turns out was a lie. I think that when I checked
b
Hi Ron,
Ron Babich wrote:
Hi Everyone,
I'm having a very basic problem getting an MPI job to run on multiple
nodes. My setup consists of two identically configured nodes, called
node01 and node02, connected via Ethernet and InfiniBand. They are
running CentOS 5.2 and the bundled OMPI, ver
On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:
Interesting though. I thought in such a simple scenario shared
memory would be used for IPC (or whatever's fastest) . But nope.
Even with one process still it wants to use TCP/IP to communicate
between mpirun and orted.
Correct -- we only use shared memory for MPI communication between processes;
the control channel between mpirun and the orted uses TCP.
2009/1/3 Maciej Kazulak
> Hi,
>
> I have a weird problem. After a fresh install mpirun refuses to work:
>
> box% ./hello
> Process 0 on box out of 1
> box% mpirun -np 1 ./hello
> # hangs here, no output, nothing at all; on another terminal:
> box% ps axl | egrep 'mpirun|orted'
> 0 1000 24162 76
Hi Tim
Just a quick update about my ssh/LD_LIBRARY_PATH problem.
Apparently on my system the sshd was configured not to permit
user-defined environment variables (security reasons?).
To fix that I had to change the file
/etc/ssh/sshd_config
by changing the entry
#PermitUserEnvironment no
to
PermitUserEnvironment yes
Hi Tim
thanks for the suggestions.
I now set both paths in .zshenv but it seems that LD_LIBRARY_PATH
still does not get set.
The ldd experiment shows that the openmpi libraries are not found,
and indeed printenv shows that PATH is there but LD_LIBRARY_PATH is not.
It is rather unclear why thi
Hi Jody,
jody wrote:
Hi
I installed openmpi 1.2.2 on a quad core intel machine running fedora 6
(hostname plankton)
I set PATH and LD_LIBRARY_PATH in the .zshrc file:
Note that .zshrc is only used for interactive logins. You need to set up
your system so that LD_LIBRARY_PATH and PATH are also set for
non-interactive (e.g. ssh) logins.
> So, the question from the mpirun_debug.out-file is, what IP-addresses do
> node01 and node02 have, is the local 10.0.0.1 node01, while 10.1.0.1 is
> node02?
> Maybe the route on node01 is not correct to node02?
Ok, I figured out the problem, but didn't solve it completely.
node01 and node02 b
On Fri, 24 Feb 2006, Emanuel Ziegler wrote:
So "No route to host" means that the TCP packet could not be sent
(usually host down, broken routing table, network interface down,
...). But it's 'ping'able and even rsh works fine.
... or some packet filtering is enabled. Check with 'iptables -L -n'.
Hello Emanuel,
can you actually log in using rsh without supplying a password?
I would rather use ssh-based login with public keys. This is
definitely more secure, but in your first mail you said ssh wouldn't work
either?
So, the question from the mpirun_debug.out file is: what IP addresses do
node01 and node02 have? Is the local 10.0.0.1 node01, while 10.1.0.1 is
node02? Maybe the route on node01 is not correct to node02?
> From /usr/include/asm/errno.h:
>
> #define EHOSTUNREACH 113 /* No route to host */
Ah, I thought it was an internal openMPI error number and 'grep'ed the
source code without success. So "No route to host" means that the TCP
packet could not be sent (usually host down, broken routing table,
network interface down, ...).
On Thu, 23 Feb 2006, Emanuel Ziegler wrote:
Unfortunately, I don't know what errno=113 means, but obviously it's a
TCP problem.
From /usr/include/asm/errno.h:
#define EHOSTUNREACH 113 /* No route to host */
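Rather than grepping headers, a few lines of C will translate any errno
value (a minimal sketch):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* On Linux, errno 113 is EHOSTUNREACH. */
    printf("errno 113: %s\n", strerror(113));
    return 0;
}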
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliche