Re: [OMPI devel] how to run OpenMPI in OSv container

2015-10-15 Thread Gilles Gouaillardet
Justin,

IOF stands for Input/Output (aka I/O) Forwarding. Here is a very high-level overview of a quite simple case: on host A, you run "mpirun -host B,C -np 2 a.out" without any batch manager, using the TCP interconnect. First, mpirun will fork
  ssh B orted ...
  ssh C orted ...
and the orted daemons will
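The remote-launch fan-out described above can be sketched as a small shell function. This is illustrative only: the ssh is simulated with echo, the function name is invented for this sketch, and the real orted invocation carries many internal options that are elided here.

```shell
# Sketch of mpirun's remote launch fan-out (illustrative only):
# for each remote host, mpirun forks an ssh that starts an orted
# daemon there. We echo the command instead of executing ssh.
launch_daemons() {
  for host in "$@"; do
    echo "ssh $host orted ..."
  done
}

# Corresponds to: mpirun -host B,C -np 2 a.out
launch_daemons B C
```

Each orted then spawns its local rank of a.out and forwards the child's stdin/stdout/stderr back to mpirun, which is the I/O forwarding (IOF) layer referred to above.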

[hwloc-devel] Create success (hwloc git 1.11.1-3-g050535f)

2015-10-15 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.11.1-3-g050535f Start time: Thu Oct 15 21:05:47 EDT 2015 End time: Thu Oct 15 21:07:24 EDT 2015 Your friendly daemon, Cyrador

[hwloc-devel] Create success (hwloc git 1.10.1-73-g7acba6b)

2015-10-15 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.10.1-73-g7acba6b Start time: Thu Oct 15 21:04:21 EDT 2015 End time: Thu Oct 15 21:05:47 EDT 2015 Your friendly daemon, Cyrador

[hwloc-devel] Create success (hwloc git 1.9.1-68-gfcc0b3f)

2015-10-15 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.9.1-68-gfcc0b3f Start time: Thu Oct 15 21:02:58 EDT 2015 End time: Thu Oct 15 21:04:21 EDT 2015 Your friendly daemon, Cyrador

[hwloc-devel] Create success (hwloc git dev-816-gbb23333)

2015-10-15 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-816-gbb2 Start time: Thu Oct 15 21:01:02 EDT 2015 End time: Thu Oct 15 21:02:49 EDT 2015 Your friendly daemon, Cyrador

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
> On 16 Oct 2015, at 0:44, Ralph Castain wrote:
>
> Hmmm... ok. I'll have to look at it this weekend when I return from travel.
> Can you please send me your test program so I can try to locally reproduce it?

Ok, thanks Ralph. Start the DVM with: orte-dvm --report-uri
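For context, the persistent-DVM workflow being debugged looks roughly like the sketch below. The dvm.uri file name and the exact orte-submit options are assumptions for illustration, not taken from the thread.

```shell
# Start the distributed virtual machine once and save its contact URI
# (the file name dvm.uri is an arbitrary choice for this sketch):
#   orte-dvm --report-uri dvm.uri &
#
# Then submit many jobs against the already-running DVM:
#   orte-submit --hnp file:dvm.uri -np 2 ./a.out
#
# The race in this thread: some short-lived jobs terminate without the
# DVM marking them complete, so the corresponding orte-submit stalls.
```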

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Ralph Castain
Hmmm... ok. I'll have to look at it this weekend when I return from travel. Can you please send me your test program so I can try to locally reproduce it?

On Thu, Oct 15, 2015 at 3:42 PM, Mark Santcroos wrote:
> > On 16 Oct 2015, at 0:23 , Ralph Castain

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
> On 16 Oct 2015, at 0:23, Ralph Castain wrote:
> Okay, that means that the dvm isn't recognizing that the jobs actually completed.

Ok.

> So the question is: what is it about those jobs?

They are all the same.

> Are those 6 jobs very short-lived, and the others are

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Ralph Castain
Okay, that means that the dvm isn't recognizing that the jobs actually completed. So the question is: what is it about those jobs? Are those 6 jobs very short-lived, and the others are longer-lived? If you look at the nodes (before you kill the dvm), are any of those procs still there? On Thu,

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
> On 16 Oct 2015, at 0:09, Ralph Castain wrote:
>
> Help me out a bit - how many jobs did you actually run?

42 tasks in total, 6 stalled, 36 returned.

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Ralph Castain
Help me out a bit - how many jobs did you actually run?

On Thu, Oct 15, 2015 at 2:33 PM, Mark Santcroos wrote:
> > On 15 Oct 2015, at 17:25, Ralph Castain wrote:
> > Interesting - I see why. Please try this version.
> Ok, that works as

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
> On 15 Oct 2015, at 17:25, Ralph Castain wrote:
>
> Interesting - I see why. Please try this version.

Ok, that works as expected. I'll repeat the results with this version too:

$ grep TERMINATED dvm_output-patched.txt | wc -l
36
$ grep NOTIFYING

[OMPI devel] how to run OpenMPI in OSv container

2015-10-15 Thread Justin Cinkelj
I'm trying to run OpenMPI in an OSv container (https://github.com/cloudius-systems/osv). OSv is a single-process, single-address-space VM, without the fork, exec, or openpty functions. With some butchering of OSv and OpenMPI I was able to compile orted.so and run it inside OSv via mpirun (mpirun is on remote

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Ralph Castain
Interesting - I see why. Please try this version.

Ralph

On Thu, Oct 15, 2015 at 4:05 AM, Mark Santcroos wrote:
> > On 15 Oct 2015, at 4:38 , Ralph Castain wrote:
> > Okay, please try the attached patch.
>
> *scratch*
>
> Although I reported

Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x

2015-10-15 Thread Paul Kapinos
On 10/14/15 19:35, Jeff Squyres (jsquyres) wrote:
> On Oct 14, 2015, at 12:48 PM, Nathan Hjelm wrote:
>> I think this is from a known issue. Try applying this and run again:
>> https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch

The good news is

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
> On 15 Oct 2015, at 4:38, Ralph Castain wrote:
> Okay, please try the attached patch.

*scratch*

Although I reported results with the patch earlier, I can't reproduce it anymore. Now orte-dvm shuts down after the first orte-submit completes with: [netbook:72038]

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
Another data point, this only seems to happen for really short tasks, i.e. < 1 sec.

Re: [OMPI devel] [OMPI users] fatal error: openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07

2015-10-15 Thread Jeff Squyres (jsquyres)
Followup on https://github.com/open-mpi/ompi/pull/1028.

> On Oct 14, 2015, at 2:37 AM, Gilles Gouaillardet wrote:
>
> Folks,
>
> i was able to reproduce the issue by adding CPPFLAGS=-I/tmp to my configure command line.
> here is what happens:

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-15 Thread Mark Santcroos
Hi!

> On 15 Oct 2015, at 4:38, Ralph Castain wrote:
>
> Okay, please try the attached patch. It will cause two messages to be output
> for each job: one indicating the job has been marked terminated, and the
> other reporting that the completion message was sent to the