Justin,

IOF stands for Input/Output (a.k.a. I/O) Forwarding. Here is a very high-level overview of a quite simple case.

On host A, without any batch manager and using the TCP interconnect, you run:

mpirun -host B,C -np 2 a.out

First, mpirun will fork:

ssh B orted ...
ssh C orted ...

The orted daemons will
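The remote-launch step described above can be sketched as follows. Hostnames B and C come from the mpirun command line; the orted options are elided as a placeholder, and the loop only prints the ssh commands mpirun would fork rather than executing them:

```shell
# Sketch of mpirun's remote-launch step without a batch manager: one
# ssh-spawned orted daemon per remote host. Hosts come from "-host B,C";
# "<daemon-options>" is a placeholder, and the commands are only echoed.
for host in B C; do
  echo "ssh $host orted <daemon-options>"
done
```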
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc 1.11.1-3-g050535f
Start time: Thu Oct 15 21:05:47 EDT 2015
End time: Thu Oct 15 21:07:24 EDT 2015
Your friendly daemon,
Cyrador
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc 1.10.1-73-g7acba6b
Start time: Thu Oct 15 21:04:21 EDT 2015
End time: Thu Oct 15 21:05:47 EDT 2015
Your friendly daemon,
Cyrador
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc 1.9.1-68-gfcc0b3f
Start time: Thu Oct 15 21:02:58 EDT 2015
End time: Thu Oct 15 21:04:21 EDT 2015
Your friendly daemon,
Cyrador
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc dev-816-gbb2
Start time: Thu Oct 15 21:01:02 EDT 2015
End time: Thu Oct 15 21:02:49 EDT 2015
Your friendly daemon,
Cyrador
> On 16 Oct 2015, at 0:44 , Ralph Castain wrote:
>
> Hmm, ok. I'll have to look at it this weekend when I return from travel.
> Can you please send me your test program so I can try to locally reproduce it?
Ok, thanks Ralph.
Start the DVM with: orte-dvm --report-uri
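For context, here is a minimal sketch of the persistent-DVM workflow being discussed, assuming an orte-submit `--hnp` flag and a URI file named `dvm.uri` (both assumptions, not confirmed by the thread). The commands are only echoed so the sketch runs without an Open MPI install:

```shell
# Sketch of the orte-dvm / orte-submit workflow (flag spellings are
# assumptions; check orte-dvm --help on your build). Echoed, not executed.
echo 'orte-dvm --report-uri dvm.uri &'            # start the DVM once, record its URI
echo 'orte-submit --hnp file:dvm.uri -np 2 a.out' # submit jobs against the running DVM
```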
Hmm, ok. I'll have to look at it this weekend when I return from travel.
Can you please send me your test program so I can try to locally reproduce it?
On Thu, Oct 15, 2015 at 3:42 PM, Mark Santcroos wrote:
> On 16 Oct 2015, at 0:23 , Ralph Castain wrote:
>
> Okay, that means that the dvm isn't recognizing that the jobs actually completed.

Ok.

> So the question is: what is it about those jobs?

They are all the same.

> Are those 6 jobs very short-lived, and the others are longer-lived?
Okay, that means that the dvm isn't recognizing that the jobs actually completed. So the question is: what is it about those jobs? Are those 6 jobs very short-lived, and the others are longer-lived? If you look at the nodes (before you kill the dvm), are any of those procs still there?
> On 16 Oct 2015, at 0:09 , Ralph Castain wrote:
>
> Help me out a bit - how many jobs did you actually run?
42 tasks in total, 6 stalled, 36 returned.
Help me out a bit - how many jobs did you actually run?
On Thu, Oct 15, 2015 at 2:33 PM, Mark Santcroos wrote:
>
> > On 15 Oct 2015, at 17:25 , Ralph Castain wrote:
> >
> > Interesting - I see why. Please try this version.
>
> Ok, that works as expected.
> On 15 Oct 2015, at 17:25 , Ralph Castain wrote:
>
> Interesting - I see why. Please try this version.
Ok, that works as expected.
I'll repeat the results with this version too:
$ grep TERMINATED dvm_output-patched.txt | wc -l
36
$ grep NOTIFYING
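The `grep ... | wc -l` counting above can be reproduced on any log; the file below is synthetic, made up purely for illustration:

```shell
# grep -c counts matching lines, equivalent to "grep ... | wc -l" above.
printf 'job 1 TERMINATED\njob 2 RUNNING\njob 3 TERMINATED\n' > /tmp/dvm_demo.txt
grep -c TERMINATED /tmp/dvm_demo.txt   # prints 2
```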
I'm trying to run Open MPI in an OSv container
(https://github.com/cloudius-systems/osv). It's a single-process, single-address-space VM, without the fork, exec, or openpty functions. With some butchering of OSv and Open MPI I was able to compile orted.so and run it inside OSv via mpirun (mpirun is on remote
Interesting - I see why. Please try this version.
Ralph
On Thu, Oct 15, 2015 at 4:05 AM, Mark Santcroos wrote:
>
> > On 15 Oct 2015, at 4:38 , Ralph Castain wrote:
> > Okay, please try the attached patch.
>
> *scratch*
>
> Although I reported results with the patch earlier, I can't reproduce it anymore.
On 10/14/15 19:35, Jeff Squyres (jsquyres) wrote:
On Oct 14, 2015, at 12:48 PM, Nathan Hjelm wrote:
I think this is from a known issue. Try applying this and run again:
https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch
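One way to apply a GitHub commit `.patch` like the one linked above is with `git apply`. Since fetching the real patch needs network access, the sketch below synthesizes a tiny patch in a throwaway repo so the mechanics can run anywhere; the file names and repo here are made up for illustration:

```shell
# Demonstrates the "get a .patch, then git apply" mechanics on a throwaway
# repo; for the real fix you would download the linked commit's .patch
# file and run "git apply" in the top level of your ompi checkout.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo
cd repo
printf 'hello\n' > file.txt
git add file.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm init
printf 'hello world\n' > file.txt
git diff > fix.patch                 # stand-in for the downloaded .patch
git checkout -q -- file.txt          # revert the change, keep the patch
git apply fix.patch                  # apply it, as you would the real one
grep 'hello world' file.txt          # prints "hello world"
```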
The good news is
> On 15 Oct 2015, at 4:38 , Ralph Castain wrote:
> Okay, please try the attached patch.
*scratch*
Although I reported results with the patch earlier, I can't reproduce it anymore.
Now orte-dvm shuts down after the first orte-submit completes with:
[netbook:72038]
Another data point: this only seems to happen for really short tasks, i.e. < 1 sec.
Followup on https://github.com/open-mpi/ompi/pull/1028.
> On Oct 14, 2015, at 2:37 AM, Gilles Gouaillardet wrote:
>
> Folks,
>
> I was able to reproduce the issue by adding CPPFLAGS=-I/tmp to my configure command line.
> Here is what happens:
>
Hi!
> On 15 Oct 2015, at 4:38 , Ralph Castain wrote:
>
> Okay, please try the attached patch. It will cause two messages to be output for each job: one indicating the job has been marked terminated, and the other reporting that the completion message was sent to the