Fair enough, and I'm not disagreeing. Just indicating why I tend to prefer the
run-time solutions (although we all agree that in this case, a run-time
detection would run into a million corner cases, and therefore probably isn't
worth it).
> On Jan 25, 2016, at 5:55 PM, Ralph Castain wrote:
Thanks Paul,
It seems a "git add" was missed in the upstream pmix repo;
I will make a PR for that.
Cheers,
Gilles
On 1/26/2016 9:50 AM, Paul Hargrove wrote:
Using last night's master tarball I am seeing the following at configure
time:
[path-to]/openmpi-dev-3397-g70787d1/opal/mca/pmix/pmix120/pmix/configure:
line 19364: PMIX_CHECK_ICC_VARARGS: command not found
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages
I guess what I was aiming at was something similar to what we are all
converging upon. People don't really care about all the details of what
mapper components were built, etc. What they really need to know is: (a)
what resource manager support was built, and (b) what fabrics.
So a very simple, short ...
Jeff,
Excellent point about the --with-foo behavior.
If an admin knows what component name to grep for, then they should
"--with-foo" that component.
With language bindings the spelling is "--enable-mpi-foo", but the
principle is the same.
Adding new places to apply grep is entirely superfluous if ...
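[For illustration, a minimal sketch of the grep workflow being referenced
here; component names will vary by build:

    # List the launcher (plm) components that were actually built:
    ompi_info | grep plm

    # Or check for one specific component, e.g. Torque/PBS support:
    ompi_info | grep -w tm
]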
My concern with the runtime solution is that I fear we will suffer
death by a thousand cuts as we try to navigate our way around all the odd
configurations that exist out there. What I don't want to do is get into a
constant game of whack-a-mole where we are trying to only emit the warning
when ...
I'd like to point out an offhand comment that I made earlier that seems to have
gotten lost -- let me cite the README, because it says it much better than I
did earlier in this thread:
-
Note that for many of Open MPI's --with- options, Open MPI will,
by default, search for header files and/or libraries ...
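[Concretely, the distinction the README draws could be sketched like this;
the install prefix is hypothetical:

    # Default: search standard locations; if tm headers/libs are found,
    # build tm support, otherwise silently skip it:
    ./configure

    # Explicit: search the given prefix and abort configure if tm
    # support cannot be built:
    ./configure --with-tm=/opt/torque
]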
I don't know if I agree here.
1. Everything about what configure is going to do is already sent to stdout
(yes, I know, people don't look at things that scroll off their screen). But
the point is -- if they want to grep for output, they can already do so.
2. I think it would be incredibly challenging ...
Hi Folks,
I like Paul's suggestion for configury summary output a lot. It would have
helped me when I was trying to deal with an oddball
one-off install of the moab/torque software on one of the non-standard
front ends at LANL. The libfabric configury has
such a summary output at the end of configure ...
Ah, Nathan read my mind!
This is (more or less) what I suggest in the post I was typing when
Nathan's post arrived.
-Paul
On Mon, Jan 25, 2016 at 2:13 PM, Nathan Hjelm wrote:
>
> Another thing that might be useful is at the end of configure print out
> a list of each framework with a list of components ...
Ralph,
As a practical matter, most users probably aren't going to know what to do
with anything that scrolls off their screen.
So I think dumping the ompi_info output as-is would be just "noise" to many
folks.
That is one reason I didn't just suggest doing exactly that
(cross-compilation being another ...
Another thing that might be useful is at the end of configure print out
a list of each framework with a list of components and some build info
(static vs dynamic, etc). Something like:
plm:
alps (dynamic)
rsh (dynamic)
tm (dynamic)
-Nathan
On Mon, Jan 25, 2016 at 01:46:44PM -0800, Ralph Castain wrote:
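[A hedged sketch of how such an end-of-configure summary could be emitted;
the variable names ($built_frameworks, ${framework}_built_components) are
illustrative only, not Open MPI's actual configury:

    # At the bottom of configure.ac, assuming each framework's configury
    # has accumulated its built components into a shell variable:
    AC_MSG_NOTICE([Framework/component summary:])
    for framework in $built_frameworks; do
        eval components=\$${framework}_built_components
        AC_MSG_NOTICE([  $framework: $components])
    done
]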
That makes sense, Paul - what if we output effectively the ompi_info
summary of what was built at the end of the make install procedure? Then
you would have immediate feedback on the result.
On Mon, Jan 25, 2016 at 1:27 PM, Paul Hargrove wrote:
> As one who builds other people's software frequently ...
As one who builds other people's software frequently, I have my own
opinions here.
Above all else: there is no one "right" answer, but
consistency within a product is best.
So (within reason) the same things that work to configure modules A and B
should work with C and D as well.
To u
Just remember: you don't have to put "with-tm" _if_ the Torque
includes/libs are in a standard location. The root problem here is that
they weren't in a standard location, and so we had to have "with-tm" in
order to find them.
On Mon, Jan 25, 2016 at 11:13 AM, Jeff Squyres (jsquyres) <
jsquy...@ci
Haters gotta hate. ;-)
Kidding aside, ok, you make valid points. So -- no tm "addition". We just
have to rely on people using functionality like "--with-tm" in the configure
line to force/ensure that tm (or whatever feature) will actually get built.
> On Jan 25, 2016, at 1:31 PM, Ralph Castain wrote:
I think we would be opening a real can of worms with this idea. There are
environments, for example, that use PBSPro for one part of the system
(e.g., IO nodes), but something else for the compute section.
Personally, I'd rather follow Howard's suggestion.
On Mon, Jan 25, 2016 at 10:21 AM, Nathan Hjelm wrote:
On Mon, Jan 25, 2016 at 05:55:20PM +0000, Jeff Squyres (jsquyres) wrote:
> Hmm. I'm of split mind here.
>
> I can see what Howard is saying here -- adding complexity is usually a bad
> thing.
>
> But we have gotten these problem reports multiple times over the years:
> someone *thinking* that ...
Hmm. I'm of split mind here.
I can see what Howard is saying here -- adding complexity is usually a bad
thing.
But we have gotten these problem reports multiple times over the years: someone
*thinking* that they have built with launcher support X (e.g., TM, LSF), but
then figuring out later that ...
Yes, --mca btl tcp,self is always used. We found the problem: we have
restricted the interfaces with --mca btl_tcp_if_include eth0 and now we are
at the same performance (actually it seems that the multiple-orted case is
slightly faster). I think there is some mess with the other interfaces;
however, I cannot figure ...
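[For reference, the full command line implied here would look something
like the following; the benchmark binary name is illustrative:

    # Force the TCP BTL, but restrict it to eth0 so traffic cannot
    # wander onto other (possibly slower) interfaces:
    mpirun -np 16 --mca btl tcp,self --mca btl_tcp_if_include eth0 ./skampi
]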
I also assumed that was true. However, when communicating between two
procs, the TCP stack will use a shortcut in the loopback code if the two
procs are known to be on the same node. In the case of multiple orteds, it
isn't clear to me that the stack knows this situation as the orteds, at
least, mu ...
Hi Gilles
I would prefer improving the FAQ rather than adding yet more complexity in
this area. The way things go, you would add this feature, then someone else
with a different use case would complain we had broken something for them.
Then we would add another mca param to disable the new tm-less ...
Though I did not repeat it, I assumed --mca btl tcp,self is always used, as
described in the initial email.
Cheers,
Gilles
On Monday, January 25, 2016, Ralph Castain wrote:
> I believe the performance penalty will still always be greater than zero,
> however, as the TCP stack is smart enough to ...
Ok, thank you Ralph and Gilles, I will continue testing and I'll update you
if there is any news.
Cheers,
Federico
2016-01-25 14:23 GMT+01:00 Ralph Castain :
> I believe the performance penalty will still always be greater than zero,
> however, as the TCP stack is smart enough to take an optimized path ...
I believe the performance penalty will still always be greater than zero,
however, as the TCP stack is smart enough to take an optimized path when
doing a loopback as opposed to inter-node communication.
On Mon, Jan 25, 2016 at 4:28 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
Federico,
I did not expect 0% degradation, since you are now comparing two different
cases:
1 orted means tasks are bound to sockets;
16 orteds means tasks are not bound.
A quick way to improve things is to use a wrapper that binds MPI tasks:
mpirun --bind-to none wrapper.sh skampi
wrapper.sh can use ...
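[One hedged sketch of such a wrapper, assuming at most one rank per core
and using Open MPI's OMPI_COMM_WORLD_LOCAL_RANK variable; the numactl
invocation is illustrative:

    #!/bin/sh
    # wrapper.sh: pin each rank to the core matching its node-local rank,
    # then exec the real benchmark binary passed as arguments.
    exec numactl --physcpubind="$OMPI_COMM_WORLD_LOCAL_RANK" "$@"

Invoked exactly as in the mpirun line above.]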
Thank you Gilles, you're right: with --bind-to none we have ~15%
degradation rather than 50%.
It's much better now, but I think it should be (in theory) around 0%.
The benchmark is MPI bound (the standard benchmark provided with SkaMPI);
it tests these functions: MPI_Bcast, MPI_Barrier, MPI_Re...
Federico,
unless you already took care of that, I would guess all 16 orteds
bound their children MPI tasks to socket 0.
Can you try
mpirun --bind-to none ...
BTW, is your benchmark application CPU bound? Memory bound? MPI bound?
Cheers,
Gilles
On Monday, January 25, 2016, Federico Reghenzani wrote:
Hello,
we have executed a benchmark (SkaMPI) on the same machine (32-core Intel
Xeon x86_64) with these two configurations:
- 1 orted with 16 processes, with BTL forced to TCP (--mca btl self,tcp)
- 16 orteds with 1 process each (that uses TCP)
We use a custom RAS to allow multiple orteds on the same node ...
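[As a sketch, the two configurations described above would correspond
roughly to the following; the benchmark binary name is illustrative, and
the 16-orted case depends on the site-specific RAS:

    # Configuration 1: a single orted, 16 ranks, BTL forced to TCP:
    mpirun -np 16 --mca btl self,tcp ./skampi

    # Configuration 2: 16 orteds with one rank each, spawned via the
    # custom RAS component (not reproducible here).
]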