On Sunday, 5 November 2017 12:07:52 AM AEDT r...@open-mpi.org wrote:
> It was just an oversight - I have turned on the issue tracker, so feel free
> to post, or a PR is also welcome
Thanks Ralph! Unfortunately I don't feel I understand enough about the
background of the issue to say much more.
Hi folks,
I was looking to file an issue against the website for the FAQ about XRC
support (given XRC was disabled in issue #4087), but the issue tracker
doesn't appear to be enabled. Is that just an oversight, or is there a
different way you'd prefer?
All the best,
Chris
--
Christopher Samuel
Senior
On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote:
> Hi Chris,
Hi Gilles,
> i pushed my tarball into a gist :
Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa
2010, one circa 2013) with their included OFED I see:
checking if ConnectX XRC support is enabled...
Hi Gilles,
On Mon, 13 Jul 2015 03:16:57 PM Gilles Gouaillardet wrote:
> i made ConnectX XRC (aka XRC) and ConnectIb XRC (aka XRC domains) exclusive,
> so yes, you got the desired behavior.
Is there a tarball I could test on our x86 systems please? We are tied to the
OFED in RHEL due to having
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote:
> I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man"
> didn't find any matches).
This seems to document it for an unspecified version of Solaris:
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:
> Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
>
> $ sysctl net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_time = 1800
I suspect that's a local customisation; all Linux systems I've got access to
(including RHEL
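For anyone who wants a shorter keepalive without touching the sysctl, it
can also be overridden per socket. A minimal sketch, assuming Linux
(TCP_KEEPIDLE is Linux-specific, and the 60-second value is just an example):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    int idle = 60;  /* seconds of idle time before the first probe */

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    /* Turn keepalive on, then override the system-wide
     * net.ipv4.tcp_keepalive_time for this one socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        perror("SO_KEEPALIVE");
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
        perror("TCP_KEEPIDLE");
    return 0;
}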
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:
> Ah -- the point being that this is not an issue related to the libltdl work.
Sorry - I saw the request to test the tarball and tried it out, missed the
significance of the subject. :-/
--
Christopher Samuel
Senior Systems
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
> push upstream. However, using it will require linking in a proprietary
> Mellanox library that accelerates the collective operations (available in
> MOFED
On Wed, 4 Dec 2013 11:39:29 AM Jeff Squyres wrote:
> On Dec 3, 2013, at 7:54 PM,
> Christopher Samuel wrote:
>
> > Would it make any sense to expose system/environmental/thermal
> > information to the application via MPI_T ?
>
> Hmm. Interesting idea.
Phew. :-)
> Is
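For the curious, the plumbing needed to look at what's already there is
small; a minimal sketch, using only standard MPI-3 MPI_T calls, that lists
the performance variables an implementation exposes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, num, i;

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_pvar_get_num(&num);
    for (i = 0; i < num; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, bind, readonly, continuous, atomic;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        /* Query each performance variable's metadata. */
        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &dtype, &enumtype, desc, &desc_len,
                            &bind, &readonly, &continuous, &atomic);
        printf("pvar %d: %s: %s\n", i, name, desc);
    }
    MPI_T_finalize();
    return 0;
}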
Hi folks,
One of our users (oh, OK, our director, one of the Dalton developers)
has found an odd behaviour of OMPI 1.6.5 on our x86 clusters and has
managed to get a small reproducer - a modified version of the
ubiquitous F90 "hello world" MPI program.
We find that if we run this program
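For readers who don't have the Fortran version to hand, the unmodified
hello world it started from looks like this in its C form (a sketch of the
standard example, not the reproducer itself):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}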
On Wednesday 25 April 2012 19:38:00 Brice Goglin wrote:
> How do people feel about this?
It sounds like what you have is a conflict between the policies of
Debian (and hence Ubuntu) and the expectations of RHEL/CentOS users.
Debian Policy is fairly clear on this matter:
# 11.8.1 Providing X
On Tuesday 13 March 2012 10:06:43 Chris Samuel wrote:
> Those don't match the values compiled into OMPI 1.4.5:
>
> ompi_info -a | egrep
> 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout' MCA btl:
> parameter "btl_openib_ib_min_rnr_timer" (current value: "25
Hi all,
We've been trying to track down an IB issue here where a
user was having code (Gromacs, run with OMPI 1.4.5) dying with:
[[18115,1],2][btl_openib_component.c:3224:handle_wc] from bruce030 to: bruce130
error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for
On Thursday 09 February 2012 22:18:20 Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a
> cpuset, will tools like hwloc's lstopo only show the Linux
> processors *in that cpuset*? I.e., does it not have any
> visibility of the processors outside of its
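An easy way to check empirically; a minimal sketch assuming the hwloc C API
(by default hwloc reports only the PUs in the calling process's cpuset,
unless the whole-system topology flag is set):

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    /* Compare this count with the machine's full PU count: inside a
     * cpuset it only reflects the processors we're allowed to use. */
    printf("PUs visible to this process: %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
    hwloc_topology_destroy(topo);
    return 0;
}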
On Mon, 1 Aug 2011 09:47:00 PM Xin He wrote:
> Do any of you guys have any testing programs that I should
> run to test if it really works?
How about a real MPI program which has test data to check
it's running OK? Gromacs is open source and has a self-test
mechanism run via "make test" IIRC.
4 93.8 1.1 16 81.1 91.5
# end result "Pingpong_Send_Recv"
# duration = 0.02 sec
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
they will likely ask for a bisection too).
cheers!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
(so far) describing things is in terms of sockets and cores.
Wouldn't be surprised if someone pointed out an ambiguity in those too!
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
as something they
should look at doing better:
http://old.nabble.com/Re:-Can't-use-Fortran-90-95-compiler-for-F77-p26209677.html
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Had great fun with that trying to track down why the mem property of Torque
PBS jobs wasn't being enforced all the time.
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
- "Jeff Squyres" wrote:
> You are absolutely correct. I've filed CMRs
> for v1.4 and v1.5.
To clarify one point for people who weren't at the
SC'09 Open-MPI BOF (hopefully I'll get this right!):
1.4 will be the bug-fix-only continuation of the 1.3
feature series and
- "Jeff Squyres" wrote:
> A lot of these are "unreferenced parameters" which
> I think we should clean up someday, but not today. ;-)
Fair comment. ;-)
> The -std=c99/-std=gnu99 one is worth looking at -- probably not for this
> release, but it does seem like we should
- "Samuel Thibault" wrote:
> What do you mean by "module support"?
http://modules.sourceforge.net/
They make managing multiple software installations
on clusters much much easier..
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian
- "Jeff Squyres" wrote:
> Please beat it up!
Compiles fine with PGI 10.0 and GCC 4.4.2, but we are getting
warnings with Intel 11.1 for all files saying:
icc: command line warning #10121: overriding '-std=c99' with '-std=gnu99'
Hi Michael,
- "Michael Raymond" wrote:
> Our architecture has blades with two Nehalems on
> them, and the blades are connected together in a
> CC-NUMA fashion.
I've heard on the grapevine that there will be memory-only
blades too, which will have a Nehalem EX on them
but
- "Jeff Squyres" wrote:
> K. Clear for a final rc / release?
Go for it, am just about to go run a training course
now so won't be available until this arvo Melbourne
time..
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian
- "Chris Samuel" <csam...@vpac.org> wrote:
> Will try PGI 7.0 now.
I can confirm it compiles OK with PGI 7.0, 7.1 and 7.2
with the same warnings as for 8.0.
These warnings also appear with 9.0.
Lots of warnings from the Intel v11 compilers,
I've attached a text file
- "Chris Samuel" <csam...@vpac.org> wrote:
> Grabbing now, thanks!
Compiled OK with:
[csamuel@tango hwloc-0.9.1rc3r1276]$ pgcc -V
pgcc 8.0-6 64-bit target on x86-64 Linux -tp gh-64
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2009,
- "Jeff Squyres" wrote:
> Try this tarball:
>
> http://www.open-mpi.org/~jsquyres/unofficial/hwloc-0.9.1rc3r1276.tar.bz2
Grabbing now, thanks!
Sorry for not seeing the email yesterday; it was a
public holiday here (Melbourne Cup Day, yes
we have a public
- "Jeff Squyres (jsquyres)" wrote:
> Pgcc issues fixed?
Sorry folks, have not yet got the SVN checkout to configure
due to it requiring newer tools than I have, and am buried
trying to get board reports out at present.
Hopefully will have some time tomorrow 2-3pm my
- "Pavan Balaji" wrote:
> Log files attached.
Hmm, it compiled for me with pgcc!
[csamuel@tango hwloc-0.9.1rc3]$ pgcc -V
pgcc 9.0-4 64-bit target on x86-64 Linux -tp shanghai-64
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2009,
- "Jeff Squyres" wrote:
> I tweaked this a bit -- how's this:
Looks good to me!
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit
- "Jeff Squyres" wrote:
> WHAT: Change OMPI's verbose build output to use
> Automake's new "silent" rules output (see below)
Not that I'm a developer, but as someone who regularly
builds OMPI I'd be very happy with this change.
cheers!
Chris
--
Christopher Samuel -
- "Ashley Pittman" wrote:
> $ grep Cpus_allowed_list /proc/$$/status
Useful, ta!
> Does this imply the default is to report on processes
> in the current cpuset rather than the entire system?
> Does anyone else feel that violates the principle of
> least surprise?
- "Tony Breeds" wrote:
> Powerpc kernels that old do not have the topology information needed
> (in /sys or /proc/cpuinfo) So for the short term that's be best we
> can do.
That's fine, I quite understand. I'm trying to get that
cluster replaced anyway.. ;-)
>
- "Jeff Squyres" wrote:
> Sweet!
:-)
> And -- your reply tells me that, for the 2nd time in a single day, I
> posted to the wrong list. :-)
Ah well, if you'd posted to the right list I wouldn't
have seen this.
> I'll forward your replies to the hwloc-devel list.
- "Chris Samuel" <csam...@vpac.org> wrote:
> Some sample results below for configs not represented
> on the current website.
A final example of a more convoluted configuration, with
a Torque job requesting 5 CPUs on a dual Shanghai node
which has been given a non-co
- "Jeff Squyres" wrote:
> Give it a whirl:
Nice - built without warnings with GCC 4.4.2.
Some sample results below for configs not represented
on the current website.
Dual socket Shanghai:
System(31GB)
Node#0(15GB) + Socket#0 + L3(6144KB)
L2(512KB) + L1(64KB)
- "David Singleton" wrote:
> Kiril Dichev has already pointed out a problem with MPI_Cart_create
> http://www.open-mpi.org/community/lists/devel/2009/08/6627.php
> MPI_Graph_create has the same problem. I checked all other
> functions with logical in arguments and
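For anyone unfamiliar with where the LOGICALs come in: the C binding takes
plain int flags, and it's the Fortran-to-C translation of those that the
reports concern. A minimal C usage for comparison (a sketch, not the
affected code path itself):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm cart;
    int dims[2] = {0, 0};
    int periods[2] = {1, 0};  /* periodic in the first dimension only */
    int size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 2, dims);
    /* 'periods' and 'reorder' are ints here; Fortran passes LOGICALs,
     * which the binding layer has to convert correctly. */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}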
Hi folks,
Just wondered if there was any idea of when OMPI 1.3.4
might be released? I know the correct answer is "when
it's ready" (:-)) but was curious whether there were any thoughts
on a timeframe.
The cpuset-aware CPU affinity code would be very useful
to us to fix up some codes that sometimes
Hi Terry,
- "Terry Dontje" wrote:
> It's actually is in the 1.3 branch now and has been
> verified to solve the hanging issues of several members.
Great, I'll get them to try a snapshot build!
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
Hi Edgar,
- "Edgar Gabriel" wrote:
> it will be available in 1.3.4...
That's great, thanks so much!
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053,
Hi Edgar,
- "Edgar Gabriel" wrote:
> just wanted to give a heads-up that I *think* I know what the problem
> is. I should have a fix (with a description) either later today or
> tomorrow morning...
I see that changeset 21970 is on trunk to fix this issue,
is that
- "Ralph Castain" wrote:
> Let me point out the obvious since this has plagued
> us at LANL with regard to this concept. If a user
> wants to do something different, all they have to
> do is download and build their own copy of OMPI.
One possibility may be to have OMPI
- "David Robertson" wrote:
> Hi all,
Hiya,
> We use both the PGI and Intel compilers over an
> Infiniband cluster and I was trying to find a way
> to have both orteruns in the path (in separate
> directories) at the same time.
Not a solution, but what we do
- "Jeff Squyres" wrote:
> Does anyone have any suggestions? Or are we stuck
> with compile-time checking?
I didn't see this until now, but I'd be happy with
just a compile-time option so we could produce an
install just for debugging purposes and have our
users
- "Eugene Loh" wrote:
> Actually, the current proposed defaults for 1.3.4 are
> not to change the defaults at all.
Thanks, I hadn't picked up on the latest update to the
trac ticket 3 days ago that says that the defaults will
stay the same. Sounds good to me!
All the
- "Chris Samuel" <csam...@vpac.org> wrote:
> $ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca
> odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4
To clarify - does that command line accurately reflect the
proposed defaults for OMPI
- "Chris Samuel" <csam...@vpac.org> wrote:
> This is most likely because it's getting an error from the
> kernel when trying to bind to a socket it's not permitted
> to access.
This is what strace reports:
18561 sched_setaffinity(18561, 8, { f0 }
18561 <.
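A standalone way to provoke that same kernel refusal; a sketch assuming
Linux, where CPU 4 stands in for a hypothetical core outside the job's
cpuset:

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t want;

    CPU_ZERO(&want);
    CPU_SET(4, &want);  /* hypothetical CPU outside our cpuset */
    /* EINVAL here typically means the mask contains no CPUs this
     * process is permitted to run on. */
    if (sched_setaffinity(0, sizeof(want), &want) != 0)
        printf("bind failed: %s\n", strerror(errno));
    return 0;
}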
- "Eugene Loh" wrote:
> Ah, you're missing the third secret safety switch that prevents
> hapless mortals from using this stuff accidentally! :^)
Sounds good to me. :-)
> I think you need to add
>
> --mca opal_paffinity_alone 1
Yup, looks like that's it; it
- "Ralph Castain" wrote:
> Hi Chris
Hiya,
> The devel trunk has all of this in it - you can get that tarball from
> the OMPI web site (take the nightly snapshot).
OK, grabbed that (1.4a1r21825). Configured with:
./configure --prefix=$FOO --with-openib --with-tm=/usr/
- "Eugene Loh" wrote:
Hi Eugene,
[...]
> It would be even better to have binding selections adapt to other
> bindings on the system.
Indeed!
This touches on the earlier thread about making OMPI aware
of its cpuset/cgroup allocation on the node (for those sites
that
- "Jeff Squyres" wrote:
> An important point to raise here: the 1.3 series is *not* the super
> stable series. It is the *feature* series. Specifically: it is not
> out of scope to introduce or change features within the 1.3 series.
Ah, I think I've misunderstood
- "Eugene Loh" wrote:
> This is an important discussion.
Indeed! My big fear is that people won't pick up the significance
of the change and will complain about performance regressions
in the middle of an OMPI stable release cycle.
> Do note:
>
> 1) Bind-to-core is
- "Ralph Castain" wrote:
> Hi Chris
Hiya Ralph,
> There would be a "-do-not-bind" option that will prevent us from
> binding processes to anything which should cover that situation.
Gotcha.
> My point was only that we would be changing the out-of-the-box
> behavior to
- "Terry Dontje" wrote:
> I just wanted to give everyone a heads up if they do not get bugs
> email. I just submitted a CMR to move over some new paffinity options
> from the trunk to the v1.3 branch.
Ralph's comments imply that for those sites that share nodes
- "Ralf Wildenhues" wrote:
> You can probably solve most of these issues by just
> versioning the directory names where you put the files;
To be honest I'm not sure if this is something that
OMPI should be looking to solve; we have lots of
different versions
- "Ralph Castain" wrote:
> Perhaps a telecon (myself, Jeff S, and you) would be best at this
> stage.
Sounds good, will take that part to private email.
> I confess I'm now confused too - what you describe is precisely
> what we already do.
I added printf()'s to the
Hi Ralph,
- "Ralph Castain" wrote:
> Ummm... I'll let you guys work this out on PLPA. However, just to
> clarify, OMPI currently binds to cores, not logical cpus. It is the
> PLPA that is "dumb" and provides the plumbing to do what OMPI tells
> it.
>
> :-)
Ahh, if
Hi Bert,
- "Bert Wesarg" wrote:
> The Cpus_allowed* fields in /proc/<pid>/status are the same as
> sched_getaffinity returns and the /proc/<pid>/cpuset needs to be
> resolved, i.e. where is the cpuset fs mounted?
The convention is to mount it on /dev/cpuset.
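And a process can find its own cpuset name without knowing the mount point
at all; a minimal sketch using the /proc interface:

#include <stdio.h>

int main(void)
{
    char path[256];
    /* /proc/self/cpuset gives this process's cpuset, named relative
     * to the cpuset filesystem root (conventionally /dev/cpuset). */
    FILE *f = fopen("/proc/self/cpuset", "r");

    if (f != NULL) {
        if (fgets(path, sizeof(path), f) != NULL)
            printf("cpuset: %s", path);
        fclose(f);
    }
    return 0;
}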
- "Jeff Squyres" wrote:
Hi Jeff,
> I'm the "primary PLPA" guy that Ralph referred to, and I was on
> vacation last week -- sorry for missing all the chatter.
No worries!
> Based on your mails, it looks like you're out this week -- so little
> will likely occur.
- "Ralph Castain" wrote:
> Should just be
>
> -mca paffinity_base_verbose 5
>
> Any value greater than 4 should turn it "on"
Yup, that's what I was trying, but couldn't get any output.
> Something I should have mentioned. The paffinity_base_service.c file
> is
- "Ralph Castain" wrote:
> This will tell you what module is loaded. If PLPA -can- run, you
> should see the linux module selected.
Thanks Ralph, yes it is being selected. I'll carry
on digging.
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
- "Ralph Castain" wrote:
> Looking at your command line, did you remember to set -mca
> mpi_paffinity_alone 1? If not, we won't set affinity on the
> processes.
Just realised that in the failed test I posted I set
-mca mpi_affinity_alone 1 *instead* of -mca paffinity
- "Ralph Castain" wrote:
> Sounds like a problem in PLPA - I'll have to defer
> to them.
Understood, thanks for that update. I'll try and
find some time to look inside PLPA too.
> Our primary PLPA person is on vacation this week, so
> you might not hear back from him
- "Ralph Castain" wrote:
> Looking at your command line, did you remember to set -mca
> mpi_paffinity_alone 1?
Ahh, no, sorry, still feeling my way with this..
> If not, we won't set affinity on the processes.
Now it fails immediately with:
Setting processor
- "Ralph Castain" wrote:
> Could you check this? You can run a trivial job using the -npernode x
> option, where x matched the #cores you were allocated on the nodes.
> If you do this, do we bind to the correct cores?
Nope, I'm afraid it doesn't - submitted a job asking
- "Ralph Castain" wrote:
Hi Ralph,
> Interesting. No, we don't take PLPA cpu sets into account when
> retrieving the allocation.
Understood.
> Just to be clear: from an OMPI perspective, I don't think this is an
> issue of binding, but rather an issue of allocation.
Hi all,
Not sure if this is an OpenMPI query or a PLPA query,
but given that PLPA seems to have some support for it
already I thought I'd start here. :-)
We run a quad-core Opteron cluster with Torque 2.3.x
which uses the kernel's cpuset support to constrain
a job to just the cores it has been