On Mon, 1 Aug 2011 09:47:00 PM Xin He wrote:
> Do any of you guys have any testing programs that I should
> run to test if it really works?
How about a real MPI program which has test data to check
it's running OK? Gromacs is open source and has a self-test
mechanism run via "make test", IIRC.
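That self-test idea (run a known computation and compare the result against reference data) can be sketched serially in Python; compute_pi and self_test below are hypothetical stand-ins for a real suite such as Gromacs' "make test", not its actual harness:

```python
# Sketch of the "self-test" pattern: run a deterministic computation
# and compare against a known-good reference value, as a sanity check
# that the stack works.  Serial stand-in for an MPI test program.
import math

def compute_pi(n_steps=100000):
    """Midpoint-rule integration of 4/(1+x^2) on [0,1], as in cpi."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total

def self_test():
    # Reference value plus a tolerance, like a bundled test dataset.
    return abs(compute_pi() - math.pi) < 1e-6

print("self-test passed" if self_test() else "self-test FAILED")
```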
On Thu, 25 Aug 2011 09:07:48 PM Jayavant Patil wrote:
> Hi,
Hiya,
> Is anybody having a tutorial or reference pages
> explaining about the communication between Torque
> and MPI?
Open-MPI uses the PBS Task Manager (TM) API to talk to
Torque pbs_mom's. If you have the Torque manual pages
instal
On Thursday 09 February 2012 22:18:20 Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a
> cpuset, will tools like hwloc's lstopo only show the Linux
> processors *in that cpuset*? I.e., does it not have any
> visibility of the processors outside of its cpus
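The distinction is easy to see from userspace; a Linux-only Python sketch (not hwloc itself) comparing the process's allowed set with the machine total:

```python
# Linux-only sketch: compare the processors this process may run on
# (its affinity mask / cpuset) with the total processors in the
# machine.  Inside a restrictive cpuset the first set is smaller.
import os

allowed = os.sched_getaffinity(0)   # CPUs visible to *this* process
total = os.cpu_count()              # CPUs in the whole system

print(f"allowed: {sorted(allowed)} of {total} processors")
# With no cpuset restriction in place, len(allowed) == total.
```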
Hi all,
We've been trying to track down an IB issue here where a
user's code (Gromacs, run with OMPI 1.4.5) was dying with:
[[18115,1],2][btl_openib_component.c:3224:handle_wc] from bruce030 to: bruce130
error polling LP CQ with status
RETRY EXCEEDED ERROR status number 12 for wr_
On Tuesday 13 March 2012 10:06:43 Chris Samuel wrote:
> Those don't match the values compiled into OMPI 1.4.5:
>
> ompi_info -a | egrep
> 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout' MCA btl:
> parameter "btl_openib_ib_min_rnr_timer" (current v
Hi folks,
One of our users (oh, OK, our director, one of the Dalton developers)
has found an odd behaviour of OMPI 1.6.5 on our x86 clusters and has
managed to get a small reproducer - a modified version of the
ubiquitous F90 "hello world" MPI program.
We find that if we run this program (compile
On Wed, 4 Dec 2013 11:39:29 AM Jeff Squyres wrote:
> On Dec 3, 2013, at 7:54 PM,
> Christopher Samuel wrote:
>
> > Would it make any sense to expose system/environmental/thermal
> > information to the application via MPI_T ?
>
> Hmm. Interesting idea.
Phew. :-)
> Is the best way to grab such
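On Linux, one obvious source for such environmental data is sysfs; a hedged sketch (read_temperatures is a hypothetical helper, and the path and millidegree scaling are Linux thermal-zone conventions, nothing to do with the actual MPI_T interface):

```python
# Hedged sketch of one place such data lives on Linux: sysfs thermal
# zones.  read_temperatures is a hypothetical helper; zones may be
# absent entirely (e.g. in VMs), hence the fallback.
import glob

def read_temperatures():
    """Return {sysfs_path: degrees_C} for each readable thermal zone."""
    temps = {}
    for path in glob.glob("/sys/class/thermal/thermal_zone*/temp"):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue
    return temps

print(read_temperatures() or "no thermal zones visible")
```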
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
> push upstream. However, to use it, it will require linking in a proprietary
> Mellanox library that accelerates the collective operations (available in
> MOFED versions
Hi all,
Not sure if this is a OpenMPI query or a PLPA query,
but given that PLPA seems to have some support for it
already I thought I'd start here. :-)
We run a quad core Opteron cluster with Torque 2.3.x
which uses the kernels cpuset support to constrain
a job to just the cores it has been allo
- "Ralph Castain" wrote:
Hi Ralph,
> Interesting. No, we don't take PLPA cpu sets into account when
> retrieving the allocation.
Understood.
> Just to be clear: from an OMPI perspective, I don't think this is an
> issue of binding, but rather an issue of allocation. If we knew we had
>
- "Ralph Castain" wrote:
> Could you check this? You can run a trivial job using the -npernode x
> option, where x matched the #cores you were allocated on the nodes.
> If you do this, do we bind to the correct cores?
Nope, I'm afraid it doesn't - submitted a job asking
for 4 cores on one
- "Ralph Castain" wrote:
> Looking at your command line, did you remember to set -mca
> mpi_paffinity_alone 1?
Ahh, no, sorry, still feeling my way with this..
> If not, we won't set affinity on the processes.
Now it fails immediately with:
Setting processor affinity failed
--> Ret
- "Ralph Castain" wrote:
> Sounds like a problem in PLPA - I'll have to defer
> to them.
Understood, thanks for that update. I'll try and
find some time to look inside PLPA too.
> Our primary PLPA person is on vacation this week, so
> you might not hear back from him until later next wee
- "Ralph Castain" wrote:
> Looking at your command line, did you remember to set -mca
> mpi_paffinity_alone 1? If not, we won't set affinity on the
> processes.
Just realised that in the failed test I posted I set
-mca mpi_affinity_alone 1 *instead* of -mca paffinity linux,
rather than as
- "Ralph Castain" wrote:
> This will tell you what module is loaded. If PLPA -can- run, you
> should see the linux module selected.
Thanks Ralph, yes it is being selected. I'll carry
on digging.
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partn
- "Chris Samuel" wrote:
> I'll carry on digging.
I've been trying to track back from the linux
paffinity module to find some useful debugging
info I can get my teeth into and I can see that
the file:
opal/mca/paffinity/base/paffinity_base_service.c
seems
- "Ralph Castain" wrote:
> Should just be
>
> -mca paffinity_base_verbose 5
>
> Any value greater than 4 should turn it "on"
Yup, that's what I was trying, but couldn't get any output.
> Something I should have mentioned. The paffinity_base_service.c file
> is solely used by the rank_f
- "Jeff Squyres" wrote:
Hi Jeff,
> I'm the "primary PLPA" guy that Ralph referred to, and I was on
> vacation last week -- sorry for missing all the chatter.
No worries!
> Based on your mails, it looks like you're out this week -- so little
> will likely occur. I'm at the MPI Forum st
Hi Bert,
- "Bert Wesarg" wrote:
> The Cpus_allowed* fields in /proc/<pid>/status are the same as
> sched_getaffinity returns and the /proc/<pid>/cpuset needs to be
> resolved, i.e. where is the cpuset fs mounted?
The convention is to mount it on /dev/cpuset.
Unfortunately you cannot mount both the c
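That correspondence is easy to check; a Linux-only Python sketch (parse_cpu_list is a hypothetical helper for the kernel's "0-3,8" list syntax) comparing Cpus_allowed_list with sched_getaffinity:

```python
# Linux-only sketch: read Cpus_allowed_list from /proc/self/status
# and compare it with what sched_getaffinity(2) reports.
import os

def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

with open("/proc/self/status") as f:
    for line in f:
        if line.startswith("Cpus_allowed_list:"):
            listed = parse_cpu_list(line.split(":", 1)[1].strip())
            break

print("Cpus_allowed_list:", sorted(listed))
print("sched_getaffinity:", sorted(os.sched_getaffinity(0)))
```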
- "Jeff Squyres" wrote:
> PLPA does not currently deal with cpusets.
I think it can get close enough if it assumes that
its initial affinity list is the subset of cores that
it can choose from when setting CPU affinity.
As for whether OMPI or PLPA should choose, I suspect
it's better if OM
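That "initial affinity list as the allowed subset" idea can be sketched like this (Linux-only Python; bind_to_rank is a hypothetical helper, not PLPA's actual API):

```python
# Sketch of the "close enough" approach: capture the affinity mask we
# started with and treat it as the universe of bindable CPUs, so we
# never try to bind outside our cpuset.
import os

INITIAL_AFFINITY = sorted(os.sched_getaffinity(0))  # captured at startup

def bind_to_rank(rank):
    """Bind the calling process to the rank-th CPU of its initial set."""
    cpu = INITIAL_AFFINITY[rank % len(INITIAL_AFFINITY)]
    os.sched_setaffinity(0, {cpu})
    return cpu

print("bound to cpu", bind_to_rank(0))
os.sched_setaffinity(0, set(INITIAL_AFFINITY))  # undo the binding
```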
Hi Ralph,
- "Ralph Castain" wrote:
> Ummm... I'll let you guys work this out on PLPA. However, just to
> clarify, OMPI currently binds to cores, not logical cpus. It is the
> PLPA that is "dumb" and provides the plumbing to do what OMPI tells
> it.
>
> :-)
Ahh, if that's the case then
- "Ralph Castain" wrote:
> Perhaps a telecon (myself, Jeff S, and you) would be best at this
> stage.
Sounds good, will take that part to private email.
> I confess I'm now confused too - what you describe is precisely
> what we already do.
I added printf()'s to the PLPA init(),
PLPA_NA
- "Ralf Wildenhues" wrote:
> You can probably solve most of these issues by just
> versioning the directory names where you put the files;
To be honest I'm not sure if this is something that
OMPI should be looking to solve, we have lots of
different versions installed on our clusters just
u
- "Terry Dontje" wrote:
> I just wanted to give everyone a heads up if they do not get bugs
> email. I just submitted a CMR to move over some new paffinity options
> from the trunk to the v1.3 branch.
Ralph's comments imply that for those sites that share nodes
between jobs (such as oursel
- "Ralph Castain" wrote:
> Hi Chris
Hiya Ralph,
> There would be a "-do-not-bind" option that will prevent us from
> binding processes to anything which should cover that situation.
Gotcha.
> My point was only that we would be changing the out-of-the-box
> behavior to the opposite of tod
- "Eugene Loh" wrote:
> This is an important discussion.
Indeed! My big fear is that people won't pick up the significance
of the change and will complain about performance regressions
in the middle of an OMPI stable release cycle.
> Do note:
>
> 1) Bind-to-core is actually the default be
- "Jeff Squyres" wrote:
> An important point to raise here: the 1.3 series is *not* the super
> stable series. It is the *feature* series. Specifically: it is not
> out of scope to introduce or change features within the 1.3 series.
Ah, I think I've misunderstood the website then. :-(
- "Eugene Loh" wrote:
Hi Eugene,
[...]
> It would be even better to have binding selections adapt to other
> bindings on the system.
Indeed!
This touches on the earlier thread about making OMPI aware
of its cpuset/cgroup allocation on the node (for those sites
that are using it), it might
- "Ralph Castain" wrote:
> Hi Chris
Hiya,
> The devel trunk has all of this in it - you can get that tarball from
> the OMPI web site (take the nightly snapshot).
OK, grabbed that (1.4a1r21825). Configured with:
./configure --prefix=$FOO --with-openib --with-tm=/usr/
local/torque/latest
- "Eugene Loh" wrote:
> Ah, you're missing the third secret safety switch that prevents
> hapless mortals from using this stuff accidentally! :^)
Sounds good to me. :-)
> I think you need to add
>
> --mca opal_paffinity_alone 1
Yup, looks like that's it; it fails to launch with tha
- "Chris Samuel" wrote:
> This is most likely because it's getting an error from the
> kernel when trying to bind to a socket it's not permitted
> to access.
This is what strace reports:
18561 sched_setaffinity(18561, 8, { f0 }
18561 <... sched_se
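For reference, the same class of failure can be provoked from userspace (Linux-only Python sketch; the CPU number below is just an implausible id, not the { f0 } mask from the strace):

```python
# Linux-only sketch of the failure mode above: sched_setaffinity(2)
# rejects a mask containing no CPU this process is permitted to use.
import errno
import os

try:
    os.sched_setaffinity(0, {999_999})  # no such processor
    print("unexpectedly succeeded")
except OSError as e:
    print("binding failed as expected, errno =", errno.errorcode[e.errno])
```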
- "Chris Samuel" wrote:
> $ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca
> odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4
To clarify - does that command line accurately reflect the
proposed defaults for OMPI 1.3.4 ?
cheers,
Chris
- "Eugene Loh" wrote:
> Actually, the current proposed defaults for 1.3.4 are
> not to change the defaults at all.
Thanks, I hadn't picked up on the latest update to the
trac ticket 3 days ago that says that the defaults will
stay the same. Sounds good to me!
All the best and have a good w
- "Jeff Squyres" wrote:
> Does anyone have any suggestions? Or are we stuck
> with compile-time checking?
I didn't see this until now, but I'd be happy with
just a compile time option so we could produce an
install just for debugging purposes and have our
users explicitly select it with mo
- "George Bosilca" wrote:
> Do people know that there exist tools for checking MPI code
> correctness? Many, many tools and most of them are freely
> available.
Yes, but have yet to be able to persuade any of our users
to use them (and have no control over them). :-(
--
Christopher Samu
- "Ralph Castain" wrote:
> Hi Greg
>
> I fixed these so they will get properly formatted. However, it is
> symptomatic of a much broader problem - namely, that developers have
> inserted print statements throughout the code for reporting errors.
> There simply isn't any easy way for me to c
- "David Robertson" wrote:
> Hi all,
Hiya,
> We use both the PGI and Intel compilers over an
> Infiniband cluster and I was trying to find a way
> to have both orteruns in the path (in separate
> directories) at the same time.
Not a solution, but what we do here is to arrange our
installs
- "Ralph Castain" wrote:
> Let me point out the obvious since this has plagued
> us at LANL with regard to this concept. If a user
> wants to do something different, all they have to
> do is download and build their own copy of OMPI.
One possibility may be to have OMPI honour a config
file
Hi Edgar,
- "Edgar Gabriel" wrote:
> just wanted to give a heads-up that I *think* I know what the problem
> is. I should have a fix (with a description) either later today or
> tomorrow morning...
I see that changeset 21970 is on trunk to fix this issue,
is that backportable to the 1.3.x
Hi Edgar,
- "Edgar Gabriel" wrote:
> it will be available in 1.3.4...
That's great, thanks so much!
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-
Hi Terry,
- "Terry Dontje" wrote:
> It's actually in the 1.3 branch now and has been
> verified to solve the hanging issues of several members.
Great, I'll get them to try a snapshot build!
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnershi
Hi folks,
Just wondered if there was any idea of when OMPI 1.3.4
might be released? I know the correct answer is "when
it's ready" (:-)) but was curious if there were any thoughts
on a timeframe?
The cpuset aware CPU affinity code would be very useful
to us to fix up some codes that sometimes g
- "David Singleton" wrote:
> Kiril Dichev has already pointed a problem with MPI_Cart_create
> http://www.open-mpi.org/community/lists/devel/2009/08/6627.php
> MPI_Graph_create has the same problem. I checked all other
> functions with logical in arguments and no others do anything
> simila
- "Jeff Squyres" wrote:
> Give it a whirl:
Nice - built without warnings with GCC 4.4.2.
Some sample results below for configs not represented
on the current website.
Dual socket Shanghai:
System(31GB)
Node#0(15GB) + Socket#0 + L3(6144KB)
L2(512KB) + L1(64KB) + Core#0 + P#0
L2
- "Chris Samuel" wrote:
> Some sample results below for configs not represented
> on the current website.
A final example of a more convoluted setup: a Torque
job requesting 5 CPUs on a dual Shanghai node which
has been given a non-contiguous allocation.
[
- "Jeff Squyres" wrote:
> Sweet!
:-)
> And -- your reply tells me that, for the 2nd time in a single day, I
> posted to the wrong list. :-)
Ah well, if you'd posted to the right list I wouldn't
have seen this.
> I'll forward your replies to the hwloc-devel list.
Not a problem - I'll g
- "Tony Breeds" wrote:
> Powerpc kernels that old do not have the topology information needed
> (in /sys or /proc/cpuinfo), so for the short term that's the best we
> can do.
That's fine, I quite understand. I'm trying to get that
cluster replaced anyway.. ;-)
> FWIW I'm looking at how we
- "Ashley Pittman" wrote:
> $ grep Cpus_allowed_list /proc/$$/status
Useful, ta!
> Does this imply the default is to report on processes
> in the current cpuset rather than the entire system?
> Does anyone else feel that violates the principle of
> least surprise?
Not really, I feel that
- "Jeff Squyres" wrote:
> WHAT: Change OMPI's verbose build output to use
> Automake's new "silent" rules output (see below)
Not that I'm a developer, but as someone who regularly
builds OMPI I'd be very happy with this change.
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - System
- "Jeff Squyres" wrote:
> You are absolutely correct. I've filed CMRs
> for v1.4 and v1.5.
To clarify one point for people who weren't at the
SC'09 Open-MPI BOF (hopefully I'll get this right!):
1.4 will be the bug-fix only continuation of the 1.3
feature series and will be binary compati
map().
Had great fun with that trying to track down why the mem property of Torque
PBS jobs wasn't being enforced all the time.
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
This email may come with a PGP signature as a file. Do not panic.
For mor
r lists as something they
should look at doing better:
http://old.nabble.com/Re:-Can't-use-Fortran-90-95-compiler-for-F77-p26209677.html
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
erested (though they will
likely ask for a bisection too).
cheers!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
9616384 93.8 1.1 16 81.1 91.5
# end result "Pingpong_Send_Recv"
# duration = 0.02 sec
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Hi folks,
Looking at the code for the Twitter notifier in OMPI 1.5
and seeing its use of HTTP basic authentication I would
suggest that it may be non-functional due to Twitter's
switch to purely OAuth-based authentication for their API.
I'm trying to test it out here but I'm at a bit of a loss
to
Hi folks,
I was looking to file an issue against the website for the FAQ about XRC
support (given it was disabled in issue #4087) but it doesn't appear to be
enabled. Is that just an oversight or is there a different way preferred?
All the best,
Chris
--
Christopher Samuel - Senior Sys
On Sunday, 5 November 2017 12:07:52 AM AEDT r...@open-mpi.org wrote:
> It was just an oversight - I have turned on the issue tracker, so feel free
> to post, or a PR is also welcome
Thanks Ralph! Unfortunately I don't feel I understand enough about
background of the issue to say much more abou
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:
> Ah -- the point being that this is not an issue related to the libltdl work.
Sorry - I saw the request to test the tarball and tried it out, missed the
significance of the subject. :-/
--
Christopher Samuel - Senior Systems Administrat
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:
> Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
>
> $ sysctl net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_time = 1800
I suspect that's a local customisation; all Linux systems I've got access to
(including RHEL 6.4/6.5/
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote:
> I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man"
> didn't find any matches).
This seems to document it for an unspecified version of Solaris:
http://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdg/index.html
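On Linux the per-socket counterpart of that sysctl is TCP_KEEPIDLE; a minimal sketch (the 600 is just an illustrative value):

```python
# Linux-only sketch: TCP_KEEPIDLE is the per-socket override of the
# net.ipv4.tcp_keepalive_time sysctl discussed above (both in seconds).
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)   # enable keepalive
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)
idle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
print("TCP_KEEPIDLE on this socket:", idle, "seconds")
s.close()
```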
Hi Gilles,
On Mon, 13 Jul 2015 03:16:57 PM Gilles Gouaillardet wrote:
> i made ConnectX XRC (aka XRC) and ConnectIb XRC (aka XRC domains) exclusive,
> so yes, you got the desired behavior.
Is there a tarball I could test on our x86 systems please? We are tied to the
OFED in RHEL due to having
On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote:
> Hi Chris,
Hi Gilles,
> i pushed my tarball into a gist :
Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa
2010, one circa 2013) with their included OFED I see:
checking if ConnectX XRC support is enabled...