Re: [OMPI devel] Enable issue tracker for ompi-www repo?

2017-11-04 Thread Chris Samuel
On Sunday, 5 November 2017 12:07:52 AM AEDT r...@open-mpi.org wrote: > It was just an oversight - I have turned on the issue tracker, so feel free > to post, or a PR is also welcome Thanks Ralph! Unfortunately I don't feel I understand enough about background of the issue to say much more

[OMPI devel] Enable issue tracker for ompi-www repo?

2017-11-04 Thread Chris Samuel
Hi folks, I was looking to file an issue against the website for the FAQ about XRC support (given it was disabled in issue #4087) but it doesn't appear to be enabled. Is that just an oversight or is there a different way preferred? All the best, Chris -- Christopher Samuel Senior

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Chris Samuel
On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote: > Hi Chris, Hi Gilles, > i pushed my tarball into a gist : Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa 2010, one circa 2013) with their included OFED I see: checking if ConnectX XRC support is enabled...

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Chris Samuel
Hi Gilles, On Mon, 13 Jul 2015 03:16:57 PM Gilles Gouaillardet wrote: > i made ConnectX XRC (aka XRC) and ConnectIb XRC (aka XRC domains) exclusive, > so yes, you got the desired behavior. Is there a tarball I could test on our x86 systems please? We are tied to the OFED in RHEL due to having

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote: > I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man" > didn't find any matches). This seems to document it for an unspecified version of Solaris:

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote: > Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds): > > $ sysctl net.ipv4.tcp_keepalive_time > net.ipv4.tcp_keepalive_time = 1800 I suspect that's a local customisation, all Linux systems I've got access to (including RHEL
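For anyone wanting to check their own systems, here is a minimal sketch that reads the same value the sysctl command reports, assuming a Linux host where the setting is exposed under /proc/sys. The helper names are made up for illustration. For comparison, the default documented in tcp(7) is 7200 seconds, so 1800 would indeed be a site-specific setting.

```python
def read_keepalive_seconds(path="/proc/sys/net/ipv4/tcp_keepalive_time"):
    """Return the idle time, in seconds, before the first keepalive probe."""
    # Assumption: Linux, with the sysctl tree mounted at /proc/sys.
    with open(path) as f:
        return int(f.read().strip())


def format_duration(seconds):
    """Render a number of seconds as hours and minutes for easier reading."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h {remainder // 60:02d}m"


if __name__ == "__main__":
    value = read_keepalive_seconds()
    print(f"net.ipv4.tcp_keepalive_time = {value} ({format_duration(value)})")
```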

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Chris Samuel
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote: > Ah -- the point being that this is not an issue related to the libltdl work. Sorry - I saw the request to test the tarball and tried it out, missed the significance of the subject. :-/ -- Christopher Samuel Senior Systems

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Chris Samuel
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote: > We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually > push upstream. However, to use it, it will require linking in a proprietary > Mellanox library that accelerates the collective operations (available in > MOFED

Re: [OMPI devel] SC13 birds of a feather

2013-12-04 Thread Chris Samuel
On Wed, 4 Dec 2013 11:39:29 AM Jeff Squyres wrote: > On Dec 3, 2013, at 7:54 PM, > Christopher Samuel wrote: > > > Would it make any sense to expose system/environmental/thermal > > information to the application via MPI_T ? > > Hmm. Interesting idea. Phew. :-) > Is

[OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-08-28 Thread Chris Samuel
Hi folks, One of our users (oh, OK, our director, one of the Dalton developers) has found an odd behaviour of OMPI 1.6.5 on our x86 clusters and has managed to get a small reproducer - a modified version of the ubiquitous F90 "hello world" MPI program. We find that if we run this program

Re: [hwloc-devel] lstopo-nox strikes back

2012-04-25 Thread Chris Samuel
On Wednesday 25 April 2012 19:38:00 Brice Goglin wrote: > How do people feel about this? It sounds like what you have is a conflict between the policies of Debian (and hence Ubuntu) and the expectations of RHEL/CentOS users. Debian Policy is fairly clear on this matter: # 11.8.1 Providing X

Re: [OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
On Tuesday 13 March 2012 10:06:43 Chris Samuel wrote: > Those don't match the values compiled into OMPI 1.4.5: > > ompi_info -a | egrep > 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout' MCA btl: > parameter "btl_openib_ib_min_rnr_timer" (current value: "25"

[OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
Hi all, We've been working trying to track down an IB issue here where a user was having code (Gromacs, run with OMPI 1.4.5) dying with: [[18115,1],2][btl_openib_component.c:3224:handle_wc] from bruce030 to: bruce130 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for
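As background on why the btl_openib_ib_timeout value matters for RETRY EXCEEDED errors: as I understand it, the parameter is an exponent, not a time, with the InfiniBand local ACK timeout being 4.096 microseconds multiplied by 2 to the power of the value. The sketch below is only that arithmetic. The function name is hypothetical, and it does not cover btl_openib_ib_min_rnr_timer, which is encoded differently.

```python
def ib_ack_timeout_seconds(timeout_exponent):
    """Convert an IB local ACK timeout exponent into seconds.

    Assumes the usual encoding: 4.096 microseconds * 2**exponent.
    The value 0 is treated specially by the verbs layer, so it is
    rejected here rather than converted.
    """
    if not 1 <= timeout_exponent <= 31:
        raise ValueError("timeout exponent must be between 1 and 31")
    return 4.096e-6 * 2 ** timeout_exponent


if __name__ == "__main__":
    # Show how quickly the timeout grows with the exponent.
    for exponent in (10, 14, 20):
        print(f"{exponent:2d} -> {ib_ack_timeout_seconds(exponent):.6f} s")
```

Each step up in the exponent doubles the timeout, so a small change in the documented default implies a large change in how long a link can stall before the error fires.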

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Chris Samuel
On Thursday 09 February 2012 22:18:20 Jeff Squyres wrote: > Just so that I understand this better -- if a process is bound in a > cpuset, will tools like hwloc's lstopo only show the Linux > processors *in that cpuset*? I.e., does it not have any > visibility of the processors outside of its

Re: [OMPI devel] [TIPC BTL] test programmes

2011-08-01 Thread Chris Samuel
On Mon, 1 Aug 2011 09:47:00 PM Xin He wrote: > Do any of you guys have any testing programs that I should > run to test if it really works? How about a real MPI program which has test data to check it's running OK ? Gromacs is open source and has a self-test mechanism run via "make test" IIRC.

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-11 Thread Chris Samuel
4 93.8 1.1 16 81.1 91.5 # end result "Pingpong_Send_Recv" # duration = 0.02 sec cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-11 Thread Chris Samuel
they will likely ask for a bisection too). cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [hwloc-devel] Strange difference

2010-03-30 Thread Chris Samuel
(so far) describing things is in terms of sockets and cores. Wouldn't be surprised if someone pointed out an ambiguity in those too! cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For more info see: ht

Re: [OMPI devel] RFC: Adding -DOPEN_MPI=1 to mpif77 and mpif90

2010-02-11 Thread Chris Samuel
as something they should look at doing better: http://old.nabble.com/Re:-Can't-use-Fortran-90-95-compiler-for-F77-p26209677.html cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)

2010-01-24 Thread Chris Samuel
. Had great fun with that trying to track down why the mem property of Torque PBS jobs wasn't being enforced all the time. cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For more info see

Re: [OMPI devel] MPI_Graph_create

2009-12-01 Thread Chris Samuel
- "Jeff Squyres" wrote: > You are absolutely correct. I've filed CMRs > for v1.4 and v1.5. To clarify one point for people who weren't at the SC'09 Open-MPI BOF (hopefully I'll get this right!): 1.4 will be the bug-fix only continuation of the 1.3 feature series and

Re: [hwloc-devel] 0.9.3rc2 out

2009-12-01 Thread Chris Samuel
- "Jeff Squyres" wrote: > A lot of these are "unreferenced parameters" which > I think we should clean up someday, but not today. ;-) Fair comment. ;-) > The stdc99/stgnu99 one is worth looking at -- probably not for this > release, but it does seem like we should

Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-30 Thread Chris Samuel
- "Samuel Thibault" wrote: > What do you mean by "module support"? http://modules.sourceforge.net/ They make managing multiple software installations on clusters much much easier.. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian

Re: [hwloc-devel] 0.9.3rc2 out

2009-11-25 Thread Chris Samuel
- "Jeff Squyres" wrote: > Please beat it up! Compiles fine with PGI 10.0 and GCC 4.4.2, but we are getting warnings with Intel 11.1 for all files saying: icc: command line warning #10121: overriding '-stdc99' with '-stdgnu99'

Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-21 Thread Chris Samuel
Hi Michael, - "Michael Raymond" wrote: > Our architecture has blades with two Nehalems on > them, and the blades are connected together in a > CC-NUMA fashion. I've heard on the grapevine that there will be memory only blades too, which will have a Nehalem EX on them but

Re: [hwloc-devel] Pgcc issues fixed?

2009-11-04 Thread Chris Samuel
- "Jeff Squyres" wrote: > K. Clear for a final rc / release? Go for it, am just about to go run a training course now so won't be available until this arvo Melbourne time.. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian

Re: [hwloc-devel] Pgcc issues fixed?

2009-11-03 Thread Chris Samuel
- "Chris Samuel" <csam...@vpac.org> wrote: > Will try PGI 7.0 now. I can confirm it compiles OK with PGI 7.0, 7.1 and 7.2 with the same warnings as for 8.0. These warnings also appear with 9.0. Lots of warnings from the Intel v11 compilers, I've attached a text file

Re: [hwloc-devel] Pgcc issues fixed?

2009-11-03 Thread Chris Samuel
- "Chris Samuel" <csam...@vpac.org> wrote: > Grabbing now, thanks! Compiled OK with: [csamuel@tango hwloc-0.9.1rc3r1276]$ pgcc -V pgcc 8.0-6 64-bit target on x86-64 Linux -tp gh-64 Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved. Copyright 2000-2009,

Re: [hwloc-devel] Pgcc issues fixed?

2009-11-03 Thread Chris Samuel
- "Jeff Squyres" wrote: > Try this tarball: > > http://www.open-mpi.org/~jsquyres/unofficial/hwloc-0.9.1rc3r1276.tar.bz2 Grabbing now, thanks! Sorry for not seeing the email yesterday, it was a public holiday here yesterday (Melbourne Cup Day, yes we have a public

Re: [hwloc-devel] Pgcc issues fixed?

2009-11-03 Thread Chris Samuel
- "Jeff Squyres (jsquyres)" wrote: > Pgcc issues fixed? Sorry folks, have not yet got the SVN checkout to configure due to it requiring newer tools than I have and am buried trying to get board reports out at present. Hopefully will have some time tomorrow 2-3pm my

Re: [hwloc-devel] hwloc-0.9.1rc3 fails with pgcc

2009-10-30 Thread Chris Samuel
- "Pavan Balaji" wrote: > Log files attached. Hmm, it compiled for me with pgcc! [csamuel@tango hwloc-0.9.1rc3]$ pgcc -V pgcc 9.0-4 64-bit target on x86-64 Linux -tp shanghai-64 Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved. Copyright 2000-2009,

Re: [hwloc-devel] 0.9.1rc3 has been released

2009-10-30 Thread Chris Samuel
- "Jeff Squyres" wrote: > I tweaked this a bit -- how's this: Looks good to me! -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit

Re: [OMPI devel] RFC: Use automake "silent rules"

2009-10-26 Thread Chris Samuel
- "Jeff Squyres" wrote: > WHAT: Change OMPI's verbose build output to use > Automake's new "silent" rules output (see below) Not that I'm a developer, but as someone who regularly builds OMPI I'd be very happy with this change. cheers! Chris -- Christopher Samuel -

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-22 Thread Chris Samuel
- "Ashley Pittman" wrote: > $ grep Cpus_allowed_list /proc/$$/status Useful, ta! > Does this imply the default is to report on processes > in the current cpuset rather than the entire system? > Does anyone else feel that violates the principle of > least surprise?
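The grep above prints the CPU list as a compact string such as "0-3,8". A minimal sketch of turning that into explicit CPU numbers follows, assuming a Linux /proc layout. The function names are invented for illustration and are not part of hwloc or OMPI.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list string such as "0-3,8" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs the calling process may run on, per Cpus_allowed_list."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return []


if __name__ == "__main__":
    print(allowed_cpus())
```

Run inside a cpuset-constrained job, this should list only the CPUs the job was given, which is the behaviour being questioned in the quoted text.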

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-22 Thread Chris Samuel
- "Tony Breeds" wrote: > Powerpc kernels that old do not have the topology information needed > (in /sys or /proc/cpuinfo) So for the short term that's the best we > can do. That's fine, I quite understand. I'm trying to get that cluster replaced anyway.. ;-) >

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-22 Thread Chris Samuel
- "Jeff Squyres" wrote: > Sweet! :-) > And -- your reply tells me that, for the 2nd time in a single day, I > posted to the wrong list. :-) Ah well, if you'd posted to the right list I wouldn't have seen this. > I'll forward your replies to the hwloc-devel list.

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-21 Thread Chris Samuel
- "Chris Samuel" <csam...@vpac.org> wrote: > Some sample results below for configs not represented > on the current website. A final example of a more convoluted configuration with a Torque job requesting 5 CPUs on a dual Shanghai node and has been given a non-co

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-21 Thread Chris Samuel
- "Jeff Squyres" wrote: > Give it a whirl: Nice - built without warnings with GCC 4.4.2. Some sample results below for configs not represented on the current website. Dual socket Shanghai: System(31GB) Node#0(15GB) + Socket#0 + L3(6144KB) L2(512KB) + L1(64KB)

Re: [OMPI devel] MPI_Graph_create

2009-10-15 Thread Chris Samuel
- "David Singleton" wrote: > Kiril Dichev has already pointed a problem with MPI_Cart_create > http://www.open-mpi.org/community/lists/devel/2009/08/6627.php > MPI_Graph_create has the same problem. I checked all other > functions with logical in arguments and

[OMPI devel] OMPI 1.3.4 ETA ? (TLAs FTW)

2009-09-27 Thread Chris Samuel
Hi folks, Just wondered if there was any idea of when OMPI 1.3.4 might be released ? I know the correct answer is "when it's ready" (:-)) but was curious if there was any thoughts on a timeframe ? The cpuset aware CPU affinity code would be very useful to us to fix up some codes that sometimes

Re: [OMPI devel] application hangs with multiple dup

2009-09-23 Thread Chris Samuel
Hi Terry, - "Terry Dontje" wrote: > It actually is in the 1.3 branch now and has been > verified to solve the hanging issues of several members. Great, I'll get them to try a snapshot build! cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager

Re: [OMPI devel] application hangs with multiple dup

2009-09-23 Thread Chris Samuel
Hi Edgar, - "Edgar Gabriel" wrote: > it will be available in 1.3.4... That's great, thanks so much! cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053,

Re: [OMPI devel] application hangs with multiple dup

2009-09-22 Thread Chris Samuel
Hi Edgar, - "Edgar Gabriel" wrote: > just wanted to give a heads-up that I *think* I know what the problem > is. I should have a fix (with a description) either later today or > tomorrow morning... I see that changeset 21970 is on trunk to fix this issue, is that

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-08 Thread Chris Samuel
- "Ralph Castain" wrote: > Let me point out the obvious since this has plagued > us at LANL with regard to this concept. If a user > wants to do something different, all they have to > do is download and build their own copy of OMPI. One possibility may be to have OMPI

Re: [OMPI devel] suffix flag problems

2009-09-03 Thread Chris Samuel
- "David Robertson" wrote: > Hi all, Hiya, > We use both the PGI and Intel compilers over an > Infiniband cluster and I was trying to find a way > to have both orteruns in the path (in separate > directories) at the same time. Not a solution, but what we do

Re: [OMPI devel] RFC: convert send to ssend

2009-08-23 Thread Chris Samuel
- "Jeff Squyres" wrote: > Does anyone have any suggestions? Or are we stuck > with compile-time checking? I didn't see this until now, but I'd be happy with just a compile time option so we could produce an install just for debugging purposes and have our users

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-21 Thread Chris Samuel
- "Eugene Loh" wrote: > Actually, the current proposed defaults for 1.3.4 are > not to change the defaults at all. Thanks, I hadn't picked up on the latest update to the trac ticket 3 days ago that says that the defaults will stay the same. Sounds good to me! All the

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-21 Thread Chris Samuel
- "Chris Samuel" <csam...@vpac.org> wrote: > $ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca > odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4 To clarify - does that command line accurately reflect the proposed defaults for OMPI

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Chris Samuel" <csam...@vpac.org> wrote: > This is most likely because it's getting an error from the > kernel when trying to bind to a socket it's not permitted > to access. This is what strace reports: 18561 sched_setaffinity(18561, 8, { f0 } 18561 <.
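To read the strace line: the third argument to sched_setaffinity is the CPU mask, and the second is its size in bytes. Assuming the `f0` shown is a hexadecimal bitmask in which bit n stands for CPU n, a small sketch for decoding it (the function name is hypothetical):

```python
def cpus_from_mask(hex_mask):
    """Return the CPU numbers whose bits are set in a hexadecimal affinity mask."""
    mask = int(hex_mask, 16)
    # Bit n set means CPU n is included in the requested affinity.
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    print(cpus_from_mask("f0"))
```

Under that assumption `f0` selects CPUs 4 to 7, that is, the second socket of a dual quad-core node, which fits the theory that the bind fails when those cores lie outside the job's cpuset.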

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Eugene Loh" wrote: > Ah, you're missing the third secret safety switch that prevents > hapless mortals from using this stuff accidentally! :^) Sounds good to me. :-) > I think you need to add > > --mca opal_paffinity_alone 1 Yup, looks like that's it; it

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Ralph Castain" wrote: > Hi Chris Hiya, > The devel trunk has all of this in it - you can get that tarball from > the OMPI web site (take the nightly snapshot). OK, grabbed that (1.4a1r21825). Configured with: ./configure --prefix=$FOO --with-openib --with-tm=/usr/

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread Chris Samuel
- "Eugene Loh" wrote: Hi Eugene, [...] > It would be even better to have binding selections adapt to other > bindings on the system. Indeed! This touches on the earlier thread about making OMPI aware of its cpuset/cgroup allocation on the node (for those sites that

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread Chris Samuel
- "Jeff Squyres" wrote: > An important point to raise here: the 1.3 series is *not* the super > stable series. It is the *feature* series. Specifically: it is not > out of scope to introduce or change features within the 1.3 series. Ah, I think I've misunderstood

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Eugene Loh" wrote: > This is an important discussion. Indeed! My big fear is that people won't pick up the significance of the change and will complain about performance regressions in the middle of an OMPI stable release cycle. > Do note: > > 1) Bind-to-core is

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Hi Chris Hiya Ralph, > There would be a "-do-not-bind" option that will prevent us from > binding processes to anything which should cover that situation. Gotcha. > My point was only that we would be changing the out-of-the-box > behavior to

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Terry Dontje" wrote: > I just wanted to give everyone a heads up if they do not get bugs > email. I just submitted a CMR to move over some new paffinity options > from the trunk to the v1.3 branch. Ralph's comments imply that for those sites that share nodes

Re: [OMPI devel] Shared library versioning

2009-07-28 Thread Chris Samuel
- "Ralf Wildenhues" wrote: > You can probably solve most of these issues by just > versioning the directory names where you put the files; To be honest I'm not sure if this is something that OMPI should be looking to solve, we have lots of different versions

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Chris Samuel
- "Ralph Castain" wrote: > Perhaps a telecon (myself, Jeff S, and you) would be best at this > stage. Sounds good, will take that part to private email. > I confess I'm now confused too - what you describe is precisely > what we already do. I added printf()'s to the

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Ralph, - "Ralph Castain" wrote: > Ummm... I'll let you guys work this out on PLPA. However, just to > clarify, OMPI currently binds to cores, not logical cpus. It is the > PLPA that is "dumb" and provides the plumbing to do what OMPI tells > it. > > :-) Ahh, if

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Bert, - "Bert Wesarg" wrote: > The Cpus_allowed* fields in /proc//status are the same as > sched_getaffinity returns and the /proc//cpuset needs to be > resolved, i.e. where is the cpuset fs mounted? The convention is to mount it on /dev/cpuset.

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
- "Jeff Squyres" wrote: Hi Jeff, > I'm the "primary PLPA" guy that Ralph referred to, and I was on > vacation last week -- sorry for missing all the chatter. No worries! > Based on your mails, it looks like you're out this week -- so little > will likely occur.

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > Should just be > > -mca paffinity_base_verbose 5 > > Any value greater than 4 should turn it "on" Yup, that's what I was trying, but couldn't get any output. > Something I should have mentioned. The paffinity_base_service.c file > is

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > This will tell you what module is loaded. If PLPA -can- run, you > should see the linux module selected. Thanks Ralph, yes it is being selected. I'll carry on digging. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? If not, we won't set affinity on the > processes. Just realised that in the failed test I posted I set -mca mpi_affinity_alone 1 *instead* of -mca paffinity

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Sounds like a problem in PLPA - I'll have to defer > to them. Understood, thanks for that update. I'll try and find some time to look inside PLPA too. > Our primary PLPA person is on vacation this week, so > you might not hear back from him

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? Ahh, no, sorry, still feeling my way with this.. > If not, we won't set affinity on the processes. Now it fails immediately with: Setting processor

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: > Could you check this? You can run a trivial job using the -npernode x > option, where x matched the #cores you were allocated on the nodes. > If you do this, do we bind to the correct cores? Nope, I'm afraid it doesn't - submitted a job asking

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: Hi Ralph, > Interesting. No, we don't take PLPA cpu sets into account when > retrieving the allocation. Understood. > Just to be clear: from an OMPI perspective, I don't think this is an > issue of binding, but rather an issue of allocation.

[OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
Hi all, Not sure if this is an OpenMPI query or a PLPA query, but given that PLPA seems to have some support for it already I thought I'd start here. :-) We run a quad core Opteron cluster with Torque 2.3.x which uses the kernel's cpuset support to constrain a job to just the cores it has been