Re: [OMPI devel] [TIPC BTL] test programmes

2011-08-01 Thread Chris Samuel
On Mon, 1 Aug 2011 09:47:00 PM Xin He wrote: > Do any of you guys have any testing programs that I should > run to test if it really works? How about a real MPI program which has test data to check it's running OK ? Gromacs is open source and has a self-test mechanism run via "make test" IIRC.

Re: [OMPI devel] [torquedev] Communication between Torque and MPI

2011-08-25 Thread Chris Samuel
On Thu, 25 Aug 2011 09:07:48 PM Jayavant Patil wrote: > Hi, Hiya, > Is anybody having a tutorial or reference pages > explaining about the communication between Torque > and MPI? Open-MPI uses the PBS Task Manager (TM) API to talk to Torque pbs_mom's. If you have the Torque manual pages instal

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Chris Samuel
On Thursday 09 February 2012 22:18:20 Jeff Squyres wrote: > Just so that I understand this better -- if a process is bound in a > cpuset, will tools like hwloc's lstopo only show the Linux > processors *in that cpuset*? I.e., does it not have any > visibility of the processors outside of its cpus

[OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
Hi all, We've been working to track down an IB issue here where a user's code (Gromacs, run with OMPI 1.4.5) was dying with: [[18115,1],2][btl_openib_component.c:3224:handle_wc] from bruce030 to: bruce130 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_

Re: [OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
On Tuesday 13 March 2012 10:06:43 Chris Samuel wrote: > Those don't match the values compiled into OMPI 1.4.5: > > ompi_info -a | egrep > 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout' MCA btl: > parameter "btl_openib_ib_min_rnr_timer" (current v

[OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-08-28 Thread Chris Samuel
Hi folks, One of our users (oh, OK, our director, one of the Dalton developers) has found an odd behaviour of OMPI 1.6.5 on our x86 clusters and has managed to get a small reproducer - a modified version of the ubiquitous F90 "hello world" MPI program. We find that if we run this program (compile

Re: [OMPI devel] SC13 birds of a feather

2013-12-04 Thread Chris Samuel
On Wed, 4 Dec 2013 11:39:29 AM Jeff Squyres wrote: > On Dec 3, 2013, at 7:54 PM, > Christopher Samuel wrote: > > > Would it make any sense to expose system/environmental/thermal > > information to the application via MPI_T ? > > Hmm. Interesting idea. Phew. :-) > Is the best way to grab such

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Chris Samuel
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote: > We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually > push upstream. However, to use it, it will require linking in a proprietary > Mellanox library that accelerates the collective operations (available in > MOFED versions

[OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
Hi all, Not sure if this is an OpenMPI query or a PLPA query, but given that PLPA seems to have some support for it already I thought I'd start here. :-) We run a quad core Opteron cluster with Torque 2.3.x which uses the kernel's cpuset support to constrain a job to just the cores it has been allo

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: Hi Ralph, > Interesting. No, we don't take PLPA cpu sets into account when > retrieving the allocation. Understood. > Just to be clear: from an OMPI perspective, I don't think this is an > issue of binding, but rather an issue of allocation. If we knew we had >

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: > Could you check this? You can run a trivial job using the -npernode x > option, where x matched the #cores you were allocated on the nodes. > If you do this, do we bind to the correct cores? Nope, I'm afraid it doesn't - submitted a job asking for 4 cores on one

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? Ahh, no, sorry, still feeling my way with this.. > If not, we won't set affinity on the processes. Now it fails immediately with: Setting processor affinity failed --> Ret

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Sounds like a problem in PLPA - I'll have to defer > to them. Understood, thanks for that update. I'll try and find some time to look inside PLPA too. > Our primary PLPA person is on vacation this week, so > you might not hear back from him until later next wee

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-18 Thread Chris Samuel
- "Ralph Castain" wrote: > Looking at your command line, did you remember to set -mca > mpi_paffinity_alone 1? If not, we won't set affinity on the > processes. Just realised that in the failed test I posted I set -mca mpi_affinity_alone 1 *instead* of -mca paffinity linux, rather than as

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > This will tell you what module is loaded. If PLPA -can- run, you > should see the linux module selected. Thanks Ralph, yes it is being selected. I'll carry on digging. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partn

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Chris Samuel" wrote: > I'll carry on digging. I've been trying to track back from the linux paffinity module to find some useful debugging info I can get my teeth into and I can see that the file: opal/mca/paffinity/base/paffinity_base_service.c seems

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-19 Thread Chris Samuel
- "Ralph Castain" wrote: > Should just be > > -mca paffinity_base_verbose 5 > > Any value greater than 4 should turn it "on" Yup, that's what I was trying, but couldn't get any output. > Something I should have mentioned. The paffinity_base_service.c file > is solely used by the rank_f

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
- "Jeff Squyres" wrote: Hi Jeff, > I'm the "primary PLPA" guy that Ralph referred to, and I was on > vacation last week -- sorry for missing all the chatter. No worries! > Based on your mails, it looks like you're out this week -- so little > will likely occur. I'm at the MPI Forum st

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Bert, - "Bert Wesarg" wrote: > The Cpus_allowed* fields in /proc/<pid>/status are the same as > sched_getaffinity returns and the /proc/<pid>/cpuset needs to be > resolved, i.e. where is the cpuset fs mounted? The convention is to mount it on /dev/cpuset. Unfortunately you cannot mount both the c

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
- "Jeff Squyres" wrote: > PLPA does not currently deal with cpusets. I think it can get close enough if it assumes that its initial affinity list is the subset of cores that it can choose from when setting CPU affinity. As for whether OMPI or PLPA should choose, I suspect it's better if OM
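The idea above, treating the affinity mask a process starts with as the pool of cores it may bind to, can be sketched roughly as follows. This is only an illustration and not PLPA or OMPI code: `pick_core` and the round-robin policy are assumptions, and `os.sched_getaffinity` is the Python wrapper for the Linux call of the same name.

```python
import os


def pick_core(allowed_cpus, local_rank):
    """Choose a CPU for a node-local rank from the CPUs we were started on.

    allowed_cpus: iterable of CPU ids (e.g. the initial affinity set).
    local_rank:   hypothetical node-local rank of the process.
    """
    cpus = sorted(allowed_cpus)
    if not cpus:
        raise ValueError("empty affinity set")
    # Round-robin over the permitted CPUs only, never over the whole node.
    return cpus[local_rank % len(cpus)]


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    initial = os.sched_getaffinity(0)      # CPUs the cpuset/batch system allows
    target = pick_core(initial, local_rank=0)
    os.sched_setaffinity(0, {target})      # bind within the permitted set
    print("bound to CPU", target)
```

Because the target is always drawn from the initial set, the bind request cannot name a CPU outside the cpuset.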

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-24 Thread Chris Samuel
Hi Ralph, - "Ralph Castain" wrote: > Ummm... I'll let you guys work this out on PLPA. However, just to > clarify, OMPI currently binds to cores, not logical cpus. It is the > PLPA that is "dumb" and provides the plumbing to do what OMPI tells > it. > > :-) Ahh, if that's the case then

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Chris Samuel
- "Ralph Castain" wrote: > Perhaps a telecon (myself, Jeff S, and you) would be best at this > stage. Sounds good, will take that part to private email. > I confess I'm now confused too - what you describe is precisely > what we already do. I added printf()'s to the PLPA init(), PLPA_NA

Re: [OMPI devel] Shared library versioning

2009-07-28 Thread Chris Samuel
- "Ralf Wildenhues" wrote: > You can probably solve most of these issues by just > versioning the directory names where you put the files; To be honest I'm not sure if this is something that OMPI should be looking to solve, we have lots of different versions installed on our clusters just u

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Terry Dontje" wrote: > I just wanted to give everyone a heads up if they do not get bugs > email. I just submitted a CMR to move over some new paffinity options > from the trunk to the v1.3 branch. Ralph's comments imply that for those sites that share nodes between jobs (such as oursel

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Ralph Castain" wrote: > Hi Chris Hiya Ralph, > There would be a "-do-not-bind" option that will prevent us from > binding processes to anything which should cover that situation. Gotcha. > My point was only that we would be changing the out-of-the-box > behavior to the opposite of tod

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-16 Thread Chris Samuel
- "Eugene Loh" wrote: > This is an important discussion. Indeed! My big fear is that people won't pick up the significance of the change and will complain about performance regressions in the middle of an OMPI stable release cycle. > Do note: > > 1) Bind-to-core is actually the default be

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread Chris Samuel
- "Jeff Squyres" wrote: > An important point to raise here: the 1.3 series is *not* the super > stable series. It is the *feature* series. Specifically: it is not > out of scope to introduce or change features within the 1.3 series. Ah, I think I've misunderstood the website then. :-(

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread Chris Samuel
- "Eugene Loh" wrote: Hi Eugene, [...] > It would be even better to have binding selections adapt to other > bindings on the system. Indeed! This touches on the earlier thread about making OMPI aware of its cpuset/cgroup allocation on the node (for those sites that are using it), it might

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Ralph Castain" wrote: > Hi Chris Hiya, > The devel trunk has all of this in it - you can get that tarball from > the OMPI web site (take the nightly snapshot). OK, grabbed that (1.4a1r21825). Configured with: ./configure --prefix=$FOO --with-openib --with-tm=/usr/local/torque/latest

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Eugene Loh" wrote: > Ah, you're missing the third secret safety switch that prevents > hapless mortals from using this stuff accidentally! :^) Sounds good to me. :-) > I think you need to add > > --mca opal_paffinity_alone 1 Yup, looks like that's it; it fails to launch with tha

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel
- "Chris Samuel" wrote: > This is most likely because it's getting an error from the > kernel when trying to bind to a socket it's not permitted > to access. This is what strace reports: 18561 sched_setaffinity(18561, 8, { f0 } 18561 <... sched_se
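For readers unfamiliar with the mask notation in the strace line above, here is a small sketch that decodes a hexadecimal affinity mask into CPU numbers. It assumes the mask is a single hex word as printed (`f0`); `mask_to_cpus` is a hypothetical helper, not part of strace or OMPI.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU ids whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    # Bit N set means CPU N is included in the requested affinity.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(mask_to_cpus("f0"))
```

For the mask in the trace, `f0`, this yields CPUs 4 to 7, i.e. the second group of four cores on an eight-core node.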

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-21 Thread Chris Samuel
- "Chris Samuel" wrote: > $ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca > odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4 To clarify - does that command line accurately reflect the proposed defaults for OMPI 1.3.4 ? cheers, Chris

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-21 Thread Chris Samuel
- "Eugene Loh" wrote: > Actually, the current proposed defaults for 1.3.4 are > not to change the defaults at all. Thanks, I hadn't picked up on the latest update to the trac ticket 3 days ago that says that the defaults will stay the same. Sounds good to me! All the best and have a good w

Re: [OMPI devel] RFC: convert send to ssend

2009-08-23 Thread Chris Samuel
- "Jeff Squyres" wrote: > Does anyone have any suggestions? Or are we stuck > with compile-time checking? I didn't see this until now, but I'd be happy with just a compile time option so we could produce an install just for debugging purposes and have our users explicitly select it with mo

Re: [OMPI devel] RFC: convert send to ssend

2009-08-24 Thread Chris Samuel
- "George Bosilca" wrote: > Do people know that there exist tools for checking MPI code > correctness? Many, many tools and most of them are freely > available. Yes, but have yet to be able to persuade any of our users to use them (and have no control over them). :-( -- Christopher Samu

Re: [OMPI devel] XML request

2009-08-27 Thread Chris Samuel
- "Ralph Castain" wrote: > Hi Greg > > I fixed these so they will get properly formatted. However, it is > symptomatic of a much broader problem - namely, that developers have > inserted print statements throughout the code for reporting errors. > There simply isn't any easy way for me to c

Re: [OMPI devel] suffix flag problems

2009-09-03 Thread Chris Samuel
- "David Robertson" wrote: > Hi all, Hiya, > We use both the PGI and Intel compilers over an > Infiniband cluster and I was trying to find a way > to have both orteruns in the path (in separate > directories) at the same time. Not a solution, but what we do here is to arrange our installs

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-07 Thread Chris Samuel
- "Ralph Castain" wrote: > Let me point out the obvious since this has plagued > us at LANL with regard to this concept. If a user > wants to do something different, all they have to > do is download and build their own copy of OMPI. One possibility may be to have OMPI honour a config file

Re: [OMPI devel] application hangs with multiple dup

2009-09-22 Thread Chris Samuel
Hi Edgar, - "Edgar Gabriel" wrote: > just wanted to give a heads-up that I *think* I know what the problem > is. I should have a fix (with a description) either later today or > tomorrow morning... I see that changeset 21970 is on trunk to fix this issue, is that backportable to the 1.3.x

Re: [OMPI devel] application hangs with multiple dup

2009-09-23 Thread Chris Samuel
Hi Edgar, - "Edgar Gabriel" wrote: > it will be available in 1.3.4... That's great, thanks so much! cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-

Re: [OMPI devel] application hangs with multiple dup

2009-09-23 Thread Chris Samuel
Hi Terry, - "Terry Dontje" wrote: > It actually is in the 1.3 branch now and has been > verified to solve the hanging issues of several members. Great, I'll get them to try a snapshot build! cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnershi

[OMPI devel] OMPI 1.3.4 ETA ? (TLAs FTW)

2009-09-27 Thread Chris Samuel
Hi folks, Just wondered if there was any idea of when OMPI 1.3.4 might be released? I know the correct answer is "when it's ready" (:-)) but was curious if there were any thoughts on a timeframe? The cpuset-aware CPU affinity code would be very useful to us to fix up some codes that sometimes g

Re: [OMPI devel] MPI_Graph_create

2009-10-15 Thread Chris Samuel
- "David Singleton" wrote: > Kiril Dichev has already pointed out a problem with MPI_Cart_create > http://www.open-mpi.org/community/lists/devel/2009/08/6627.php > MPI_Graph_create has the same problem. I checked all other > functions with logical in arguments and no others do anything > simila

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-21 Thread Chris Samuel
- "Jeff Squyres" wrote: > Give it a whirl: Nice - built without warnings with GCC 4.4.2. Some sample results below for configs not represented on the current website. Dual socket Shanghai: System(31GB) Node#0(15GB) + Socket#0 + L3(6144KB) L2(512KB) + L1(64KB) + Core#0 + P#0 L2

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-21 Thread Chris Samuel
- "Chris Samuel" wrote: > Some sample results below for configs not represented > on the current website. A final example of a more convoluted configuration, with a Torque job requesting 5 CPUs on a dual Shanghai node that has been given a non-contiguous configuration. [

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-21 Thread Chris Samuel
- "Jeff Squyres" wrote: > Sweet! :-) > And -- your reply tells me that, for the 2nd time in a single day, I > posted to the wrong list. :-) Ah well, if you'd posted to the right list I wouldn't have seen this. > I'll forward your replies to the hwloc-devel list. Not a problem - I'll g

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-22 Thread Chris Samuel
- "Tony Breeds" wrote: > Powerpc kernels that old do not have the topology information needed > (in /sys or /proc/cpuinfo). So for the short term that's the best we > can do. That's fine, I quite understand. I'm trying to get that cluster replaced anyway.. ;-) > FWIW I'm looking at how we

Re: [OMPI devel] 0.9.1rc2 is available

2009-10-22 Thread Chris Samuel
- "Ashley Pittman" wrote: > $ grep Cpus_allowed_list /proc/$$/status Useful, ta! > Does this imply the default is to report on processes > in the current cpuset rather than the entire system? > Does anyone else feel that violates the principle of > least surprise? Not really, I feel that
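The `grep Cpus_allowed_list /proc/$$/status` check quoted above can also be done programmatically. A minimal sketch, assuming the usual Linux list format of comma-separated ids and ranges (for example `0-3,8`); `parse_cpulist` and `allowed_cpus` are hypothetical helper names.

```python
def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_text):
    """Extract Cpus_allowed_list from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpulist(line.split(":", 1)[1])
    return None  # field absent (older kernel or non-Linux system)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            print(sorted(allowed_cpus(handle.read()) or []))
    except FileNotFoundError:
        print("no /proc/self/status on this system")
```

Inside a cpuset the result is the restricted set, which is the behaviour the question above is asking about.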

Re: [OMPI devel] RFC: Use automake "silent rules"

2009-10-26 Thread Chris Samuel
- "Jeff Squyres" wrote: > WHAT: Change OMPI's verbose build output to use > Automake's new "silent" rules output (see below) Not that I'm a developer, but as someone who regularly builds OMPI I'd be very happy with this change. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - System

Re: [OMPI devel] MPI_Graph_create

2009-12-01 Thread Chris Samuel
- "Jeff Squyres" wrote: > You are absolutely correct. I've filed CMRs > for v1.4 and v1.5. To clarify one point for people who weren't at the SC'09 Open-MPI BOF (hopefully I'll get this right!): 1.4 will be the bug-fix only continuation of the 1.3 feature series and will be binary compati

Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)

2010-01-24 Thread Chris Samuel
map(). Had great fun with that trying to track down why the mem property of Torque PBS jobs wasn't being enforced all the time. cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For mor

Re: [OMPI devel] RFC: Adding -DOPEN_MPI=1 to mpif77 and mpif90

2010-02-11 Thread Chris Samuel
r lists as something they should look at doing better: http://old.nabble.com/Re:-Can't-use-Fortran-90-95-compiler-for-F77-p26209677.html cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-11 Thread Chris Samuel
erested (though they will likely ask for a bisection too). cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-11 Thread Chris Samuel
9616384 93.8 1.1 16 81.1 91.5 # end result "Pingpong_Send_Recv" # duration = 0.02 sec cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

[OMPI devel] OMPI 1.5 twitter notification plugin probably broken by switch to OAUTH

2010-09-01 Thread Chris Samuel
Hi folks, Looking at the code for the Twitter notifier in OMPI 1.5 and seeing its use of HTTP basic authentication I would suggest that it may be non-functional due to Twitter's switch to purely OAUTH-based authentication for their API. I'm trying to test it out here but I'm at a bit of a loss to

[OMPI devel] Enable issue tracker for ompi-www repo?

2017-11-04 Thread Chris Samuel
Hi folks, I was looking to file an issue against the website for the FAQ about XRC support (given it was disabled in issue #4087) but it doesn't appear to be enabled. Is that just an oversight or is there a different way preferred? All the best, Chris -- Christopher Samuel Senior Sys

Re: [OMPI devel] Enable issue tracker for ompi-www repo?

2017-11-04 Thread Chris Samuel
On Sunday, 5 November 2017 12:07:52 AM AEDT r...@open-mpi.org wrote: > It was just an oversight - I have turned on the issue tracker, so feel free > to post, or a PR is also welcome Thanks Ralph! Unfortunately I don't feel I understand enough about the background of the issue to say much more abou

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Chris Samuel
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote: > Ah -- the point being that this is not an issue related to the libltdl work. Sorry - I saw the request to test the tarball and tried it out, missed the significance of the subject. :-/ -- Christopher Samuel Senior Systems Administrat

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote: > Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds): > > $ sysctl net.ipv4.tcp_keepalive_time > net.ipv4.tcp_keepalive_time = 1800 I suspect that's a local customisation, all Linux systems I've got access to (including RHEL 6.4/6.5/
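Since the system-wide `net.ipv4.tcp_keepalive_time` value can differ between sites, an application that cares can set the idle time on its own sockets instead. The sketch below shows that general technique with the standard socket options; it is an assumption for illustration and not necessarily what the commit under discussion does. `enable_keepalive` is a hypothetical helper, and `TCP_KEEPIDLE` is only applied where the platform exposes it.

```python
import socket


def enable_keepalive(sock, idle_seconds):
    """Turn on TCP keepalive and, where supported, set the idle time."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE overrides the sysctl default for this socket only;
    # it is not available on every platform, hence the guard.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s, 1800)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Setting the value per socket avoids depending on whatever a particular administrator has configured via sysctl.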

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote: > I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man" > didn't find any matches). This seems to document it for an unspecified version of Solaris: http://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdg/index.html

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Chris Samuel
Hi Gilles, On Mon, 13 Jul 2015 03:16:57 PM Gilles Gouaillardet wrote: > i made ConnectX XRC (aka XRC) and ConnectIb XRC (aka XRC domains) exclusive, > so yes, you got the desired behavior. Is there a tarball I could test on our x86 systems please? We are tied to the OFED in RHEL due to having

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Chris Samuel
On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote: > Hi Chris, Hi Gilles, > i pushed my tarball into a gist : Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa 2010, one circa 2013) with their included OFED I see: checking if ConnectX XRC support is enabled...