Hello
This seems to be the code for sharing the hwloc topology in shared
memory. This whole thing was designed for very large virtual spaces
(>=48bits on most CPUs) where it's easy to find a virtual memory area
that is unused in all participating processes. I am not sure it's worth
fixing on
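For reference, a minimal sketch of the adopt side of that API (hwloc/shmem.h). The file name, fixed mapping address, and length here are placeholders; they would normally be communicated by the process that called hwloc_shmem_topology_write():

/* Sketch only (not the OMPI/PMIx code): adopt a topology that another
 * process exported to a file with hwloc_shmem_topology_write(). */
#include <hwloc.h>
#include <hwloc/shmem.h>
#include <fcntl.h>
#include <unistd.h>

static int adopt_shared_topology(const char *shmem_file, void *mmap_addr,
                                 size_t len, hwloc_topology_t *topo)
{
    int fd = open(shmem_file, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Maps the file at the fixed address mmap_addr; fails if that
     * virtual range is not available in this process. */
    int err = hwloc_shmem_topology_adopt(topo, fd, 0, mmap_addr, len, 0);
    close(fd); /* the mapping remains valid after closing the descriptor */
    return err;
}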
Hello Ben
It will be back, at least for the majority of platforms (those without
heterogeneous memory).
See https://github.com/open-mpi/ompi/issues/8170 and
https://github.com/openpmix/prrte/pull/1141
Brice
On 11/11/2021 at 05:33, Ben Menadue via devel wrote:
Hi,
Quick question: what
begs the question: how does a library detect that the shmem
>>> region has already been mapped? If we attempt to map it and fail, does that
>>> mean it has already been mapped or that it doesn't exist?
>>>
>>> It isn't reasonable to expect that all t
Hello Ralph
One thing that isn't clear in this document: the hwloc shmem region may
only be mapped *once* per process (because the mmap address is always
the same). Hence, if a library calls adopt() in the process, others will
fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
top
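A rough sketch of the fallback this implies (assuming the shmem file descriptor, mapping address, and length are already known): call adopt() once, and fall back to a private topology when it fails:

#include <hwloc.h>
#include <hwloc/shmem.h>

/* Sketch only: if adopting the shared topology fails (for instance
 * because something in this process already mapped that address range),
 * discover a private topology instead. */
static hwloc_topology_t get_topology(int fd, void *addr, size_t len)
{
    hwloc_topology_t topo;
    if (hwloc_shmem_topology_adopt(&topo, fd, 0, addr, len, 0) == 0)
        return topo;                /* shared, read-only topology */
    hwloc_topology_init(&topo);     /* fallback: private discovery */
    hwloc_topology_load(topo);
    return topo;
}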
FYI, this was a git bug that will be fixed soon (the range of commits
being rebased was wrong).
https://lore.kernel.org/git/pull.789.git.1605314085.gitgitgad...@gmail.com/T/#t
https://lore.kernel.org/git/20d6104d-ca02-4ce4-a1c0-2f9386ded...@gmail.com/T/#t
Brice
On 07/02/2020 at 10:27, Brice
Hello
I have a git submodule issue that I don't understand.
PR#7367 was initially on top of PR #7366. When Jeff merged PR#7366, I
rebased my #7367 with git prrs and got this error:
$ git prrs origin master
From https://github.com/open-mpi/ompi
* branch master -> FETCH_HEAD
Thanks a lot for writing all this.
At the end
https://github.com/open-mpi/ompi/wiki/GitSubmodules#adding-a-new-submodule-pointing-to-a-specific-commit
should "bar" be "bar50x" in line "$ git add bar" ?
It seems to me that you are in opal/mca/foo and the new submodule is in
"bar50x" (according to
Hello Jeff
Looks like I am not allowed to modify the page but I'll be at the meeting ;)
Brice
On 26/02/2019 at 17:13, Jeff Squyres (jsquyres) via devel wrote:
> Gentle reminder to please sign up for the face-to-face meeting and add your
> items to the wiki:
>
> https://github.com/open-m
I just pushed my patches rebased on master + update to hwloc 2.0.1 to
bgoglin/ompi (master branch).
My testing of mapping/ranking/binding looks good here (on dual xeon with
CoD, 2 sockets x 2 NUMA x 6 cores).
It'd be nice if somebody else could test on another platform with
different options and/
Sorry guys, I think I've had all the patches ready since the F2F meeting, but
I couldn't test them enough because ranking was broken. I'll work on
that by next week.
Brice
On 22/05/2018 at 17:50, r...@open-mpi.org wrote:
> I’ve been running with hwloc 2.0.1 for quite some time now without problem,
Hello
Two hwloc issues are listed in this week's telcon:
"hwloc2 WIP, may need help with."
https://github.com/open-mpi/ompi/pull/4677
* Is this really a 3.0.1 thing? I thought hwloc2 was only for 3.1+
* As I replied in this PR, I have some patches but I need help with
testing them. Can you list some
On 20/12/2017 at 22:01, Howard Pritchard wrote:
>
> I can think of several ways to fix it. Easiest would be to modify the
>
> opal/mca/hwloc/hwloc2a/configure.m4
>
> to not set --enable-cuda if --with-cuda is evaluated to something
> other than yes.
>
>
> Optionally, I could fix the hwloc config
Looks like you're using a hwloc < 1.11. If you want to support this old
API while using the 1.11 names, you can add this to OMPI after #include <hwloc.h>:
#if HWLOC_API_VERSION < 0x00010b00
#define HWLOC_OBJ_NUMANODE HWLOC_OBJ_NODE
#define HWLOC_OBJ_PACKAGE HWLOC_OBJ_SOCKET
#endif
Brice
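As an illustration, code written against the 1.11+ names then compiles against both old and new hwloc; a minimal sketch, assuming a topology was already loaded:

#include <stdio.h>
#include <hwloc.h>

/* Sketch: with the compatibility defines above, the 1.11+ object names
 * work on older hwloc releases too. */
static void count_objects(hwloc_topology_t topology)
{
    int npackages = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PACKAGE);
    int nnumas    = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    printf("%d packages, %d NUMA nodes\n", npackages, nnumas);
}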
On 04/10/2017 19
Hello
This message is related to /var/cache/hwloc/knl_memoryside_cache. This
file exposes the KNL cluster and MCDRAM configuration, which is otherwise only accessible
from root-only files. hwloc-dump-hwdata runs at boot time to create that
file, and non-root hwloc users can read it later. Failing to read that
file b
You can email scan-ad...@coverity.com to report bugs and/or ask what's
going on.
Brice
On 16/06/2017 07:12, Gilles Gouaillardet wrote:
> Ralph,
>
>
> my 0.02 US$
>
>
> I noted the error message mentions 'holding lock
> "pmix_mutex_t.m_lock_pthread"', but it does not explicitly mention
>
> '
Hello
Did anybody start porting OMPI to the new hwloc 2.0 API (currently in
hwloc git master)?
Gilles, I seem to remember you were interested a while ago?
I will have to do it in the near future. If anybody already started that
work, please let me know.
Brice
On 05/01/2017 07:07, Gilles Gouaillardet wrote:
> Brice,
>
> things would be much easier if there were an HWLOC_OBJ_NODE object in
> the topology.
>
> could you please consider backporting the relevant changes from master
> into the v1.11 branch ?
>
> Cheers,
>
> Gilles
Hello
Unfortunately, I
s/June 2016/June 2006/ :)
Anyway, it ended on July 31st based on https://www.suse.com/lifecycle/
Brice
On 29/08/2016 16:03, Gilles Gouaillardet wrote:
> According to wikipedia, SLES 10 was released on June 2016, and is
> supported for 10 years.
> (SLES 12 is supported for 13 years, and I ho
Yes, kill all netloc lists.
Brice
On 18 July 2016 17:43:49 UTC+02:00, Josh Hursey wrote:
>Now that netloc has rolled into hwloc, I think it is safe to kill the
>netloc lists.
>
>mtt-devel-core and mtt-annouce should be kept. They probably need to be
>cleaned. But the hope is that we relea
Thanks, applied to hwloc. And PR for OMPI master at
https://github.com/open-mpi/ompi/pull/1657
Brice
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (also BG/Q).
> It is just the 13.1.0 on Power7 that crashes building hw
Thanks
I think I would be fine with that fix. Unfortunately I won't have good
internet access until Sunday night. I won't be able to test anything
properly before then :/
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (al
https://github.com/open-mpi/ompi/pull/1621 (against master, needs to go
to 2.0 later)
On 03/05/2016 08:22, Brice Goglin wrote:
> Yes we should backport this to OMPI master and v2.x.
> I am usually not the one doing the PR, I'd need to learn the exact
> procedure first :)
>
>
> 1.11.2, and into v2.x in particular?
> Or perhaps that is Jeff's job?
>
> -Paul
>
> On Mon, May 2, 2016 at 11:04 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Should be fixed by
>
> https://github.com/open-mpi/hwloc/commit/9549f
Should be fixed by
https://github.com/open-mpi/hwloc/commit/9549fd59af04dca2e2340e17f0e685f8c552d818
Thanks for the report
Brice
On 02/05/2016 21:53, Paul Hargrove wrote:
> I have a linux/ppc64 host running Fedora 20.
> I have configured the 2.0.0rc2 tarball with
>
> --prefix=[] --ena
It comes from the hwloc API. It doesn't use integers because some users
want to provide their own distance matrix that was generated by
benchmarks. Also we normalize the matrix to have latency 1 on the
diagonal (for local memory access latency) and that causes non-diagonal
items not to be integers
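A tiny worked example (made-up latencies) of why the normalized values end up as non-integer floats:

#include <stdio.h>

int main(void)
{
    /* Made-up latencies from a benchmark; normalizing so that local
     * (diagonal) access is 1.0 turns remote entries into ratios. */
    float raw[2][2] = { { 80.0f, 130.0f },
                        { 130.0f, 80.0f } };
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++)
            printf("%.3f ", raw[i][j] / raw[i][i]); /* 1.000 and 1.625 */
        printf("\n");
    }
    return 0;
}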
Hello
hwloc doesn't have any CUDA-specific configure variables. We just use
standard variables like LIBS and CPPFLAGS. I guess OMPI could propagate
--with-cuda directories to hwloc by setting LIBS and CPPFLAGS before
running hwloc m4 functions, but I don't think OMPI actually cares about
hwloc repo
Sorry, I didn't see this report before the pull request.
I applied Gilles' "simple but arguable" fix to master and stable
branches up to v1.9. It could be too imperfect if somebody ever changes
the permissions of /devices/pci*, but I guess that's not going to happen
in practice. Finding the right de
ther nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Did it work on the same machine before? Or did OMPI enable hwloc's
> PCI discovery recently?
>
> Does lstop
; lstopo complains with the same assert. Interestingly enough, the same
> binary succeed on the other nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Did it work on the sa
Did it work on the same machine before? Or did OMPI enable hwloc's PCI
discovery recently?
Does lstopo complain the same?
Brice
On 10/09/2015 21:10, George Bosilca wrote:
> With the current trunk version I keep getting an assert deep down in
> orted.
>
> orted:
> ../../../../../../../ompi/op
On 04/09/2015 00:36, Gilles Gouaillardet wrote:
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hwloc, right ?
> ( the --with-hwloc option is not removed because we want to keep the
> option of using an external hwloc library )
>
> if I understa
On 01/09/2015 15:59, marcin.krotkiewski wrote:
> Dear Rolf and Brice,
>
> Thank you very much for your help. I have now moved the 'dubious' IB
> card from Slot 1 to Slot 5. It is now reported by hwloc as bound to a
> separate NUMA node. In this case OpenMPI works as could be expected:
>
> - NUM
as a
> floating point number ?
>
> i remember i had to fix a bug in ompi a while ago
> /* e.g. replace if (d1 == d2) with if (fabs(d1-d2) < epsilon) */
>
> Cheers,
>
> Gilles
>
> On 9/1/2015 5:28 AM, Brice Goglin wrote:
>> The locality is mlx4_0 as reported by l
The locality of mlx4_0 as reported by lstopo is "near the entire
machine" (while mlx4_1 is reported near NUMA node #3). I would vote for
buggy PCI-NUMA affinity being reported by the BIOS. But I am not very
familiar with 4x E5-4600 machines so please make sure this PCI slot is
really attached to a
On 25/08/2015 05:59, Christopher Samuel wrote:
>
> INRIA does have Open-MX (Myrinet Express over Generic Ethernet
> Hardware), last release December 2014. No idea if it's still developed
> or used..
>
> http://open-mx.gforge.inria.fr/
>
> Brice?
>
> Open-MPI is listed as working with it there.
It was renamed from cpuid.h to cpuid-x86.h at some point. Can't check from here
but the actual code should be the same in all these branches.
Brice
On 31 July 2015 22:19:47 UTC+02:00, Ralph Castain wrote:
>Yo Paul
>
>1.8.8 and 1.10 do not have hwloc-1.11 in them - they remain on
>hwloc-1
On 24/03/2015 20:47, Jeff Squyres (jsquyres) wrote:
> I talked to Peter off-list.
>
> We got a successful build going for him.
>
> Seems like we've identified a few issues here, though:
>
> 1. ./configure with gcc 4.7.2 on Debian (I didn't catch the precise version
> of Debian) results in a Lhw
On 17/12/2014 21:43, Paul Hargrove wrote:
>
> Dbx gives me
>
> t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
> Current function is opal_hwloc172_hwloc_get_obj_by_depth
>74 return topology->levels[depth][idx];
> (dbx) where
> current thread: t@1
On 12/12/2014 07:36, Gilles Gouaillardet wrote:
> Brice,
>
> ompi master is based on hwloc 1.9.1, isn't it ?
Yes sorry, I am often confused by all these OMPI vs hwloc branch numbers.
>
> if some backport is required for hwloc 1.7.2 (used by ompi v1.8), then
> could you please update the hwloc
ld this fix be backported to both master and v1.8 ?
>
> Cheers,
>
> Gilles
>
> On 2014/12/12 7:46, Brice Goglin wrote:
>> This problem was fixed in hwloc upstream recently.
>>
>> https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e
>
ldn’t leaf thru your output well enough to see all the lstopo
>>>>>>> versions, but you might check to ensure they are the same.
>>>>>>>
>>>>>>> Looking at the code base, you may also hit a problem here. OMPI 1.6
>>>>>>
a big change. OMPI 1.8 series is based on
>>>>>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>>>>>>
>>>>>> Frankly, I’d just download and install an OMPI tarball myself and avoid
>>>>>> these heada
>> about 9/10 cases but works without warnings in 1/10 cases. I attached the
>>> output (with xml) for both the working and `broken` case. Note that the xml
>>> is of course printed (differently) multiple times for each task/core. As
>>> always, any help would be
Hello
The GitHub issue you're referring to was closed 18 months ago. The
warning (it's not an error) is only supposed to appear if you're
importing into a recent hwloc an XML that was exported from an old hwloc. I
don't see how that could happen when using Open MPI since the hwloc
versions on both sides
Gilles,
The strange configure check comes from this commit
https://github.com/open-mpi/hwloc/commit/6a9299ce9d1cb1c13b3b346fe6fdfed2df75c672
Are you sure your patch won't break something else?
I'll ask Pavan what he thinks about your patch.
I agree that it's crazy we don't find strncasecmp on som
Not sure about the details either, but ppc64le support was only included
in libtool recently (will be in the next release). I guess ppcle support
is only becoming a reality now, and it wasn't widely usable in the past.
Brice
On 28/04/2014 17:17, George Bosilca wrote:
> I’m not sure how to in
Hello Ralph,
I took care of the defects under opal/mca/hwloc/hwloc172. Nothing
important there (a memory leak in some deprecated code that is likely
unused today). But I also updated hwloc's v1.7 branch with all recent
fixes from more recent branches. You may want to update OMPI's copy.
At least y
Hello,
We're setting up a new cluster here. Open MPI 1.7.4 was hanging at
startup without any error message. The issue appears to be
udcm_component_query() hanging in finalize() on the sched_yield() loop
when the memlock limit isn't set to unlimited as usual.
Unfortunately the hangs occur before we p
OFED is already in Debian as far as I know. At least Roland Dreier takes
care of uploading some IB-related packages. And I didn't have any
problem getting Mellanox IB to work on Debian in the last years, but I
haven't played with Mellanox custom APIs.
Brice
On 07/11/2013 20:27, Mike Dubman wrote:
http://anonscm.debian.org/viewvc/pkg-openmpi/openmpi/
svn://svn.debian.org/svn/pkg-openmpi/openmpi/
FWIW, hwloc debian packaging is maintained by one of the upstream devs,
but he didn't have to pollute the upstream hwloc repo with debian stuff.
There's a different repo with only the debian subdire
On 09/07/2013 00:32, Jeff Squyres (jsquyres) wrote:
> INRIA
>
> bgoglin: Brice Goglin
> arougier: Antoine Rougier
> sthibaul: Samuel Thibault
> mercier: Guillaume Mercier **NO COMMITS IN LAST YEAR**
> nfurmento:Nathalie Furmento **NO COMMITS IN LAST
> Y
On 03/05/2013 02:47, Ralph Castain wrote:
> Brice: do the Phis appear in the hwloc topology object?
Yes, on Linux, you will see something like this in lstopo v1.7:
  HostBridge L#0
    PCIBridge
      PCI 8086:225c
        CoProc L#2 "mic0"
And these contain some attributes saying how many c
Do you already use hwloc's PCI objects in OMPI v1.7?
Brice
On 06/02/2013 15:39, Jeff Squyres (jsquyres) wrote:
> BEFORE YOU PANIC: this only affects Open MPI v1.7 (which is not yet released)
> and the OMPI SVN trunk (which is also, obviously, not released). ***OMPI
> v1.6.x is unaffecte
On 25/10/2012 23:56, Barrett, Brian W wrote:
> Hi all -
>
> The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my
> mpool change in r27485). I'm not really interested in fixing it; the
> problem does not occur with the MX MTL. Does anyone else have interest in
> fixing it?
On 05/10/2012 10:35, Bert Wesarg wrote:
> On 10/04/2012 03:41 PM, Jeff Squyres wrote:
>> It would probably be better to ask one of the other git-interested people.
>>
>> Bert? Brice? Nathan?
>>
>> Can you check that the git mirror appears to be functioning properly?
> Just tried it and bootstr
I don't see any git-svn line in the commit messages; those are very
helpful when one wants to look up a specific commit using its SVN revision.
Brice
On 04/10/2012 15:41, Jeff Squyres wrote:
> It would probably be better to ask one of the other git-interested people.
>
> Bert? Brice? Nathan?
>
will bind to all the HT's in a core and/or socket.
>
> Are you using Linux cgroups/cpusets to restrict available cores?
> Because Brice is saying that E5-2650 is supposed to have more cores.
>
>
> > On Wed, May 30, 2012 at 4:36 PM, Brice Goglin
> mailto:
Your /proc/cpuinfo output (filtered below) looks like only two sockets
(physical ids 0 and 1), with one core each (cpu cores=1, core id=0),
with hyperthreading (siblings=2). So lstopo looks good.
E5-2650 is supposed to have 8 cores. I assume you use Linux
cgroups/cpusets to restrict the available c
On 02/05/2012 15:00, Jeff Squyres wrote:
> Here's what I've put for the 1.6 NEWS bullets -- do they look ok?
>
> - Fix some process affinity issues. When binding a process, Open MPI
> will now bind to all available hyperthreads in a core (or socket,
> depending on the binding options specif
On 22/02/2012 20:24, Eugene Loh wrote:
> On 2/22/2012 11:08 AM, Ralph Castain wrote:
>> On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote:
>>> On 22/02/2012 17:48, Ralph Castain wrote:
>>>> On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote
>>>>
rm, hwloc finds no socket level
>>> *) therefore hwloc returns num_sockets==0 to OMPI
>>> *) OMPI divides by 0 and barfs on basically everything
>> Okay. So, Brice's other e-mail indicates that the first two are "not really
>> uncommon":
>>
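For illustration, the kind of guard being discussed might look like this; a sketch only (not OMPI's actual fix), using the hwloc 1.x object names:

#include <hwloc.h>

/* Sketch: never divide by the socket count without checking it, since
 * hwloc may legitimately report zero sockets on some platforms. */
static int procs_per_socket(hwloc_topology_t topology, int nprocs)
{
    int nsockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
    if (nsockets <= 0)
        nsockets = 1;   /* fall back: treat the machine as one socket */
    return nprocs / nsockets;
}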
On 22/02/2012 07:36, Eugene Loh wrote:
> On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
>> Here are the first of the results of the testing I promised.
>> I am not 100% sure how to reach the code that Eugene reported as
>> problematic,
> I don't think you're going to see it. Somehow, hwloc on th
On 17/02/2012 14:59, Jeff Squyres wrote:
> On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote:
>
>>> I didn't follow this entire thread in details, but I am feeling that
>>> something is wrong here. The flag fixes your problem indeed, but I think it
>>> may break binding too. It's basically maki
On 16/02/2012 14:16, nadia.der...@bull.net wrote:
> Hi Jeff,
>
> Sorry for the delay, but my victim with 2 ib devices had been stolen ;-)
>
> So, I ported the patch on the v1.5 branch and finally could test it.
>
> Actually, there is no opal_hwloc_base_get_topology() in v1.5 so I had
> to set
>
c-bind pu:0 ./all2all : -np 1 hwloc-bind pu:1
./all2all
Then, we'll see if you can get the same result with one of the OMPI binding
options.
Brice
> Matthias
>
> On Thursday 16 February 2012 15:46:46 Brice Goglin wrote:
>> On 16/02/2012 15:39, Matthias Jurenz wrote:
>>
On 16/02/2012 15:39, Matthias Jurenz wrote:
> Here is the output of lstopo from a single compute node. I'm surprised that
> the L1/L2 sharing isn't visible - not in the graphical output either...
That's a kernel bug. We're waiting for AMD to tell the kernel that L1i
and L2 are shared across
IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
>
>
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>
>> That d
On 09/02/2012 14:00, Ralph Castain wrote:
> There is another aspect, though - I had missed it in the thread, but the
> question Nadia was addressing is: how to tell I am bound? The way we
> currently do it is to compare our cpuset against the local cpuset - if we are
> on a subset, then we kn
et isn't
NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj() to do that.
Brice
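A minimal sketch of that helper in use, assuming a topology with I/O objects enabled:

#include <hwloc.h>

/* Sketch: map an I/O object (e.g. a PCI device) to its closest normal
 * ancestor, whose cpuset then describes the device's locality. */
static hwloc_const_cpuset_t device_locality(hwloc_topology_t topo,
                                            hwloc_obj_t pcidev)
{
    hwloc_obj_t ancestor = hwloc_get_non_io_ancestor_obj(topo, pcidev);
    return ancestor->cpuset; /* non-NULL since ancestor is a non-I/O object */
}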
On 09/02/2012 14:34, Ralph Castain wrote:
> Ah, okay - in that case, having the I/O device attached to the
> "closest" object at each depth would be ideal from an OMPI perspective.
>
> On Feb 9, 20
The BIOS usually tells you which NUMA location is close to each host-to-PCI
bridge. So the answer is yes.
Brice
Ralph Castain wrote:
I'm not sure I understand this comment. A PCI device is attached to the node,
not to any specific location within the node, isn't it? Can you really say that
Jeff Squyres wrote:
>On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:
>
>>> Just so that I understand this better -- if a process is bound in a
>>> cpuset, will tools like hwloc's lstopo only show the Linux
>>> processors *in that cpuset*? I.e., does it not have any
>>> visibility of the pr
By default, hwloc only shows what's inside the current cpuset. There's
an option to show everything instead (topology flag).
Brice
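A minimal sketch of that flag in use (hwloc 1.x name, error handling omitted):

#include <hwloc.h>

/* Sketch: request that the topology also contain objects outside the
 * calling process' cpuset. */
static hwloc_topology_t load_whole_system_topology(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);
    return topo;
}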
On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset,
> will tools like hwloc's lstopo only sho
And there's a hwloc problem with the very old sched_setaffinity on Red Hat 8;
we're looking at it.
Brice
On 14/12/2011 11:14, Paul H. Hargrove wrote:
> Summary of my 1.5.5rc1 testing findings:
>
> + generated config.h in tarball breaks hwloc on non-linux platforms:
> http://www.open-mpi.org/community/list
On 14/12/2011 08:29, Paul H. Hargrove wrote:
> I've attempted the build on MacOS 10.4 (Tiger) on x86-64, I hit the
> same hwloc issue I've encountered on {Free,Open,Net}BSD.
> The build fails with
>> CCLD opal_wrapper
>> /usr/bin/ld: Undefined symbols:
>> _opal_hwloc122_hwloc_backend_sysfs_e
On 14/12/2011 08:01, Paul H. Hargrove wrote:
> I cannot even *build* OpenMPI on {Free,Open,Net}BSD systems unless I
> configure with --without-hwloc.
> Thus I cannot agree w/ Brice's suggestion that I ignore this warning.
Please try building hwloc (1.2.2 if you want the same one as OMPI
current
On 14/12/2011 07:12, Paul H. Hargrove wrote:
> I cannot build hwloc in 1.5.5rc1 on the following system:
>
> System 2: Linux/x86
>> $ cat /etc/redhat-release
>> Red Hat Linux release 8.0 (Psyche)
>> $ uname -a
>> Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10 EDT 2009
>> i686 i6
On 14/12/2011 07:17, Paul H. Hargrove wrote:
> My OpenBSD and NetBSD testers have the same behavior, but now I see
> that I was warned...
>
> On all the affected systems I found the following (modulo the system
> tuple) in the configure output:
>> checking which OS support to include... Unsup
I am playing with those aspects right now (it's planned for hwloc v1.4).
hwloc (even the 1.2 currently in OMPI) can already support topologies
containing different machines, but there's no easy/automatic way to
aggregate multiple machine topologies into a single global one. The
important thing to unde
On 04/08/2011 02:24, Jeff Squyres wrote:
> Libtool's -all-static flag probably resolves to some gcc flag(s), right? Can
> you just pass those in via CFLAGS / LDFLAGS to configure and then not pass
> anything in via make?
I only see an additional -static flag on the final program-link gcc
com
On 03/08/2011 20:37, Jeff Squyres wrote:
>
> Shouldn't you pass the same LDFLAGS to configure as to make?
I'd be happy if it worked :)
Actually, I'd be even more happy if Pavel didn't have to do this to
build a fully-static orted.
> I.e., if you tell configure "configure it way" but then yo
On 03/08/2011 18:24, Shamis, Pavel wrote:
> Hw-loc vanilla works, because static mode does not build the binaries
> in static mode. If you try to build the hwloc utilities in
> static mode it fails, just like ompi.
I get static binaries on SLES11 with
./configure --enable-static --
On 03/08/2011 16:47, Jeff Squyres wrote:
> Err.. I don't quite understand. How exactly are you configuring? If I
> do this:
He's using contrib/platform/ornl/ornl_configure_self_contained
I reproduced here on SLES10 with
./configure --enable-static --disable-shared
--with-wrapper-ldflags=-s
I finally reproduced here. Based on the ornl platform script, you're
configuring with LDFLAGS=-static and then building with make
LDFLAGS=-all-static. Surprisingly, this works fine when building vanilla
hwloc, but it breaks inside OMPI. The reason is that OMPI doesn't pass
LDFLAGS=-static to hwloc'
Hello Pavel,
Do you have the libnuma headers and dynamic lib installed but not the static
lib? Which distro is this?
Brice
On 25/07/2011 23:56, Shamis, Pavel wrote:
> Hello,
>
> I have been trying to compile the Open MPI (trunk) static version with hwloc; the
> latter is enabled by default in t
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to
a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h
and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
Do you need anything else?
Brice
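A minimal sketch using the cudart helper, assuming hwloc was built with the CUDA headers available (error handling omitted):

#include <hwloc.h>
#include <hwloc/cudart.h>

/* Sketch: get the CPUs close to CUDA (runtime API) device `idx` and
 * bind the current process near that device. */
static void bind_near_cuda_device(hwloc_topology_t topology, int idx)
{
    hwloc_cpuset_t set = hwloc_bitmap_alloc();
    hwloc_cudart_get_device_cpuset(topology, idx, set);
    hwloc_set_cpubind(topology, set, 0);
    hwloc_bitmap_free(set);
}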
On 14/04/2011 17:44,
On 14/04/2011 17:58, George Bosilca wrote:
> On Apr 13, 2011, at 20:07 , Ken Lloyd wrote:
>
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
>> between the HCA and the GPU that took CPU cycles.
>>
> If this is the case then why do we need to use special
Hello Rolf,
This "CUDA device memory" isn't memory mapped in the host, right? Then
what does its address look like? When you say "when it is detected that
a buffer is CUDA device memory", if the actual device and host address
spaces are different, how do you know that device addresses and usual
h
On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>
>> However, going over the existing BTLs I can see that some BTLs do not
>> correctly set this value:
>>
>> BTL     Bandwidth   Auto-detect   Status
>> Elan    2000        NO
On 03/09/2010 17:33, George Bosilca wrote:
>>> GM 250 NO Doubtful
>>>
This one should be 2000 (assuming nobody runs Myrinet 1280 from the 90s
anymore :))
>>> MX 2000/1 YES (Mbs)   Correct (before the patch)
>>> OFUD 800
On 03/09/2010 15:38, George Bosilca wrote:
> Jeff,
>
> I think you will have to revert this patch as the btl_bandwidth __IS__
> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs
> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the
> original design of
detection anyway?).
Signed-off-by: Brice Goglin
Index: ompi/mca/btl/mx/btl_mx_component.c
===
--- ompi/mca/btl/mx/btl_mx_component.c (revision 23711)
+++ ompi/mca/btl/mx/btl_mx_component.c (working copy)
@@ -15
On 18/08/2010 19:21, Eugene Loh wrote:
> Eugene Loh wrote:
>
>> In mca_btl_sm_get_sync(), I see this:
>> /* Use the DMA flag if knem supports it *and* the segment length
>>is greater than the cutoff. Note that if the knem_dma_min
>>value is 0 (i.e., the MCA param was set
Ashley Pittman wrote:
>> [csamuel@tango069 ~]$ ~/local/hwloc/0.9.1rc2/bin/lstopo
>> System(31GB)
>> Node#0(15GB) + Socket#0 + L3(6144KB) + L2(512KB) + L1(64KB) + Core#0 + P#0
>> Node#1(16GB) + Socket#1 + L3(6144KB)
>> L2(512KB) + L1(64KB) + Core#0 + P#4
>> L2(512KB) + L1(64KB) + Core#1
George Bosilca wrote:
> On Oct 21, 2009, at 13:42 , Scott Atchley wrote:
>> On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
>>> Because MX doesn't provide a real RMA protocol, we created a fake
>>> one on top of point-to-point. The two peers have to agree on a
>>> unique tag, then the receiver p
Hello,
I am debugging a crash with OMPI 1.3.3 BTL over Open-MX. It's crashing
while trying to store incoming data in the OMPI receive buffer, but OMPI
seems to have already freed the buffer even if the MX request is not
complete yet. It looks like this is caused by mca_btl_mx_prepare_dst()
posting
Jeff Squyres wrote:
> Do you just want to wait for the ummunotify stuff in OMPI? I'm half
> done making a merged "linux" memory component (i.e., it merges the
> ptmalloc2 component with the new ummunotify stuff).
>
> It won't help for kernels <2.6.32, of course. :-)
Yeah that's another solution
Jeff Squyres wrote:
> On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:
>
>> I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
>> cleanup the Open-MX regcache when needed. It causes some deadlocks since
>> OpenMPI intercepts Open-MX' own free() ca
Hello,
I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
clean up the Open-MX regcache when needed. It causes some deadlocks since
OpenMPI intercepts Open-MX' own free() calls. Is there a "safe" way to
have Open-MX free/munmap calls not invoke OpenMPI interception hooks? Or
is
George Bosilca wrote:
> Yes, in Open MPI the connections are usually created on demand. As far
> as I know there are few devices that do not abide to this "law", but
> MX is not one of them.
>
> To be more precise on how the connections are established, if we say
> that each node has two rails and
try to connect
> the second device (rail in this context). In MX this works because we
> use the blocking function (mx_connect).
>
> george.
>
> On Jun 17, 2009, at 08:23 , Brice Goglin wrote:
>
>> Hello,
>>
>> I am debugging some sort of deadlock when doin