Hello
This seems to be the code for sharing the hwloc topology in shared
memory. This whole thing was designed for very large virtual spaces
(>=48bits on most CPUs) where it's easy to find a virtual memory area
that is unused in all participating processes. I am not sure it's worth
fixing on
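For reference, a minimal sketch of the adopt side of that API (hwloc/shmem.h). The file name, fixed mapping address, and length here are placeholders; they would normally be communicated by the process that called hwloc_shmem_topology_write():

/* Sketch only (not the OMPI/PMIx code): adopt a topology that another
 * process exported to a file with hwloc_shmem_topology_write(). */
#include <hwloc.h>
#include <hwloc/shmem.h>
#include <fcntl.h>
#include <unistd.h>

static int adopt_shared_topology(const char *shmem_file, void *mmap_addr,
                                 size_t len, hwloc_topology_t *topo)
{
    int fd = open(shmem_file, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Maps the file at the fixed address mmap_addr; fails if that
     * virtual range is not available in this process. */
    int err = hwloc_shmem_topology_adopt(topo, fd, 0, mmap_addr, len, 0);
    close(fd); /* the mapping remains valid after closing the descriptor */
    return err;
}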
Hello Ben
It will be back, at least for the majority of platforms (those without
heterogeneous memory).
See https://github.com/open-mpi/ompi/issues/8170 and
https://github.com/openpmix/prrte/pull/1141
Brice
On 11/11/2021 at 05:33, Ben Menadue via devel wrote:
Hi,
Quick question: what
begs the question: how does a library detect that the shmem
>>> region has already been mapped? If we attempt to map it and fail, does that
>>> mean it has already been mapped or that it doesn't exist?
>>>
>>> It isn't reasonable to expect that all t
Hello Ralph
One thing that isn't clear in this document: the hwloc shmem region may
only be mapped *once* per process (because the mmap address is always
the same). Hence, if a library calls adopt() in the process, others will
fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
top
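A rough sketch of the fallback this implies (assuming the shmem file descriptor, mapping address, and length are already known): call adopt() once, and fall back to a private topology when it fails:

#include <hwloc.h>
#include <hwloc/shmem.h>

/* Sketch only: if adopting the shared topology fails (for instance
 * because something in this process already mapped that address range),
 * discover a private topology instead. */
static hwloc_topology_t get_topology(int fd, void *addr, size_t len)
{
    hwloc_topology_t topo;
    if (hwloc_shmem_topology_adopt(&topo, fd, 0, addr, len, 0) == 0)
        return topo;                /* shared, read-only topology */
    hwloc_topology_init(&topo);     /* fallback: private discovery */
    hwloc_topology_load(topo);
    return topo;
}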
FYI, this was a git bug that will be fixed soon (the range of commits
being rebased was wrong).
https://lore.kernel.org/git/pull.789.git.1605314085.gitgitgad...@gmail.com/T/#t
https://lore.kernel.org/git/20d6104d-ca02-4ce4-a1c0-2f9386ded...@gmail.com/T/#t
Brice
On 07/02/2020 at 10:27, Brice
Hello
I have a git submodule issue that I don't understand.
PR#7367 was initially on top of PR #7366. When Jeff merged PR#7366, I
rebased my #7367 with git prrs and got this error:
$ git prrs origin master
From https://github.com/open-mpi/ompi
* branch master -> FETCH_HEAD
Thanks a lot for writing all this.
At the end
https://github.com/open-mpi/ompi/wiki/GitSubmodules#adding-a-new-submodule-pointing-to-a-specific-commit
should "bar" be "bar50x" in line "$ git add bar" ?
It seems to me that you are in opal/mca/foo and the new submodule is in
"bar50x" (according to
Hello Jeff
Looks like I am not allowed to modify the page but I'll be at the meeting ;)
Brice
On 26/02/2019 at 17:13, Jeff Squyres (jsquyres) via devel wrote:
> Gentle reminder to please sign up for the face-to-face meeting and add your
> items to the wiki:
>
> https://github.com/open-m
I just pushed my patches rebased on master + update to hwloc 2.0.1 to
bgoglin/ompi (master branch).
My testing of mapping/ranking/binding looks good here (on dual xeon with
CoD, 2 sockets x 2 NUMA x 6 cores).
It'd be nice if somebody else could test on another platform with
different options and/
Sorry guys, I think I've had all the patches ready since the F2F meeting, but
I couldn't test them enough because ranking was broken. I'll work on
that by next week.
Brice
On 22/05/2018 at 17:50, r...@open-mpi.org wrote:
> I’ve been running with hwloc 2.0.1 for quite some time now without problem,
Hello
Two hwloc issues are listed in this week's telcon:
"hwloc2 WIP, may need help with."
https://github.com/open-mpi/ompi/pull/4677
* Is this really a 3.0.1 thing? I thought hwloc2 was only for 3.1+
* As I replied in this PR, I have some patches but I need help with
testing them. Can you list some
On 20/12/2017 at 22:01, Howard Pritchard wrote:
>
> I can think of several ways to fix it. Easiest would be to modify the
>
> opal/mca/hwloc/hwloc2a/configure.m4
>
> to not set --enable-cuda if --with-cuda is evaluated to something
> other than yes.
>
>
> Optionally, I could fix the hwloc config
Looks like you're using a hwloc < 1.11. If you want to support this old
API while using the 1.11 names, you can add this to OMPI after #include <hwloc.h>:
#if HWLOC_API_VERSION < 0x00010b00
#define HWLOC_OBJ_NUMANODE HWLOC_OBJ_NODE
#define HWLOC_OBJ_PACKAGE HWLOC_OBJ_SOCKET
#endif
Brice
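As an illustration, code written against the 1.11+ names then compiles against both old and new hwloc; a minimal sketch, assuming a topology was already loaded:

#include <stdio.h>
#include <hwloc.h>

/* Sketch: with the compatibility defines above, the 1.11+ object names
 * work on older hwloc releases too. */
static void count_objects(hwloc_topology_t topology)
{
    int npackages = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PACKAGE);
    int nnumas    = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    printf("%d packages, %d NUMA nodes\n", npackages, nnumas);
}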
On 04/10/2017 19
Hello
This message is related to /var/cache/hwloc/knl_memoryside_cache. This
file exposes the KNL cluster and MCDRAM configuration, which is otherwise only accessible
from root-only files. hwloc-dump-hwdata runs at boot time to create that
file, and non-root hwloc users can read it later. Failing to read that
file b
You can email scan-ad...@coverity.com to report bugs and/or ask what's
going on.
Brice
On 16/06/2017 07:12, Gilles Gouaillardet wrote:
> Ralph,
>
>
> my 0.02 US$
>
>
> I noted the error message mentions 'holding lock
> "pmix_mutex_t.m_lock_pthread"', but it does not explicitly mention
>
> '
Hello
Did anybody start porting OMPI to the new hwloc 2.0 API (currently in
hwloc git master)?
Gilles, I seem to remember you were interested a while ago?
I will have to do it in the near future. If anybody already started that
work, please let me know.
Brice
On 05/01/2017 07:07, Gilles Gouaillardet wrote:
> Brice,
>
> things would be much easier if there were an HWLOC_OBJ_NODE object in
> the topology.
>
> could you please consider backporting the relevant changes from master
> into the v1.11 branch ?
>
> Cheers,
>
> Gilles
Hello
Unfortunately, I
s/June 2016/June 2006/ :)
Anyway, it ended on July 31st based on https://www.suse.com/lifecycle/
Brice
On 29/08/2016 16:03, Gilles Gouaillardet wrote:
> According to wikipedia, SLES 10 was released on June 2016, and is
> supported for 10 years.
> (SLES 12 is supported for 13 years, and I ho
Yes, kill all netloc lists.
Brice
On 18 July 2016 17:43:49 UTC+02:00, Josh Hursey wrote:
>Now that netloc has rolled into hwloc, I think it is safe to kill the
>netloc lists.
>
>mtt-devel-core and mtt-annouce should be kept. They probably need to be
>cleaned. But the hope is that we relea
Thanks, applied to hwloc. And PR for OMPI master at
https://github.com/open-mpi/ompi/pull/1657
Brice
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (also BG/Q).
> It is just the 13.1.0 on Power7 that crashes building hw
Thanks
I think I would be fine with that fix. Unfortunately I won't have good
internet access until Sunday night. I won't be able to test anything
properly before then :/
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (al
https://github.com/open-mpi/ompi/pull/1621 (against master, needs to go
to 2.0 later)
On 03/05/2016 08:22, Brice Goglin wrote:
> Yes we should backport this to OMPI master and v2.x.
> I am usually not the one doing the PR, I'd need to learn the exact
> procedure first :)
>
>
> 1.11.2, and into v2.x in particular?
> Or perhaps that is Jeff's job?
>
> -Paul
>
> On Mon, May 2, 2016 at 11:04 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Should be fixed by
>
> https://github.com/open-mpi/hwloc/commit/9549f
Should be fixed by
https://github.com/open-mpi/hwloc/commit/9549fd59af04dca2e2340e17f0e685f8c552d818
Thanks for the report
Brice
On 02/05/2016 21:53, Paul Hargrove wrote:
> I have a linux/ppc64 host running Fedora 20.
> I have configured the 2.0.0rc2 tarball with
>
> --prefix=[] --ena
It comes from the hwloc API. It doesn't use integers because some users
want to provide their own distance matrix that was generated by
benchmarks. Also we normalize the matrix to have latency 1 on the
diagonal (for local memory access latency) and that causes non-diagonal
items not to be integers
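A tiny worked example (made-up latencies) of why the normalized values end up as non-integer floats:

#include <stdio.h>

int main(void)
{
    /* Made-up latencies from a benchmark; normalizing so that local
     * (diagonal) access is 1.0 turns remote entries into ratios. */
    float raw[2][2] = { { 80.0f, 130.0f },
                        { 130.0f, 80.0f } };
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++)
            printf("%.3f ", raw[i][j] / raw[i][i]); /* 1.000 and 1.625 */
        printf("\n");
    }
    return 0;
}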
Hello
hwloc doesn't have any CUDA-specific configure variables. We just use
standard variables like LIBS and CPPFLAGS. I guess OMPI could propagate
--with-cuda directories to hwloc by setting LIBS and CPPFLAGS before
running hwloc m4 functions, but I don't think OMPI actually cares about
hwloc repo
Sorry, I didn't see this report before the pull request.
I applied Gilles' "simple but arguable" fix to master and stable
branches up to v1.9. It could be too imperfect if somebody ever changes
the permissions of /devices/pci*, but I guess that's not going to happen
in practice. Finding the right de
ther nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Did it work on the same machine before? Or did OMPI enable hwloc's
> PCI discovery recently?
>
> Does lstop
; lstopo complains with the same assert. Interestingly enough, the same
> binary succeed on the other nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Did it work on the sa
Did it work on the same machine before? Or did OMPI enable hwloc's PCI
discovery recently?
Does lstopo complain the same?
Brice
On 10/09/2015 21:10, George Bosilca wrote:
> With the current trunk version I keep getting an assert deep down in
> orted.
>
> orted:
> ../../../../../../../ompi/op
On 04/09/2015 00:36, Gilles Gouaillardet wrote:
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hwloc, right ?
> ( the --with-hwloc option is not removed because we want to keep the
> option of using an external hwloc library )
>
> if I understa
On 01/09/2015 15:59, marcin.krotkiewski wrote:
> Dear Rolf and Brice,
>
> Thank you very much for your help. I have now moved the 'dubious' IB
> card from Slot 1 to Slot 5. It is now reported by hwloc as bound to a
> separate NUMA node. In this case OpenMPI works as could be expected:
>
> - NUM
as a
> floating point number ?
>
> i remember i had to fix a bug in ompi a while ago
> /* e.g. replace if (d1 == d2) with if (fabs(d1-d2) < epsilon) */
>
> Cheers,
>
> Gilles
>
> On 9/1/2015 5:28 AM, Brice Goglin wrote:
>> The locality is mlx4_0 as reported by l
The locality of mlx4_0 as reported by lstopo is "near the entire
machine" (while mlx4_1 is reported near NUMA node #3). I would vote for
buggy PCI-NUMA affinity being reported by the BIOS. But I am not very
familiar with 4x E5-4600 machines so please make sure this PCI slot is
really attached to a
On 25/08/2015 05:59, Christopher Samuel wrote:
>
> INRIA does have Open-MX (Myrinet Express over Generic Ethernet
> Hardware), last release December 2014. No idea if it's still developed
> or used..
>
> http://open-mx.gforge.inria.fr/
>
> Brice?
>
> Open-MPI is listed as working with it there.
It was renamed from cpuid.h to cpuid-x86.h at some point. Can't check from here
but the actual code should be the same in all these branches.
Brice
On 31 July 2015 22:19:47 UTC+02:00, Ralph Castain wrote:
>Yo Paul
>
>1.8.8 and 1.10 do not have hwloc-1.11 in them - they remain on
>hwloc-1
On 24/03/2015 20:47, Jeff Squyres (jsquyres) wrote:
> I talked to Peter off-list.
>
> We got a successful build going for him.
>
> Seems like we've identified a few issues here, though:
>
> 1. ./configure with gcc 4.7.2 on Debian (I didn't catch the precise version
> of Debian) results in a Lhw
On 17/12/2014 21:43, Paul Hargrove wrote:
>
> Dbx gives me
>
> t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
> Current function is opal_hwloc172_hwloc_get_obj_by_depth
>74 return topology->levels[depth][idx];
> (dbx) where
> current thread: t@1
On 12/12/2014 07:36, Gilles Gouaillardet wrote:
> Brice,
>
> ompi master is based on hwloc 1.9.1, isn't it ?
Yes sorry, I am often confused by all these OMPI vs hwloc branch numbers.
>
> if some backport is required for hwloc 1.7.2 (used by ompi v1.8), then
> could you please update the hwloc
ld this fix be backported to both master and v1.8 ?
>
> Cheers,
>
> Gilles
>
> On 2014/12/12 7:46, Brice Goglin wrote:
>> This problem was fixed in hwloc upstream recently.
>>
>> https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e
>
ldn’t leaf thru your output well enough to see all the lstopo
>>>>>>> versions, but you might check to ensure they are the same.
>>>>>>>
>>>>>>> Looking at the code base, you may also hit a problem here. OMPI 1.6
>>>>>>
a big change. OMPI 1.8 series is based on
>>>>>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>>>>>>
>>>>>> Frankly, I’d just download and install an OMPI tarball myself and avoid
>>>>>> these heada
>> about 9/10 cases but works without warnings in 1/10 cases. I attached the
>>> output (with xml) for both the working and `broken` case. Note that the xml
>>> is of course printed (differently) multiple times for each task/core. As
>>> always, any help would be
Hello
The GitHub issue you're referring to was closed 18 months ago. The
warning (it's not an error) is only supposed to appear if you're
importing into a recent hwloc an XML that was exported from an old hwloc. I
don't see how that could happen when using Open MPI since the hwloc
versions on both sides
Gilles,
The strange configure check comes from this commit
https://github.com/open-mpi/hwloc/commit/6a9299ce9d1cb1c13b3b346fe6fdfed2df75c672
Are you sure your patch won't break something else?
I'll ask Pavan what he thinks about your patch.
I agree that it's crazy we don't find strncasecmp on som
Not sure about the details either, but ppc64le support was only included
in libtool recently (will be in the next release). I guess ppcle support
is only becoming a reality now, and it wasn't widely usable in the past.
Brice
On 28/04/2014 17:17, George Bosilca wrote:
> I’m not sure how to in
Hello Ralph,
I took care of the defects under opal/mca/hwloc/hwloc172. Nothing
important there (a memory leak in some deprecated code that is likely
unused today). But I also updated hwloc's v1.7 branch with all recent
fixes from more recent branches. You may want to update OMPI's copy.
At least y
Hello,
We're setting up a new cluster here. Open MPI 1.7.4 was hanging at
startup without any error message. The issue appears to be
udcm_component_query() hanging in finalize() on the sched_yield() loop
when the memlock limit isn't set to unlimited as usual.
Unfortunately the hangs occur before we p
OFED is already in Debian as far as I know. At least Roland Dreier takes
care of uploading some IB-related packages. And I didn't have any
problem getting Mellanox IB to work on Debian in the last years, but I
haven't played with Mellanox custom APIs.
Brice
On 07/11/2013 20:27, Mike Dubman wrote:
http://anonscm.debian.org/viewvc/pkg-openmpi/openmpi/
svn://svn.debian.org/svn/pkg-openmpi/openmpi/
FWIW, hwloc debian packaging is maintained by one of the upstream devs,
but he didn't have to pollute the upstream hwloc repo with debian stuff.
There's a different repo with only the debian subdire
On 09/07/2013 00:32, Jeff Squyres (jsquyres) wrote:
> INRIA
>
> bgoglin: Brice Goglin
> arougier: Antoine Rougier
> sthibaul: Samuel Thibault
> mercier: Guillaume Mercier **NO COMMITS IN LAST YEAR**
> nfurmento:Nathalie Furmento **NO COMMITS IN LAST
> Y
On 03/05/2013 02:47, Ralph Castain wrote:
> Brice: do the Phis appear in the hwloc topology object?
Yes, on Linux, you will see something like this in lstopo v1.7:
  HostBridge L#0
    PCIBridge
      PCI 8086:225c
        CoProc L#2 "mic0"
And these contain some attributes saying how many c
Do you already use hwloc's PCI objects in OMPI v1.7?
Brice
On 06/02/2013 15:39, Jeff Squyres (jsquyres) wrote:
> BEFORE YOU PANIC: this only affects Open MPI v1.7 (which is not yet released)
> and the OMPI SVN trunk (which is also, obviously, not released). ***OMPI
> v1.6.x is unaffecte
On 25/10/2012 23:56, Barrett, Brian W wrote:
> Hi all -
>
> The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my
> mpool change in r27485). I'm not really interested in fixing it; the
> problem does not occur with the MX MTL. Does anyone else have interest in
> fixing it?
On 05/10/2012 10:35, Bert Wesarg wrote:
> On 10/04/2012 03:41 PM, Jeff Squyres wrote:
>> It would probably be better to ask one of the other git-interested people.
>>
>> Bert? Brice? Nathan?
>>
>> Can you check that the git mirror appears to be functioning properly?
> Just tried it and bootstr
I don't see any git-svn line in the commit messages; those are very
helpful when one wants to look up a specific commit using its SVN revision.
Brice
On 04/10/2012 15:41, Jeff Squyres wrote:
> It would probably be better to ask one of the other git-interested people.
>
> Bert? Brice? Nathan?
>
will bind to all the HT's in a core and/or socket.
>
> Are you using Linux cgroups/cpusets to restrict available cores?
> Because Brice is saying that E5-2650 is supposed to have more cores.
>
>
> > On Wed, May 30, 2012 at 4:36 PM, Brice Goglin
> mailto:
Your /proc/cpuinfo output (filtered below) looks like only two sockets
(physical ids 0 and 1), with one core each (cpu cores=1, core id=0),
with hyperthreading (siblings=2). So lstopo looks good.
E5-2650 is supposed to have 8 cores. I assume you use Linux
cgroups/cpusets to restrict the available c
On 02/05/2012 15:00, Jeff Squyres wrote:
> Here's what I've put for the 1.6 NEWS bullets -- do they look ok?
>
> - Fix some process affinity issues. When binding a process, Open MPI
> will now bind to all available hyperthreads in a core (or socket,
> depending on the binding options specif
On 22/02/2012 20:24, Eugene Loh wrote:
> On 2/22/2012 11:08 AM, Ralph Castain wrote:
>> On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote:
>>> On 22/02/2012 17:48, Ralph Castain wrote:
>>>> On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote
>>>>
rm, hwloc finds no socket level
>>> *) therefore hwloc returns num_sockets==0 to OMPI
>>> *) OMPI divides by 0 and barfs on basically everything
>> Okay. So, Brice's other e-mail indicates that the first two are "not really
>> uncommon":
>>
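For illustration, the kind of guard being discussed might look like this; a sketch only (not OMPI's actual fix), using the hwloc 1.x object names:

#include <hwloc.h>

/* Sketch: never divide by the socket count without checking it, since
 * hwloc may legitimately report zero sockets on some platforms. */
static int procs_per_socket(hwloc_topology_t topology, int nprocs)
{
    int nsockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
    if (nsockets <= 0)
        nsockets = 1;   /* fall back: treat the machine as one socket */
    return nprocs / nsockets;
}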
On 22/02/2012 07:36, Eugene Loh wrote:
> On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
>> Here are the first of the results of the testing I promised.
>> I am not 100% sure how to reach the code that Eugene reported as
>> problematic,
> I don't think you're going to see it. Somehow, hwloc on th
On 17/02/2012 14:59, Jeff Squyres wrote:
> On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote:
>
>>> I didn't follow this entire thread in details, but I am feeling that
>>> something is wrong here. The flag fixes your problem indeed, but I think it
>>> may break binding too. It's basically maki
On 16/02/2012 14:16, nadia.der...@bull.net wrote:
> Hi Jeff,
>
> Sorry for the delay, but my victim with 2 ib devices had been stolen ;-)
>
> So, I ported the patch on the v1.5 branch and finally could test it.
>
> Actually, there is no opal_hwloc_base_get_topology() in v1.5 so I had
> to set
>
c-bind pu:0 ./all2all : -np 1 hwloc-bind pu:1
./all2all
Then, we'll see if you can get the same result with one of the OMPI binding
options.
Brice
> Matthias
>
> On Thursday 16 February 2012 15:46:46 Brice Goglin wrote:
>> On 16/02/2012 15:39, Matthias Jurenz wrote:
>>
On 16/02/2012 15:39, Matthias Jurenz wrote:
> Here is the output of lstopo from a single compute node. I'm surprised that
> the L1/L2 sharing isn't visible - not in the graphical output either...
That's a kernel bug. We're waiting for AMD to tell the kernel that L1i
and L2 are shared across
IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
>
>
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>
>> That d
On 09/02/2012 14:00, Ralph Castain wrote:
> There is another aspect, though - I had missed it in the thread, but the
> question Nadia was addressing is: how to tell I am bound? The way we
> currently do it is to compare our cpuset against the local cpuset - if we are
> on a subset, then we kn
et isn't
NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj() to do that.
Brice
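A minimal sketch of that helper in use, assuming a topology with I/O objects enabled:

#include <hwloc.h>

/* Sketch: map an I/O object (e.g. a PCI device) to its closest normal
 * ancestor, whose cpuset then describes the device's locality. */
static hwloc_const_cpuset_t device_locality(hwloc_topology_t topo,
                                            hwloc_obj_t pcidev)
{
    hwloc_obj_t ancestor = hwloc_get_non_io_ancestor_obj(topo, pcidev);
    return ancestor->cpuset; /* non-NULL since ancestor is a non-I/O object */
}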
On 09/02/2012 14:34, Ralph Castain wrote:
> Ah, okay - in that case, having the I/O device attached to the
> "closest" object at each depth would be ideal from an OMPI perspective.
>
> On Feb 9, 20
The BIOS usually tells you which NUMA location is close to each host-to-PCI
bridge. So the answer is yes.
Brice
Ralph Castain wrote:
I'm not sure I understand this comment. A PCI device is attached to the node,
not to any specific location within the node, isn't it? Can you really say that
Jeff Squyres wrote:
>On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:
>
>>> Just so that I understand this better -- if a process is bound in a
>>> cpuset, will tools like hwloc's lstopo only show the Linux
>>> processors *in that cpuset*? I.e., does it not have any
>>> visibility of the pr
By default, hwloc only shows what's inside the current cpuset. There's
an option to show everything instead (topology flag).
Brice
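A minimal sketch of that flag in use (hwloc 1.x name, error handling omitted):

#include <hwloc.h>

/* Sketch: request that the topology also contain objects outside the
 * calling process' cpuset. */
static hwloc_topology_t load_whole_system_topology(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);
    return topo;
}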
On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset,
> will tools like hwloc's lstopo only sho
And there's a hwloc problem with the very old sched_setaffinity on Red Hat 8;
we're looking at it.
Brice
On 14/12/2011 11:14, Paul H. Hargrove wrote:
> Summary of my 1.5.5rc1 testing findings:
>
> + generated config.h in tarball breaks hwloc on non-linux platforms:
> http://www.open-mpi.org/community/list
On 14/12/2011 08:29, Paul H. Hargrove wrote:
> I've attempted the build on MacOS 10.4 (Tiger) on x86-64, I hit the
> same hwloc issue I've encountered on {Free,Open,Net}BSD.
> The build fails with
>> CCLD opal_wrapper
>> /usr/bin/ld: Undefined symbols:
>> _opal_hwloc122_hwloc_backend_sysfs_e
On 14/12/2011 08:01, Paul H. Hargrove wrote:
> I cannot even *build* OpenMPI on {Free,Open,Net}BSD systems unless I
> configure with --without-hwloc.
> Thus I cannot agree w/ Brice's suggestion that I ignore this warning.
Please try building hwloc (1.2.2 if you want the same one as OMPI
current
On 14/12/2011 07:12, Paul H. Hargrove wrote:
> I cannot build hwloc in 1.5.5rc1 on the following system:
>
> System 2: Linux/x86
>> $ cat /etc/redhat-release
>> Red Hat Linux release 8.0 (Psyche)
>> $ uname -a
>> Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10 EDT 2009
>> i686 i6
On 14/12/2011 07:17, Paul H. Hargrove wrote:
> My OpenBSD and NetBSD testers have the same behavior, but now I see
> that I was warned...
>
> On all the affected systems I found the following (modulo the system
> tuple) in the configure output:
>> checking which OS support to include... Unsup
I am playing with those aspects right now (it's planned for hwloc v1.4).
hwloc (even the 1.2 currently in OMPI) can already support topologies
containing different machines, but there's no easy/automatic way to
aggregate multiple machine topologies into a single global one. The
important thing to unde
On 04/08/2011 02:24, Jeff Squyres wrote:
> Libtool's -all-static flag probably resolves to some gcc flag(s), right? Can
> you just pass those in via CFLAGS / LDFLAGS to configure and then not pass
> anything in via make?
I only see an additional -static flag on the final program-link gcc
com
On 03/08/2011 20:37, Jeff Squyres wrote:
>
> Shouldn't you pass the same LDFLAGS to configure as to make?
I'd be happy if it worked :)
Actually, I'd be even more happy if Pavel didn't have to do this to
build a fully-static orted.
> I.e., if you tell configure "configure it way" but then yo
On 03/08/2011 18:24, Shamis, Pavel wrote:
> Hw-loc vanilla works, because static mode does not build the binaries
> in static mode. If you try to build the hwloc utilities in
> static mode it fails, just like ompi.
I get static binaries on SLES11 with
./configure --enable-static --
On 03/08/2011 16:47, Jeff Squyres wrote:
> Err.. I don't quite understand. How exactly are you configuring? If I
> do this:
He's using contrib/platform/ornl/ornl_configure_self_contained
I reproduced here on SLES10 with
./configure --enable-static --disable-shared
--with-wrapper-ldflags=-s
I finally reproduced here. Based on the ornl platform script, you're
configuring with LDFLAGS=-static and then building with make
LDFLAGS=-all-static. Surprisingly, this works fine when building vanilla
hwloc, but it breaks inside OMPI. The reason is that OMPI doesn't pass
LDFLAGS=-static to hwloc'
Hello Pavel,
Do you have the libnuma headers and dynamic lib installed but not the static
lib? Which distro is this?
Brice
On 25/07/2011 23:56, Shamis, Pavel wrote:
> Hello,
>
> I have been trying to compile the Open MPI (trunk) static version with hwloc; the
> latter is enabled by default in t
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to
a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h
and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
Do you need anything else?
Brice
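A minimal sketch using the cudart helper, assuming hwloc was built with the CUDA headers available (error handling omitted):

#include <hwloc.h>
#include <hwloc/cudart.h>

/* Sketch: get the CPUs close to CUDA (runtime API) device `idx` and
 * bind the current process near that device. */
static void bind_near_cuda_device(hwloc_topology_t topology, int idx)
{
    hwloc_cpuset_t set = hwloc_bitmap_alloc();
    hwloc_cudart_get_device_cpuset(topology, idx, set);
    hwloc_set_cpubind(topology, set, 0);
    hwloc_bitmap_free(set);
}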
On 14/04/2011 17:44,
On 14/04/2011 17:58, George Bosilca wrote:
> On Apr 13, 2011, at 20:07 , Ken Lloyd wrote:
>
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
>> between the HCA and the GPU that took CPU cycles.
>>
> If this is the case then why do we need to use special
Hello Rolf,
This "CUDA device memory" isn't memory mapped in the host, right? Then
what does its address look like? When you say "when it is detected that
a buffer is CUDA device memory", if the actual device and host address
spaces are different, how do you know that device addresses and usual
h
On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>
>> However, going over the existing BTLs I can see that some BTLs do not
>> correctly set this value:
>>
>> BTL     Bandwidth   Auto-detect   Status
>> Elan    2000        NO
On 03/09/2010 17:33, George Bosilca wrote:
>>> GM 250 NO Doubtful
>>>
This one should be 2000 (assuming nobody runs Myrinet 1280 from the 90s
anymore :))
>>> MX 2000/1 YES (Mbs)   Correct (before the patch)
>>> OFUD 800
On 03/09/2010 15:38, George Bosilca wrote:
> Jeff,
>
> I think you will have to revert this patch as the btl_bandwidth __IS__
> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs
> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the
> original design of
detection anyway?).
Signed-off-by: Brice Goglin
Index: ompi/mca/btl/mx/btl_mx_component.c
===
--- ompi/mca/btl/mx/btl_mx_component.c (revision 23711)
+++ ompi/mca/btl/mx/btl_mx_component.c (working copy)
@@ -15
On 18/08/2010 19:21, Eugene Loh wrote:
> Eugene Loh wrote:
>
>> In mca_btl_sm_get_sync(), I see this:
>> /* Use the DMA flag if knem supports it *and* the segment length
>>is greater than the cutoff. Note that if the knem_dma_min
>>value is 0 (i.e., the MCA param was set
Ashley Pittman wrote:
>> [csamuel@tango069 ~]$ ~/local/hwloc/0.9.1rc2/bin/lstopo
>> System(31GB)
>> Node#0(15GB) + Socket#0 + L3(6144KB) + L2(512KB) + L1(64KB) + Core#0 + P#0
>> Node#1(16GB) + Socket#1 + L3(6144KB)
>> L2(512KB) + L1(64KB) + Core#0 + P#4
>> L2(512KB) + L1(64KB) + Core#1
George Bosilca wrote:
> On Oct 21, 2009, at 13:42 , Scott Atchley wrote:
>> On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
>>> Because MX doesn't provide a real RMA protocol, we created a fake
>>> one on top of point-to-point. The two peers have to agree on a
>>> unique tag, then the receiver p
Hello,
I am debugging a crash with OMPI 1.3.3 BTL over Open-MX. It's crashing
while trying to store incoming data in the OMPI receive buffer, but OMPI
seems to have already freed the buffer even if the MX request is not
complete yet. It looks like this is caused by mca_btl_mx_prepare_dst()
posting
Jeff Squyres wrote:
> Do you just want to wait for the ummunotify stuff in OMPI? I'm half
> done making a merged "linux" memory component (i.e., it merges the
> ptmalloc2 component with the new ummunotify stuff).
>
> It won't help for kernels <2.6.32, of course. :-)
Yeah that's another solution
Jeff Squyres wrote:
> On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:
>
>> I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
>> cleanup the Open-MX regcache when needed. It causes some deadlocks since
>> OpenMPI intercepts Open-MX' own free() ca
Hello,
I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
clean up the Open-MX regcache when needed. It causes some deadlocks since
OpenMPI intercepts Open-MX' own free() calls. Is there a "safe" way to
have Open-MX free/munmap calls not invoke OpenMPI interception hooks? Or
is
George Bosilca wrote:
> Yes, in Open MPI the connections are usually created on demand. As far
> as I know there are few devices that do not abide to this "law", but
> MX is not one of them.
>
> To be more precise on how the connections are established, if we say
> that each node has two rails and
try to connect
> the second device (rail in this context). In MX this works because we
> use the blocking function (mx_connect).
>
> george.
>
> On Jun 17, 2009, at 08:23 , Brice Goglin wrote:
>
>> Hello,
>>
>> I am debugging some sort of deadlock when doin