Hello Rolf,
This "CUDA device memory" isn't memory mapped in the host, right? Then
what does its address look like? When you say "when it is detected that
a buffer is CUDA device memory", if the actual device and host address
spaces are different, how do you know that device addresses and usual
h
On 14/04/2011 17:58, George Bosilca wrote:
> On Apr 13, 2011, at 20:07 , Ken Lloyd wrote:
>
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
>> between the HCA and the GPU that took CPU cycles.
>>
> If this is the case then why do we need to use special
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to
a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h
and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
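For illustration, a minimal sketch of retrieving that locality with the
cudart helper (assuming hwloc was built with CUDA support; error checks
omitted):
#include <hwloc.h>
#include <hwloc/cudart.h>
hwloc_topology_t topology;
hwloc_cpuset_t set = hwloc_bitmap_alloc();
hwloc_topology_init(&topology);
hwloc_topology_load(topology);
/* fill 'set' with the CPUs close to CUDA device 0 */
hwloc_cudart_get_device_cpuset(topology, 0, set);
/* e.g. bind the current process near that device */
hwloc_set_cpubind(topology, set, 0);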
Do you need anything else?
Brice
On 14/04/2011 17:44,
Hello Pavel,
Do you have the libnuma headers and dynamic library installed, but not
the static library? Which distro is this?
Brice
On 25/07/2011 23:56, Shamis, Pavel wrote:
> Hello,
>
> I have been trying to compile a static version of Open MPI (trunk) with hwloc; the
> latter is enabled by default in t
I finally reproduced here. Based on the ornl platform script, you're
configuring with LDFLAGS=-static and then building with make
LDFLAGS=-all-static. Surprisingly, this works fine when building vanilla
hwloc, but it breaks inside OMPI. The reason is that OMPI doesn't pass
LDFLAGS=-static to hwloc'
On 03/08/2011 16:47, Jeff Squyres wrote:
> Err.. I don't quite understand. How exactly are you configuring? If I
> do this:
He's using contrib/platform/ornl/ornl_configure_self_contained
I reproduced here on SLES10 with
./configure --enable-static --disable-shared
--with-wrapper-ldflags=-s
On 03/08/2011 18:24, Shamis, Pavel wrote:
> Hw-loc vanilla works, because static mode does not build the binaries
> in static mode. If you try to build the hwloc utilities in
> static mode it fails, just like ompi.
I get static binaries on SLES11 with
./configure --enable-static --
On 03/08/2011 20:37, Jeff Squyres wrote:
>
> Shouldn't you pass the same LDFLAGS to configure as to make?
I'd be happy if it worked :)
Actually, I'd be even happier if Pavel didn't have to do this to
build a fully-static orted.
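For reference, the combination in question, as described in the messages
above (configure and make being given different flags):
  ./configure LDFLAGS=-static ...
  make LDFLAGS=-all-static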
> I.e., if you tell configure "configure it way" but then yo
On 04/08/2011 02:24, Jeff Squyres wrote:
> Libtool's -all-static flag probably resolves to some gcc flag(s), right? Can
> you just pass those in via CFLAGS / LDFLAGS to configure and then not pass
> anything in via make?
I only see an additional -static flag on the final program-link gcc
com
I am playing with those aspects right now (it's planned for hwloc v1.4).
hwloc (even the 1.2 currently in OMPI) can already support topologies
containing different machines, but there's no easy/automatic way to
aggregate multiple machine topologies into a single global one. The
important thing to unde
On 14/12/2011 07:17, Paul H. Hargrove wrote:
> My OpenBSD and NetBSD testers have the same behavior, but now I see
> that I was warned...
>
> On all the affected systems I found the following (modulo the system
> tuple) in the configure output:
>> checking which OS support to include... Unsup
On 14/12/2011 07:12, Paul H. Hargrove wrote:
> I cannot build hwloc in 1.5.5rc1 on the following system:
>
> System 2: Linux/x86
>> $ cat /etc/redhat-release
>> Red Hat Linux release 8.0 (Psyche)
>> $ uname -a
>> Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10 EDT 2009
>> i686 i6
On 14/12/2011 08:01, Paul H. Hargrove wrote:
> I cannot even *build* OpenMPI on {Free,Open,Net}BSD systems unless I
> configure with --without-hwloc.
> Thus I cannot agree w/ Brice's suggestion that I ignore this warning.
Please try building hwloc (1.2.2 if you want the same one as OMPI
current
On 14/12/2011 08:29, Paul H. Hargrove wrote:
> I've attempted the build on MacOS 10.4 (Tiger) on x86-64, I hit the
> same hwloc issue I've encountered on {Free,Open,Net}BSD.
> The build fails with
>> CCLD opal_wrapper
>> /usr/bin/ld: Undefined symbols:
>> _opal_hwloc122_hwloc_backend_sysfs_e
And there's a hwloc problem with the very old sched_setaffinity on Red
Hat 8; we're looking at it.
Brice
On 14/12/2011 11:14, Paul H. Hargrove wrote:
> Summary of my 1.5.5rc1 testing findings:
>
> + generated config.h in tarball breaks hwloc on non-linux platforms:
> http://www.open-mpi.org/community/list
By default, hwloc only shows what's inside the current cpuset. There's
an option to show everything instead (topology flag).
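For example, a minimal sketch with the hwloc 1.x API (the flag in
question is HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM):
hwloc_topology_t topology;
hwloc_topology_init(&topology);
/* show all processors, not just those in our current cpuset */
hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
hwloc_topology_load(topology);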
Brice
On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset,
> will tools like hwloc's lstopo only sho
Jeff Squyres wrote:
>On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:
>
>>> Just so that I understand this better -- if a process is bound in a
>>> cpuset, will tools like hwloc's lstopo only show the Linux
>>> processors *in that cpuset*? I.e., does it not have any
>>> visibility of the pr
The BIOS usually tells you which NUMA location is close to each host-to-PCI
bridge. So the answer is yes.
Brice
Ralph Castain wrote:
I'm not sure I understand this comment. A PCI device is attached to the node,
not to any specific location within the node, isn't it? Can you really say that
et isn't
NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj() to do that.
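A minimal sketch, assuming 'dev' is an I/O object (PCI or OS device)
obtained elsewhere:
hwloc_obj_t ancestor = hwloc_get_non_io_ancestor_obj(topology, dev);
/* 'ancestor' is the smallest non-I/O object containing 'dev';
   ancestor->cpuset tells you which CPUs are close to the device */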
Brice
On 09/02/2012 14:34, Ralph Castain wrote:
> Ah, okay - in that case, having the I/O device attached to the
> "closest" object at each depth would be ideal from an OMPI perspective.
>
> On Feb 9, 20
On 09/02/2012 14:00, Ralph Castain wrote:
> There is another aspect, though - I had missed it in the thread, but the
> question Nadia was addressing is: how to tell I am bound? The way we
> currently do it is to compare our cpuset against the local cpuset - if we are
> on a subset, then we kn
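As an aside, a minimal sketch of the comparison described above, using
the hwloc API and assuming 'topology' is already loaded:
hwloc_cpuset_t set = hwloc_bitmap_alloc();
hwloc_get_cpubind(topology, set, 0);  /* our current binding */
hwloc_obj_t root = hwloc_get_root_obj(topology);
/* "bound" if our cpuset covers less than the whole machine */
int bound = !hwloc_bitmap_isequal(set, root->cpuset);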
IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
>
>
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>
>> That d
On 16/02/2012 15:39, Matthias Jurenz wrote:
> Here is the output of lstopo from a single compute node. I'm wondering why the
> L1/L2 sharing isn't visible - not in the graphical output either...
That's a kernel bug. We're waiting for AMD to tell the kernel that L1i
and L2 are shared across
c-bind pu:0 ./all2all : -np 1 hwloc-bind pu:1
./all2all
Then, we'll see if you can get the same result with one of OMPI binding
options.
Brice
> Matthias
>
> On Thursday 16 February 2012 15:46:46 Brice Goglin wrote:
>> On 16/02/2012 15:39, Matthias Jurenz wrote:
>>
On 16/02/2012 14:16, nadia.der...@bull.net wrote:
> Hi Jeff,
>
> Sorry for the delay, but my victim with 2 ib devices had been stolen ;-)
>
> So, I ported the patch on the v1.5 branch and finally could test it.
>
> Actually, there is no opal_hwloc_base_get_topology() in v1.5 so I had
> to set
>
On 17/02/2012 14:59, Jeff Squyres wrote:
> On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote:
>
>>> I didn't follow this entire thread in details, but I am feeling that
>>> something is wrong here. The flag fixes your problem indeed, but I think it
>>> may break binding too. It's basically maki
On 22/02/2012 07:36, Eugene Loh wrote:
> On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
>> Here are the first of the results of the testing I promised.
>> I am not 100% sure how to reach the code that Eugene reported as
>> problematic,
> I don't think you're going to see it. Somehow, hwloc on th
rm, hwloc finds no socket level
>>> *) therefore hwloc returns num_sockets==0 to OMPI
>>> *) OMPI divides by 0 and barfs on basically everything
>> Okay. So, Brice's other e-mail indicates that the first two are "not really
>> uncommon":
>>
On 22/02/2012 20:24, Eugene Loh wrote:
> On 2/22/2012 11:08 AM, Ralph Castain wrote:
>> On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote:
>>> On 22/02/2012 17:48, Ralph Castain wrote:
>>>> On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote
>>>>
On 02/05/2012 15:00, Jeff Squyres wrote:
> Here's what I've put for the 1.6 NEWS bullets -- do they look ok?
>
> - Fix some process affinity issues. When binding a process, Open MPI
> will now bind to all available hyperthreads in a core (or socket,
> depending on the binding options specif
Your /proc/cpuinfo output (filtered below) shows only two sockets
(physical ids 0 and 1), each with a single core (cpu cores=1, core id=0)
and hyperthreading (siblings=2). So the lstopo output looks right.
The E5-2650 is supposed to have 8 cores. I assume you use Linux
cgroups/cpusets to restrict the available c
will bind to all the HT's in a core and/or socket.
>
> Are you using Linux cgroups/cpusets to restrict available cores?
> Because Brice is saying that E5-2650 is supposed to have more cores.
>
>
> > On Wed, May 30, 2012 at 4:36 PM, Brice Goglin wrote:
I don't see any git-svn line in the commit messages; those are very
helpful when one wants to find a specific commit using its SVN revision.
Brice
On 04/10/2012 15:41, Jeff Squyres wrote:
> It would probably be better to ask one of the other git-interested people.
>
> Bert? Brice? Nathan?
>
On 05/10/2012 10:35, Bert Wesarg wrote:
> On 10/04/2012 03:41 PM, Jeff Squyres wrote:
>> It would probably be better to ask one of the other git-interested people.
>>
>> Bert? Brice? Nathan?
>>
>> Can you check that the git mirror appears to be functioning properly?
> Just tried it and bootstr
On 25/10/2012 23:56, Barrett, Brian W wrote:
> Hi all -
>
> The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my
> mpool change in r27485). I'm not really interested in fixing it; the
> problem does not occur with the MX MTL. Does anyone else have interest in
> fixing it?
Do you already use hwloc's PCI objects in OMPI v1.7?
Brice
On 06/02/2013 15:39, Jeff Squyres (jsquyres) wrote:
> BEFORE YOU PANIC: this only affects Open MPI v1.7 (which is not yet released)
> and the OMPI SVN trunk (which is also, obviously, not released). ***OMPI
> v1.6.x is unaffecte
On 03/05/2013 02:47, Ralph Castain wrote:
> Brice: do the Phis appear in the hwloc topology object?
Yes, on Linux, you will see something like this in lstopo v1.7:
HostBridge L#0
  PCIBridge
    PCI 8086:225c
      CoProc L#2 "mic0"
And these contain some attributes saying how many c
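A minimal sketch of reading such attributes; the exact info key names
("CoProcType", "MICActiveCores") are my recollection of hwloc's MIC
support, so double-check them against the hwloc documentation:
hwloc_obj_t osdev = hwloc_get_obj_by_type(topology, HWLOC_OBJ_OS_DEVICE, 0);
const char *type  = hwloc_obj_get_info_by_name(osdev, "CoProcType");
const char *cores = hwloc_obj_get_info_by_name(osdev, "MICActiveCores");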
On 09/07/2013 00:32, Jeff Squyres (jsquyres) wrote:
> INRIA
>
> bgoglin: Brice Goglin
> arougier: Antoine Rougier
> sthibaul: Samuel Thibault
> mercier: Guillaume Mercier **NO COMMITS IN LAST YEAR**
> nfurmento: Nathalie Furmento **NO COMMITS IN LAST
> Y
http://anonscm.debian.org/viewvc/pkg-openmpi/openmpi/
svn://svn.debian.org/svn/pkg-openmpi/openmpi/
FWIW, hwloc debian packaging is maintained by one of the upstream devs,
but he didn't have to pollute the upstream hwloc repo with debian stuff.
There's a different repo with only the debian subdire
OFED is already in Debian as far as I know. At least Roland Dreier takes
care of uploading some IB-related packages. And I didn't have any
problem getting Mellanox IB to work on Debian in recent years, but I
haven't played with Mellanox custom APIs.
Brice
On 07/11/2013 20:27, Mike Dubman wrote:
Hello,
We're setting up a new cluster here. Open MPI 1.7.4 was hanging at
startup without any error message. The issue appears to be
udcm_component_query() hanging in finalize() on the sched_yield() loop
when the memlock limit isn't set to unlimited as usual.
Unfortunately the hangs occur before we p
Hello Ralph,
I took care of the defects under opal/mca/hwloc/hwloc172. Nothing
important there (a memory leak in some deprecated code that is likely
unused today). But I also updated hwloc's v1.7 branch with all recent
fixes from more recent branches. You may want to update OMPI's copy.
At least y
Not sure about the details either, but ppc64le support was only included
in libtool recently (will be in the next release). I guess ppcle support
is only becoming a reality now, and it wasn't widely usable in the past.
Brice
On 28/04/2014 17:17, George Bosilca wrote:
> I’m not sure how to in
to get more details from OMPI when it fails
to load a component because of missing symbols like this?
LD_DEBUG=verbose isn't very convenient :)
thanks,
Brice Goglin
't configure check for mx__regcache_clean as well?
> mca_component_show_load_errors is what you need there. Set it to
> something high depending on the level of verbosity you want to have.
I am still getting "file not found". It may be related to Jeff's libtool
bug. LD_DEBUG=verbose will be enough for now.
Thanks.
Brice Goglin
Hello,
I am debugging some sort of deadlock when doing multirail over Open-MX.
What I am seeing with 2 processes and 2 boards per node with *MX* is:
1) process 0 rail 0 connects to process 1 rail 0
2) p1r0 connects back to p0r0
3) p0 rail 1 connects to p1 rail 1
4) p1r1 connects back to p0r1
For s
try to connect
> the second device (rail in this context). In MX this works because we
> use the blocking function (mx_connect).
>
> george.
>
> On Jun 17, 2009, at 08:23 , Brice Goglin wrote:
>
>> Hello,
>>
>> I am debugging some sort of deadlock when doin
George Bosilca wrote:
> Yes, in Open MPI the connections are usually created on demand. As far
> as I know there are few devices that do not abide by this "law", but
> MX is not one of them.
>
> To be more precise on how the connections are established, if we say
> that each node has two rails and
Hello,
I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
cleanup the Open-MX regcache when needed. It causes some deadlocks since
OpenMPI intercepts Open-MX's own free() calls. Is there a "safe" way to
have Open-MX free/munmap calls not invoke OpenMPI interception hooks? Or
is
Jeff Squyres wrote:
> On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:
>
>> I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
>> cleanup the Open-MX regcache when needed. It causes some deadlocks since
>> OpenMPI intercepts Open-MX' own free() ca
Jeff Squyres wrote:
> Do you just want to wait for the ummunotify stuff in OMPI? I'm half
> done making a merged "linux" memory component (i.e., it merges the
> ptmalloc2 component with the new ummunotify stuff).
>
> It won't help for kernels <2.6.32, of course. :-)
Yeah that's another solution
Hello,
I am debugging a crash with the OMPI 1.3.3 BTL over Open-MX. It's crashing
while trying to store incoming data in the OMPI receive buffer, but OMPI
seems to have already freed the buffer even if the MX request is not
complete yet. It looks like this is caused by mca_btl_mx_prepare_dst()
posting
George Bosilca wrote:
> On Oct 21, 2009, at 13:42 , Scott Atchley wrote:
>> On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
>>> Because MX doesn't provide a real RMA protocol, we created a fake
>>> one on top of point-to-point. The two peers have to agree on a
>>> unique tag, then the receiver p
Ashley Pittman wrote:
>> [csamuel@tango069 ~]$ ~/local/hwloc/0.9.1rc2/bin/lstopo
>> System(31GB)
>> Node#0(15GB) + Socket#0 + L3(6144KB) + L2(512KB) + L1(64KB) + Core#0 + P#0
>> Node#1(16GB) + Socket#1 + L3(6144KB)
>>   L2(512KB) + L1(64KB) + Core#0 + P#4
>>   L2(512KB) + L1(64KB) + Core#1
On 18/08/2010 19:21, Eugene Loh wrote:
> Eugene Loh wrote:
>
>> In mca_btl_sm_get_sync(), I see this:
>> /* Use the DMA flag if knem supports it *and* the segment length
>>is greater than the cutoff. Note that if the knem_dma_min
>>value is 0 (i.e., the MCA param was set
detection anyway?).
Signed-off-by: Brice Goglin
Index: ompi/mca/btl/mx/btl_mx_component.c
===================================================================
--- ompi/mca/btl/mx/btl_mx_component.c (revision 23711)
+++ ompi/mca/btl/mx/btl_mx_component.c (working copy)
@@ -15
On 03/09/2010 15:38, George Bosilca wrote:
> Jeff,
>
> I think you will have to revert this patch as the btl_bandwidth __IS__
> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs
> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the
> original design of
On 03/09/2010 17:33, George Bosilca wrote:
>>> GM      250         NO            Doubtful
>>>
This one should be 2000 (assuming nobody runs Myrinet 1280 from the 90s
anymore :))
>>> MX      2000/1      YES (Mbs)     Correct (before the patch)
>>> OFUD    800
On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>
>> However, going over the existing BTLs I can see that some BTLs do not
>> correctly set this value:
>>
>> BTL     Bandwidth   Auto-detect   Status
>> Elan    2000        NO
It comes from the hwloc API. It doesn't use integers because some users
want to provide their own distance matrix that was generated by
benchmarks. Also we normalize the matrix to have latency 1 on the
diagonal (for local memory access latency) and that causes non-diagonal
items not to be integers.
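For illustration, a minimal sketch with the hwloc 1.x distances API, if
I recall it correctly (error checks omitted):
const struct hwloc_distances_s *d =
  hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);
/* d->latency[i*d->nbobjs+j] is the latency from NUMA node i to node j,
   normalized so that the diagonal is 1.0f */
float remote = d->latency[0*d->nbobjs+1];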
Should be fixed by
https://github.com/open-mpi/hwloc/commit/9549fd59af04dca2e2340e17f0e685f8c552d818
Thanks for the report
Brice
On 02/05/2016 21:53, Paul Hargrove wrote:
> I have a linux/ppc64 host running Fedora 20.
> I have configured the 2.0.0rc2 tarball with
>
> --prefix=[] --ena
> 1.11.2, and into v2.x in particular?
> Or perhaps that is Jeff's job?
>
> -Paul
>
> On Mon, May 2, 2016 at 11:04 PM, Brice Goglin wrote:
>
> Should be fixed by
>
> https://github.com/open-mpi/hwloc/commit/9549f
https://github.com/open-mpi/ompi/pull/1621 (against master, needs to go
to 2.0 later)
On 03/05/2016 08:22, Brice Goglin wrote:
> Yes we should backport this to OMPI master and v2.x.
> I am usually not the one doing the PR, I'd need to learn the exact
> procedure first :)
>
Thanks
I think I would be fine with that fix. Unfortunately I won't have good
internet access until Sunday night. I won't be able to test anything
properly before then :/
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (al
Thanks, applied to hwloc. And PR for OMPI master at
https://github.com/open-mpi/ompi/pull/1657
Brice
On 06/05/2016 00:29, Paul Hargrove wrote:
> I have some good news: I have a fix!!
>
> FWIW: I too can build w/ xlc 12.1 (also BG/Q).
> It is just the 13.1.0 on Power7 that crashes building hw
Yes, kill all netloc lists.
Brice
On 18 July 2016 17:43:49 UTC+02:00, Josh Hursey wrote:
>Now that netloc has rolled into hwloc, I think it is safe to kill the
>netloc lists.
>
>mtt-devel-core and mtt-annouce should be kept. They probably need to be
>cleaned. But the hope is that we relea
s/June 2016/June 2006/ :)
Anyway, it ended on July 31st based on https://www.suse.com/lifecycle/
Brice
On 29/08/2016 16:03, Gilles Gouaillardet wrote:
> According to wikipedia, SLES 10 was released on June 2016, and is
> supported for 10 years.
> (SLES 12 is supported for 13 years, and I ho
On 05/01/2017 07:07, Gilles Gouaillardet wrote:
> Brice,
>
> things would be much easier if there were an HWLOC_OBJ_NODE object in
> the topology.
>
> could you please consider backporting the relevant changes from master
> into the v1.11 branch ?
>
> Cheers,
>
> Gilles
Hello
Unfortunately, I
Hello
Did anybody start porting OMPI to the new hwloc 2.0 API (currently in
hwloc git master)?
Gilles, I seem to remember you were interested a while ago?
I will have to do it in the near future. If anybody already started that
work, please let me know.
Brice
You can email scan-ad...@coverity.com to report bugs and/or ask what's
going on.
Brice
On 16/06/2017 07:12, Gilles Gouaillardet wrote:
> Ralph,
>
>
> my 0.02 US$
>
>
> i noted the error message mentions 'holding lock
> "pmix_mutex_t.m_lock_pthread"', but it does not explicitly mentions
>
> '
Hello
This message is related to /var/cache/hwloc/knl_memoryside_cache. This
file exposes the KNL cluster and MCDRAM configuration, which is otherwise
only available in root-only files. hwloc-dump-hwdata runs at boot time to create that
file, and non-root hwloc users can read it later. Failing to read that
file b
Looks like you're using a hwloc < 1.11. If you want to support this old
API while using the 1.11 names, you can add this to OMPI after #include <hwloc.h>:
#if HWLOC_API_VERSION < 0x00010b00
#define HWLOC_OBJ_NUMANODE HWLOC_OBJ_NODE
#define HWLOC_OBJ_PACKAGE HWLOC_OBJ_SOCKET
#endif
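With those defines in place, the same code then builds against both
APIs, e.g.:
hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 0);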
Brice
On 04/10/2017 19
On 20/12/2017 22:01, Howard Pritchard wrote:
>
> I can think of several ways to fix it. Easiest would be to modify the
>
> opal/mca/hwloc/hwloc2a/configure.m4
>
> to not set --enable-cuda if --with-cuda is evaluated to something
> other than yes.
>
>
> Optionally, I could fix the hwloc config
Hello
Two hwloc issues are listed in this week's telcon:
"hwloc2 WIP, may need help with."
https://github.com/open-mpi/ompi/pull/4677
* Is this really a 3.0.1 thing? I thought hwloc2 was only for 3.1+
* As I replied in this PR, I have some patches but I need help for
testing them. Can you list some
Sorry guys, I think I have all patches ready since the F2F meeting, but
I couldn't test them enough because ranking was broken. I'll work on
that by next week.
Brice
On 22/05/2018 17:50, r...@open-mpi.org wrote:
> I’ve been running with hwloc 2.0.1 for quite some time now without problem,
I just pushed my patches rebased on master + update to hwloc 2.0.1 to
bgoglin/ompi (master branch).
My testing of mapping/ranking/binding looks good here (on a dual Xeon with
CoD, 2 sockets x 2 NUMA x 6 cores).
It'd be nice if somebody else could test on another platform with
different options and/
Hello Jeff
Looks like I am not allowed to modify the page but I'll be at the meeting ;)
Brice
On 26/02/2019 17:13, Jeff Squyres (jsquyres) via devel wrote:
> Gentle reminder to please sign up for the face-to-face meeting and add your
> items to the wiki:
>
> https://github.com/open-m
Gilles,
The strange configure check comes from this commit
https://github.com/open-mpi/hwloc/commit/6a9299ce9d1cb1c13b3b346fe6fdfed2df75c672
Are you sure your patch won't break something else?
I'll ask Pavan what he thinks about your patch.
I agree that it's crazy we don't find strncasecmp on som
Hello
The github issue you're referring to was closed 18 months ago. The
warning (it's not an error) is only supposed to appear when importing
into a recent hwloc an XML that was exported from an old hwloc. I
don't see how that could happen when using Open MPI since the hwloc
versions on both sides
>> about 9/10 cases but works without warnings in 1/10 cases. I attached the
>>> output (with xml) for both the working and `broken` case. Note that the xml
>>> is of course printed (differently) multiple times for each task/core. As
>>> always, any help would be
a big change. OMPI 1.8 series is based on
>>>>>> hwloc 1.9, so at least that is closer (though probably still a mismatch).
>>>>>>
>>>>>> Frankly, I’d just download and install an OMPI tarball myself and avoid
>>>>>> these heada
ldn’t leaf thru your output well enough to see all the lstopo
>>>>>>> versions, but you might check to ensure they are the same.
>>>>>>>
>>>>>>> Looking at the code base, you may also hit a problem here. OMPI 1.6
>>>>>>
ld this fix be backported to both master and v1.8?
>
> Cheers,
>
> Gilles
>
> On 2014/12/12 7:46, Brice Goglin wrote:
>> This problem was fixed in hwloc upstream recently.
>>
>> https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e
>
On 12/12/2014 07:36, Gilles Gouaillardet wrote:
> Brice,
>
> ompi master is based on hwloc 1.9.1, isn't it ?
Yes sorry, I am often confused by all these OMPI vs hwloc branch numbers.
>
> if some backport is required for hwloc 1.7.2 (used by ompi v1.8), then
> could you please update the hwloc
On 17/12/2014 21:43, Paul Hargrove wrote:
>
> Dbx gives me
>
> t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
> Current function is opal_hwloc172_hwloc_get_obj_by_depth
>74 return topology->levels[depth][idx];
> (dbx) where
> current thread: t@1
On 24/03/2015 20:47, Jeff Squyres (jsquyres) wrote:
> I talked to Peter off-list.
>
> We got a successful build going for him.
>
> Seems like we've identified a few issues here, though:
>
> 1. ./configure with gcc 4.7.2 on Debian (I didn't catch the precise version
> of Debian) results in a Lhw
It was renamed from cpuid.h to cpuid-x86.h at some point. Can't check from here
but the actual code should be the same in all these branches.
Brice
On 31 July 2015 22:19:47 UTC+02:00, Ralph Castain wrote:
>Yo Paul
>
>1.8.8 and 1.10 do not have hwloc-1.11 in them - they remain on
>hwloc-1
On 25/08/2015 05:59, Christopher Samuel wrote:
>
> INRIA does have Open-MX (Myrinet Express over Generic Ethernet
> Hardware), last release December 2014. No idea if it's still developed
> or used..
>
> http://open-mx.gforge.inria.fr/
>
> Brice?
>
> Open-MPI is listed as working with it there.
The locality of mlx4_0 as reported by lstopo is "near the entire
machine" (while mlx4_1 is reported near NUMA node #3). I would vote for
buggy PCI-NUMA affinity being reported by the BIOS. But I am not very
familiar with 4x E5-4600 machines so please make sure this PCI slot is
really attached to a
as a
> floating point number?
>
> i remember i had to fix a bug in ompi a while ago
> /* e.g. replace if (d1 == d2) with if (fabs(d1-d2) < epsilon) */
>
> Cheers,
>
> Gilles
>
> On 9/1/2015 5:28 AM, Brice Goglin wrote:
>> The locality of mlx4_0 as reported by l
On 01/09/2015 15:59, marcin.krotkiewski wrote:
> Dear Rolf and Brice,
>
> Thank you very much for your help. I have now moved the 'dubious' IB
> card from Slot 1 to Slot 5. It is now reported by hwloc as bound to a
> separate NUMA node. In this case OpenMPI works as could be expected:
>
> - NUM
On 04/09/2015 00:36, Gilles Gouaillardet wrote:
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hwloc, right ?
> ( the --with-hwloc option is not removed because we want to keep the
> option of using an external hwloc library )
>
> if I understa
Did it work on the same machine before? Or did OMPI enable hwloc's PCI
discovery recently?
Does lstopo complain the same?
Brice
On 10/09/2015 21:10, George Bosilca wrote:
> With the current trunk version I keep getting an assert deep down in
> orted.
>
> orted:
> ../../../../../../../ompi/op
; lstopo complains with the same assert. Interestingly enough, the same
> binary succeed on the other nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin wrote:
>
> Did it work on the sa
ther nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin wrote:
>
> Did it work on the same machine before? Or did OMPI enable hwloc's
> PCI discovery recently?
>
> Does lstop
Sorry, I didn't see this report before the pull request.
I applied Gilles' "simple but arguable" fix to master and stable
branches up to v1.9. It could be too imperfect if somebody ever changes
the permissions of /devices/pci*, but I guess that's not going to happen
in practice. Finding the right de
Hello
hwloc doesn't have any cuda specific configure variables. We just use
standard variables like LIBS and CPPFLAGS. I guess OMPI could propagate
--with-cuda directories to hwloc by setting LIBS and CPPFLAGS before
running hwloc m4 functions, but I don't think OMPI actually cares about
hwloc repo
Thanks a lot for writing all this.
At the end of
https://github.com/open-mpi/ompi/wiki/GitSubmodules#adding-a-new-submodule-pointing-to-a-specific-commit
should "bar" be "bar50x" in the line "$ git add bar"?
It seems to me that you are in opal/mca/foo and the new submodule is in
"bar50x" (according to
Hello
I have a git submodule issue that I don't understand.
PR#7367 was initially on top of PR #7366. When Jeff merged PR#7366, I
rebased my #7367 with git prrs and got this error:
$ git prrs origin master
From https://github.com/open-mpi/ompi
* branch master -> FETCH_HEAD
FYI, this was a git bug that will be fixed soon (the range of commits
being rebased was wrong).
https://lore.kernel.org/git/pull.789.git.1605314085.gitgitgad...@gmail.com/T/#t
https://lore.kernel.org/git/20d6104d-ca02-4ce4-a1c0-2f9386ded...@gmail.com/T/#t
Brice
On 07/02/2020 10:27, Brice
Hello Ralph
One thing that isn't clear in this document : the hwloc shmem region may
only be mapped *once* per process (because the mmap address is always
the same). Hence, if a library calls adopt() in the process, others will
fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
top
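For reference, a minimal sketch of the adoption call from hwloc/shmem.h
(fd, mmap_address and length must match what was written earlier with
hwloc_shmem_topology_write()):
hwloc_topology_t topology;
int err = hwloc_shmem_topology_adopt(&topology, fd, 0 /* file offset */,
                                     mmap_address, length, 0 /* flags */);
/* fails if that address range is already mapped in this process,
   hence a single adopt() per process */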