mca_pml_ucx_close
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:178 mca_pml_ucx_close
Brock Palen
IG: brockpalen1984
www.umich.edu/~brockp
Director Advanced Research Computing - TS
bro...@umich.edu
Office: (734)936-1985 (not in use during Covid)
Cell: (989)277-6075
Thanks,
Yeah, I was looking for an API that takes most cases into consideration, like
hwloc-bind --get, where I can find the number of CPUs the process has access
to, whether that's cgroups, other sorts of affinity settings, etc.
do allow the user to specify the number of threads, but would like to
automate it for least astonishment.
On Mon, Aug 31, 2020 at 11:34 AM Guy Streeter
wrote:
> My v
Hello,
I have a small utility; it is currently using multiprocessing.cpu_count(),
which ignores cgroups etc.
I see https://gitlab.com/guystreeter/python-hwloc
but it appears stale.
How would you detect the number of threads that are safe to start in a cgroup
from Python3?
Thanks!
Brock
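A minimal sketch of one approach (my own, not from this thread; assumes Linux, where the scheduler's affinity mask already reflects cpuset/cgroup restrictions the same way hwloc-bind --get sees them):

```shell
# CPUs this process is actually allowed to run on
# (respects cpusets/affinity, like hwloc-bind --get):
python3 -c 'import os; print(len(os.sched_getaffinity(0)))'

# For comparison: every CPU in the machine, which is what
# multiprocessing.cpu_count() returns regardless of cgroups:
python3 -c 'import multiprocessing as mp; print(mp.cpu_count())'
```

Note that sched_getaffinity covers cpuset-style limits but not a CFS bandwidth quota (cgroup cpu.max); a fully general answer would need to check that as well.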
there being a way to give OMPI a stack of work to do from the talk
at SC this year, but I can't figure out if it does what I think it
should do.
Thanks,
. does it work with gcc compilers ?)
>
> Cheers,
>
> Gilles
>
>
> On 4/4/2016 11:28 AM, Brock Palen wrote:
>
> I recently needed to build OpenMPI on Power 8, where I had access
> to xlc / xlf
>
> The current release fails (as noted in the readme)
> But a clone
!
--
Brock Palen
www.umich.edu/~brockp
Assoc. Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
I'll hit you off list with my Abinit OpenMPI build notes,
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
> On Feb 3, 2015, at 2:26 PM, Nick Papior Andersen <nickpap...@gmail.com> wrote:
>
> I also concur with Je
://citutor.org/login.php
> On Dec 19, 2014, at 5:56 PM, Diego Avesani <diego.aves...@gmail.com> wrote:
>
> dear all users,
> I am new in MPI world.
> I wo
(32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#2)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#1)
PU L#3 (P#3)
HostBridge L#0
PCI 8086:7010
PCI 1013:00b8
PCI 8086:10ed
Net L#0 "eth0"
Are you doing this just for debugging? Or do you really want to do it within
the MPI program?
orte-ps
gives you the pid/host for each rank, but I don't think there is any standard
way to do this via an API.
the 1gig interfaces, yet data is being sent out the 10gig eoib0/ib0 interfaces.
I'll go do some measurements and see.
> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <
Right, I understand those are TCP interfaces; I was just showing that I have
two TCP interfaces over one physical interface, which is why I was asking how
TCP interfaces are selected. It rarely if ever will matter to us.
PI figure out that it can also talk over the others? How does it
choose to load balance?
BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 and
eoib0 are the same physical device and may screw with load balancing if anyone
ever falls back to TCP.
Thanks!
> On Oct 31, 2014, at 2:22 PM, Ralph Castain <rhc.open...@gmail.com> wrote:
>
>
>> On Oct 30, 2014, at 3:15 PM, Brock Palen <bro...@umich.e
B fabric,
this can happen. Multiple times now, just plugging line cards into switches on
a live system has caused large swaths of jobs to die with this error.
Does anyone else have this problem? We are a Mellanox-based fabric.
source
don't they?
Thanks, this is good feedback.
I was worried that, with the dynamic nature of YARN containers, it would be
hard to coordinate wire-up, and you have confirmed that.
Thanks
> On Oct
I think a lot of the information on this page:
http://www.open-mpi.org/faq/?category=java
is out of date with the 1.8 release.
://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html
which appears to imply extra setup is required. Is this documented anywhere for
OpenMPI?
Doing special files on NFS can be weird; try the other /tmp/-style locations:
/var/tmp/
/dev/shm (RAM disk, careful!)
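One quick way to check whether a candidate directory copes with special files is simply to try creating one (a sketch of my own; the /var/tmp path is just an example):

```shell
# Named pipes are 'special files'; NFS mounts sometimes
# mishandle them, so test the candidate tmp dir directly.
d=$(mktemp -d /var/tmp/fifo-test.XXXXXX)
mkfifo "$d/pipe" && echo "special files OK under $d"
rm -rf "$d"
```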
> On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gma
the only 2 processes come from.
I checked some of the other jobs, and the cpusets and the PBS server CPU list
are the same.
More investigation required. Still, it's strange: why would it give that
message at all? Why would OpenMPI care, and why only when -np ## is given?
Yes, the request to Torque was procs=64.
We are using cpusets.
mpirun without -np 64 spawns 64 hostname processes.
On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.
I do wrong? I'm stumped why one works and one doesn't, but the one that
doesn't appears correct if you force it.
Original Message-
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen
>> Sent: Friday, September 05, 2014 5:22 PM
>> To: Open MPI Users
>> Subject: [OMPI users] enable-cuda with disable-dlopen
>>
>> * PGP Signed by an unknown key
>&
=$CUDA \
--with-hwloc=internal \
--with-verbs \
--with-psm \
--with-tm=/usr/local/torque \
--with-fca=$FCA \
--with-mxm=$MXM \
--with-knem=$KNEM \
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
$COMPILERS
Interesting; we are using the 3.0 that is in MOFED, which is also what is on
the MXM download site. Kinda confusing.
On Aug 28, 2014, at 2:12 AM, Mike Dubman <
Brice, et al.
Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with
knem and MXM 3.0. If we have questions we will let you know.
On Aug 27, 2014, at 12:44 PM
trying to understand why that is the default.
On Aug 27, 2014, at 10:15 AM, Alina Sklarevich <ali...@dev.mellanox.co.il>
wrote:
> Hi,
>
> KNEM can improve
it a little. Should we investigate adding it to our systems?
Is there a way to suppress this warning?
performance, but I'm not sure what to do about it. There is nothing on the
list, but there was one reference to another MPI library. Any idea what would
cause this?
provide).
One thought is to have the data-collector processes be threads inside the MPI
job running across all nodes, but I was curious if there is a way to pass data
still in memory (too much to hit disk) to the running MPI filter job.
Thanks!
;). If set to a non-default
value, it is mutually exclusive with
btl_tcp_if_include.
[brockp@flux-login1 34241]$
ompi_info --param all all --level 9
(gives me what I expect).
Thanks,
"eth0,192.168.0.0/16"). If set to a non-default
value, it is mutually exclusive with
btl_tcp_if_include.
This is normally much longer. And yes, we don't have the Phi stuff installed on
all nodes; strange that 'all all' is now very short, omp
Perfection! That appears to do it for our standard case.
Now I know how to set MCA options by env var or config file. How can I make
this the default, which a user can then override?
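For reference, a sketch of how the precedence works (my summary; worth checking against the FAQ for your Open MPI version): parameters are read from the system-wide file, then the user's file, then OMPI_MCA_-prefixed environment variables, then the mpirun command line, with later sources overriding earlier ones:

```shell
# Site-wide default (lowest precedence): one "name = value" per
# line in $PREFIX/etc/openmpi-mca-params.conf, or per user in
# ~/.openmpi/mca-params.conf, e.g.:
#   btl_tcp_if_exclude = eth0

# A user overrides it with an OMPI_MCA_-prefixed environment variable:
export OMPI_MCA_btl_tcp_if_exclude=eth1

# ...or, highest precedence, on the command line:
#   mpirun --mca btl_tcp_if_exclude eth1 ./a.out
```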
as on each host?
On Jun 20, 2014, at 12:38 PM, Brock Palen <bro...@umich.edu> wrote:
> I was able to produce it in my test.
>
> orted affinity set by cpuset:
> [root@
:68072] MCW rank 28 is not bound (or bound to all
available processors)
[nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all
available processors)
[nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all
available processors)
Got it,
I have the input from the user and am testing it out.
It probably has less to do with Torque and more with cpusets.
I'm working on reproducing it myself also.
On Jun 20, 2014
In this case they are a single socket, but as you can see they could be
either/or depending on the job.
On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
&
odes.
That is good to know. I think we would want to set our default to 'bind to
core', except for our few users who use hybrid mode.
Our cpuset tells you what cores the job is assigned. So in the problem case
provided, the cpuset/cgroup shows only cores 8-11 are available to this job on
this
0x0f00
8,9,10,11
Which is exactly what I would expect.
So, umm, I'm lost as to why this might happen. What else should I check? Like I
said, not all jobs show this behavior.
OK, I have dug into this more. Is this PMI the Slurm process manager?
To use OpenMPI on the Phi, do I just build OpenMPI for it? Does that mean I
need to add -mmic to CFLAGS/FCFLAGS?
How does one go about doing multi-Phi MPI code?
mpiifort and mpiicc are Intel MPI Library commands; in OpenMPI and others the
analogues would be mpifort and mpicc.
On May 27, 2014, at 2:11 PM, Lorenzo Donà <lorechimic...@hotmail
On May 23, 2014, at 9:19 AM, Albert Solernou <albert.soler...@oerc.ox.ac.uk>
wrote:
> Hi,
> after compiling and installing OpenMPI 1.8.1, I find that OpenMPI is pinning
>
to the MPSS stack, and this Phi stuff is very immature at the moment, so there
is minimal (decent) documentation. Does anyone know what current package
provides PMI for the Xeon Phi?
Thanks!
e descriptors to shared receive queue 2 (0 from 105)
[nyx5641.engin.umich.edu:30080] 4868 more processes have sent help message
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 557 more processes have sent help message
help-mpi-btl-openib.txt / mem-reg-fail
On Feb 7, 2014, at 9:45 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
> Le 06/02/2014 21:31, Brock Palen a écrit :
>> Actually that did turn out to help. The nvml# devices appear to be numbered
>> in the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices
of the packages:
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-4_0_42.txt
Thanks
On Jan 31, 2014, at 9:06 AM, Mike Dubman <
e cuda and nvml devices in order.
I don't know if those values are deterministic, though. Could I ignore the
CoProc line and just use the:
GPU L#3 "nvml2"
GPU L#5 "nvml3"
GPU L#7 "nvml0"
GPU L#9 "nvml1&
@IBMAD_1.3'
libibmad is installed, but the symbol smp_mkey_set is not defined in it.
IBMAD_1.3 is, though.
Any thoughts on what may cause this? As far as I know our MOFED is from
Mellanox and should match up fine to their release of FCA. So this has me
scratching my head.
Thanks
BAH,
The error persisted when doing the test to /tmp/ (local disk).
I rebuilt the library with the same compiler and all is well now.
Sorry for the false alarm. Thanks for the help and ideas, Jeff.
I never saw any replies on this. Has anyone else been able to produce this
sort of error? It is 100% reproducible for me.
On Jan 9, 2014, at 11:46 AM, Brock Palen <bro...@umich.
--with-psm \
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
--with-mxm=$MXM \
--with-fca=$FCA \
--disable-dlopen --enable-shared \
$COMPILERS
That would do it.
Thanks!
Now to make even the normal ones work
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Apr 3, 2013, at 10:31 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Looking at the source code, it is because th
" (current value:
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallv" (current value:
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallw" (current value:
<0>, data source: default value)
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:
> or do i just need to compile two versions, one with IB and one without?
You should not need to; we have OMPI compiled for openib/psm and run that same
install on psm-, tcp-, and verbs(openib)-based gear.
All the nodes assigned to your job
problems within the fabric;
please contact your system administrator.
>> You sound like our vendors, "what is your app"
>
> ;-) I used to be one.
>
> Ideally OMPI should do the switch between MXM/RC/XRC internally in the
> transport layer. Unfortunately,
> we don't have such smart selection logic. Hopefully IB vendors will fix some
> day.
I actually looked at the FAQ page; it states that MXM was used in the past
only for >128 ranks, but in 1.6 is used for rank counts of any size.
I think we will do some testing; we had never even heard of MXM before.
On Jan 22, 2
w00t :-)
Thanks
On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote:
> Hmmm... I'll see what I can do about the error message. I don't think there
> is much in 1.6 I can do, but in 1.7 I could ge
was looking for a node that had a bad
socket or wrong part.
On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote:
> I'm afraid these are both known problems in the 1.6.2 release. I believe we
> fixed nper
processors, where N >
M). Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.
You job will now abort.
--
Thanks!
So it looks like most OpenMPI builds out there are running with ROMIOs that
are oblivious to any optimizations for what they are running on.
I have added this to our build notes so we get it in next time. Thanks!
ave Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients. So that
list should be good for our site.
Would this be a good recommendation for us to include in all our MPI builds?
On Dec 3, 2012, at 7:1
was built with when I built
it?
Can I make ROMIO go into 'verbose' mode and have it print what it is setting
all its values to?
Thanks!
of the code does use close to 12 cores as expected.
If I circumvent our batch system and the cgroups, a normal mpirun ./stream
does start 12 processes that each consume a full 100% core.
Thoughts? This is really odd Linux scheduler behavior.
Thanks and super cool.
On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote:
> On Aug 24, 2012, at 10:45 AM, Brock Palen wrote:
>
>>> Right now we should be just warning if we can't regis
On Aug 24, 2012, at 10:38 AM, Jeff Squyres wrote:
> On Aug 24, 2012, at 10:28 AM, Brock Palen wrote:
>
>> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with
>> 1.6.0 with low registered memory. From reading the release notes rather
>>
is for MPI to blow up saying "can't allocate
registered memory, fatal, contact your admin", rather than fall back to
send/receive and just be slower.
Am I reading the release notes correctly? Is there a tunable setting to blow
up rather than fall back?
Yep, very odd.
Looks like Torque wrote a wrapper for some hwloc functions, then.
BTW, for working with cgroups/cpusets in our resource manager, hwloc-info
--pid is _wonderful_.
I think I am good to go.
Google is giving me this URL:
www.open-mpi.org/projects/hwloc//doc/v1.5/a2.php
when I searched for hwloc_bitmap_displaylist() (for which I can find nothing,
nor a manpage :-) )
On Aug 10, 2012, at 4
http://www.open-mpi.org/projects/projects/hwloc/doc/
Oh noooss!!!
I think so, sorry if I gave you the impression that Rmpi changed,
On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote:
> Guess I'm confused - your original note indicated that something had chan
OK, will see. We had Rmpi working with 1.4, and it has not been updated since
2010, so this kinda stinks.
I will keep digging into it; thanks for the help.
On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote
Ralph,
Rmpi wraps everything up, so I tried setting them with
export OMPI_plm_base_verbose=5
export OMPI_dpm_base_verbose=5
and I get no extra messages, even on a simple hello-world MPI-1.0 example.
] to [[48116,1],0]:16, can't find route
[0]
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
[0x2ae2ad17d0df]
On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote:
> Could you repeat your tests with 1.4.5 and/or 1.5.5?
>
>
> On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote:
>
>> Hi,
Will do,
Right now I have asked the user to try rebuilding with the newest OpenMPI just
to be safe.
Interesting behavior: on rank 0 the IB counters (using collectl) never show a
packet in, only packets out.
tcp with this code?
Can we disable the psm MTL and use the verbs emulation on QLogic? While the
QLogic verbs isn't that great, it is still much faster in my tests than tcp.
Is there a particular reason to pick tcp?
hypre_ParCSRCommHandleDestroy() at ?:?
PMPI_Waitall() at ?:?
ompi_request_default_wait_all() at ?:?
opal_progress() at ?:?
Stack trace(s) for thread: 2
-
[0-63] (64 processes)
-----
start_thread() at ?:?
ips_ptl_pollint
a readable list of every rank's posted sends? And then query a waiting
MPI_Waitall() of a running job to get what sends/recvs it is waiting on?
Thanks!
This should be fixed: there was a bad upload and the server had a different
copy than my machine. The fixed version is in place. Feel free to grab it
again.
On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres
For those interested in MPI-IO and ROMIO, Jeff and I did an interview with
Rajeev and Rob:
http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
Question:
If we are using Torque with TM and with cpusets enabled for pinning, should we
not enable numactl? Would they conflict with each other?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
that enabling different openib_flags of 313 fixes the issue, albeit with a bit
lower bandwidth for some message sizes.
Has there been any progress on this issue?
On May 18, 2011, at 10:25 AM, Brock Palen wrote:
> Well I h
-recursive] Error 1
being able to reproduce it.
Any thoughts? Am I overlooking something?
On May 17, 2011, at 2:18 PM, Brock Palen wrote:
> Sorry typo 314 not 313,
>
> Brock Palen
> www.umich.edu/~bro
Sorry, typo: 314, not 313.
On May 17, 2011, at 2:02 PM, Brock Palen wrote:
> Thanks, I thought of looking at ompi_info after I sent that note, sigh.
>
> SEND_INPLACE appears to help pe
and code.
On May 16, 2011, at 11:49 AM, George Bosilca wrote:
> Here is the output of the "ompi_info --param btl openib":
>
> MCA btl: parameter "btl_openib_fla
RASH progress past their lockup points.
I will have a user test this.
Is this an OK option to put in our environment? What does 305 mean?
>
> Thanks,
>
> Samuel Gutierrez
> Los Alamos Nationa
t find a relevant-looking one.
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>
> Thanks. In case it's useful info: it hangs for me with 1.5.3 & np=32 on
> connectx with more than one collective, which I can't recall.
Extra data point, that ticket said it ran with mpi_preconne
I am pretty sure MTLs and BTLs are very different, but just as a note:
this user's code (Crash) hangs at MPI_Allreduce() on
openib
but runs on:
tcp
psm (an MTL, different hardware)
Putting it out there in case it has any bearing. Otherwise ignore.
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>
>> We can reproduce it with IMB. We could provide access, but we'd have to
>> negotiate with the owners of the relevant nodes to give you interactive
>> access to them. Maybe Brock's would be
ll our ib0 interfaces to have IPs on a 172. network. This allowed rdmacm to
work and get the latencies that we would expect. That said, we are still
getting hangs. I can very reliably reproduce it using IMB with a specific core
count on a specific test case.
Just an update. Has an
-1
librdmacm-devel-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-utils-1.0.11-1
So all the libraries are installed (I think). Is there a way to verify this?
Or to have OpenMPI be more verbose about what caused rdmacm to fail as an OOB
option?
ges
2 total processes killed (some possibly by mpirun during cleanup)
We were being bitten by a number of codes hanging in collectives, which was
resolved by using rdmacm. We tried setting this as the default until the two
bugs in Bugzilla are resolved, as a workaround. Then we hit this problem on our
: 1
CPCs attempted: rdmacm
--
Again I think this is expected on this older hardware.
On Apr 22, 2011, at 10:23 AM
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:
>
> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
>
>> Given that part of our cluster is TCP only, openib wouldn't even startup on
>> those hosts
>
> That is correct - it would have no impact on those hosts
>
Given that part of our cluster is TCP-only, openib wouldn't even start up on
those hosts, and this would be ignored on hosts with IB adapters?
Just checking, thanks!
On Apr 21, 2011, at 6:21 PM