[OMPI users] Some nodes have ucx over IB failures

2021-01-12 Thread Brock Palen via users
mca_pml_ucx_close [lh0057.arc-ts.umich.edu:04804] pml_ucx.c:178 mca_pml_ucx_close Brock Palen IG: brockpalen1984 www.umich.edu/~brockp Director Advanced Research Computing - TS bro...@umich.edu Office: (734)936-1985 (not in use during Covid) Cell: (989)277-6075

Re: [hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
Thanks, yeah I was looking for an API that would take into consideration most cases, like I find with hwloc-bind --get where I can find the number the process has access to. Whether it is cgroups, other sorts of affinity settings, etc. Brock Palen IG: brockpalen1984 www.umich.edu/~brockp Director

Re: [hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
do allow the user to specify number of threads, but would like to automate it for least astonishment. Brock Palen IG: brockpalen1984 www.umich.edu/~brockp Director Advanced Research Computing - TS bro...@umich.edu (734)936-1985 On Mon, Aug 31, 2020 at 11:34 AM Guy Streeter wrote: > My v

[hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
Hello, I have a small utility that currently uses multiprocessing.cpu_count(), which ignores cgroups etc. I see https://gitlab.com/guystreeter/python-hwloc but it appears stale. How would you detect the number of threads that are safe to start in a cgroup from Python3? Thanks! Brock
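One possible approach to the question above, offered as a minimal sketch rather than the answer given in the thread: on Linux, the standard library's os.sched_getaffinity(0) returns the set of CPUs the calling process is allowed to run on, which reflects cpuset cgroups and taskset-style affinity. The helper name usable_cpu_count is hypothetical, and this sketch does not account for CPU bandwidth quotas, which limit CPU time without changing the affinity mask.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process may actually run on.

    Hypothetical helper for illustration. On Linux the affinity mask
    reflects cpuset cgroups and explicit affinity settings; on platforms
    without sched_getaffinity we fall back to the total CPU count.
    """
    try:
        # PID 0 means "the calling process".
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Not available on this platform (e.g. macOS, Windows).
        return os.cpu_count() or 1


if __name__ == "__main__":
    print("threads safe to start:", usable_cpu_count())
```

A utility could use this value as its default thread count and still let the user override it explicitly.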

[OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread Brock Palen
there being a way to give OMPI a stack of work to do from the talk at SC this year, but I can't figure out if it does what I think it should do. Thanks, Brock Palen www.umich.edu/~brockp Director Advanced Research Computing - TS XSEDE Campus Champi

Re: [OMPI users] Building against XLC XLF

2016-04-03 Thread Brock Palen
. does it work with gcc compilers ?) > > Cheers, > > Gilles > > > On 4/4/2016 11:28 AM, Brock Palen wrote: > > I recently needed to build an OpenMPI build on Power 8 where I had access > to xlc / xlf > > The current release fails (as noted in the readme) > But a clone

[OMPI users] Building against XLC XLF

2016-04-03 Thread Brock Palen
! -- Brock Palen www.umich.edu/~brockp Assoc. Director Advanced Research Computing - TS XSEDE Campus Champion bro...@umich.edu (734)936-1985

Re: [OMPI users] configuring a code with MPI/OPENMPI

2015-02-03 Thread Brock Palen
I'll hit you off list with my Abinit OpenMPI build notes, Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Feb 3, 2015, at 2:26 PM, Nick Papior Andersen <nickpap...@gmail.com> wrote: > > I also concur with Je

Re: [OMPI users] best function to send data

2014-12-22 Thread Brock Palen
://citutor.org/login.php Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Dec 19, 2014, at 5:56 PM, Diego Avesani <diego.aves...@gmail.com> wrote: > > dear all users, > I am new in MPI world. > I wo

Re: [hwloc-users] Selecting real cores vs HT cores

2014-12-11 Thread Brock Palen
(32KB) + Core L#0 PU L#0 (P#0) PU L#1 (P#2) L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 PU L#2 (P#1) PU L#3 (P#3) HostBridge L#0 PCI 8086:7010 PCI 1013:00b8 PCI 8086:10ed Net L#0 "eth0" Brock Palen www.umich.edu/~brockp CAE

Re: [OMPI users] How to find MPI ranks located in remote nodes?

2014-11-25 Thread Brock Palen
Are you doing this just for debugging? Or you really want to do it within the MPI program? orte-ps Gives you the pid/host for each rank, but I don't think there is any standard way to do this via API. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
the 1gig interfaces but yet data is being sent out the 10gig eoib0/ib0 interfaces. I'll go do some measurements and see. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <

Re: [OMPI users] OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
Right, I understand those are TCP interfaces. I was just showing that I have two TCP interfaces over one physical interface, which is why I was asking how TCP interfaces were selected. It rarely if ever will matter to us. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion

[OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Brock Palen
PI figure out that it can also talk over the others? How does it choose to load balance? BTW that is fine, but we will use if_exclude on one of the IB ones as ib0 and eoib0 are the same physical device and may screw with load balancing if anyone ever falls back to TCP. Brock Palen www.umich.ed

Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Brock Palen
Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Oct 31, 2014, at 2:22 PM, Ralph Castain <rhc.open...@gmail.com> wrote: > > >> On Oct 30, 2014, at 3:15 PM, Brock Palen <bro...@umich.e

[OMPI users] IB Retry Limit Errors when fabric changes

2014-10-31 Thread Brock Palen
B fabric, this can happen. Multiple times now, just plugging in line cards to switches on a live system has caused large swaths of jobs to die with this error. Does anyone else have this problem? We are a Mellanox based fabric. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE

[OMPI users] orte-ps and orte-top behavior

2014-10-30 Thread Brock Palen
source don't they? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
Thanks this is good feedback. I was worried with the dynamic nature of Yarn containers that it would be hard to coordinate wire up, and you have confirmed that. Thanks Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Oct

[OMPI users] Java FAQ Page out of date

2014-10-27 Thread Brock Palen
I think a lot of the information on this page: http://www.open-mpi.org/faq/?category=java Is out of date with the 1.8 release. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

[OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html Which appears to imply extra setup required. Is this documented anywhere for OpenMPI? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Brock Palen
Doing special files on NFS can be weird, try the other /tmp/ locations: /var/tmp/ /dev/shm (ram disk careful!) Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gma

Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-24 Thread Brock Palen
the only 2 processes come from. I checked some of the other jobs and the cpusets and the pbs server cpu list are the same. More investigation required. Still strange why would it give that message at all? Why would OpenMPI care, and why only when -np ## is given. Brock Palen www.umich.ed

Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
Yes the request to torque was procs=64, We are using cpusets. the mpirun without -np 64 creates 64 spawned hostnames. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.

[OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
I do wrong? I'm stumped why one works and one doesn't, but the one that doesn't, if you force it, appears correct. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 signature.asc Description: Message signed with OpenPGP using GPGMail

Re: [OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
Original Message- >> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen >> Sent: Friday, September 05, 2014 5:22 PM >> To: Open MPI Users >> Subject: [OMPI users] enable-cuda with disable-dlopen >> >> * PGP Signed by an unknown key >&

[OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
=$CUDA \ --with-hwloc=internal \ --with-verbs \ --with-psm \ --with-tm=/usr/local/torque \ --with-fca=$FCA \ --with-mxm=$MXM \ --with-knem=$KNEM \ --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \ $COMPILERS Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-28 Thread Brock Palen
Interesting, we are using 3.0 that is in MOFED, and that is also what is on the MXM download site. Kinda confusing. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Aug 28, 2014, at 2:12 AM, Mike Dubman <

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Brice, et al. Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with knem and mxm 3.0, If we have questions we will let you know. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Aug 27, 2014, at 12:44 PM

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
trying to understand why that is the default. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Aug 27, 2014, at 10:15 AM, Alina Sklarevich <ali...@dev.mellanox.co.il> wrote: > Hi, > > KNEM can improve

[OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
it a little. Should we investigate adding it to our systems? Is there a way to suppress this warning? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 signature.asc Description: Message signed with OpenPGP using GPGMail

[OMPI users] HugeTLB messages from mpi code

2014-07-01 Thread Brock Palen
performance, but I'm not sure what to do about it? There is nothing on the list, but there was one reference to another MPI library. Is there any idea what would cause this? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

[OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Brock Palen
provide). One thought is to have the data collector processes be threads inside the MPI job running across all nodes, but I was curious if there is a way to pass data still in memory (too much to hit disk) to the running MPI filter job. Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-25 Thread Brock Palen
;). If set to a non-default value, it is mutually exclusive with btl_tcp_if_include. [brockp@flux-login1 34241]$ ompi_info --param all all --level 9 (gives me what I expect). Thanks, Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campu

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-23 Thread Brock Palen
"eth0,192.168.0.0/16"). If set to a non-default value, it is mutually exclusive with btl_tcp_if_include. This is normally much longer. And yes we don't have the PHI stuff installed on all nodes, strange that 'all all' is now very short, omp

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Perfection! That appears to do it for our standard case. Now I know how to set MCA options by env var or config file. How can I make this the default, that then a user can override? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
as on each host? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 20, 2014, at 12:38 PM, Brock Palen <bro...@umich.edu> wrote: > I was able to produce it in my test. > > orted affinity set by cpuset: > [root@

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
:68072] MCW rank 28 is not bound (or bound to all available processors) [nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all available processors) [nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all available processors) Brock Palen www.umich.edu

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Got it, I have the input from the user and am testing it out. It probably has less to do with torque and more with cpusets. I'm working on reproducing it myself also. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 20, 2014

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
In this case they are a single socket, but as you can see they could be either/or depending on the job. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote: &

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Brock Palen
odes. That is good to know, I think we would want to turn our default to 'bind to core' except for our few users who use hybrid mode. Our CPU set tells you what cores the job is assigned. So in the problem case provided, the cpuset/cgroup shows only cores 8-11 are available to this job on this

[OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-18 Thread Brock Palen
0x0f00 8,9,10,11 Which is exactly what I would expect. So ummm, i'm lost why this might happen? What else should I check? Like I said not all jobs show this behavior. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 signature.asc
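The hex mask and the core list quoted in the message above are two views of the same binding: each set bit in the mask corresponds to one allowed core. As a small illustrative sketch (the helper name mask_to_cores is hypothetical and not part of Open MPI, torque, or hwloc), the conversion can be checked like this:

```python
def mask_to_cores(mask):
    """Return the sorted list of core indices whose bit is set in mask.

    Hypothetical helper for illustration only.
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # 0x0f00 has bits 8 through 11 set.
    print(mask_to_cores(0x0F00))  # [8, 9, 10, 11]
```

This matches the cpuset in the report, where only cores 8-11 are available to the job.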

Re: [OMPI users] Enable PMI build

2014-05-29 Thread Brock Palen
Ok I have dug into this more. Is this PMI the Slurm process manager? To use OpenMPI on Phi, do I just build OpenMPI for it? Does that mean I need to add CFLAGS FCFLAGS -mmic? How does one go about doing multi-phi MPI code? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus

Re: [OMPI users] mpiifort mpiicc not found

2014-05-27 Thread Brock Palen
mpiifort and mpiicc are intel MPI library commands, in openmpi and others the analogous would be mpifort and mpicc Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On May 27, 2014, at 2:11 PM, Lorenzo Donà <lorechimic...@hotmail

Re: [OMPI users] pinning processes by default

2014-05-23 Thread Brock Palen
/ Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On May 23, 2014, at 9:19 AM, Albert Solernou <albert.soler...@oerc.ox.ac.uk> wrote: > Hi, > after compiling and installing OpenMPI 1.8.1, I find that OpenMPI is pinning >

[OMPI users] Enable PMI build

2014-05-16 Thread Brock Palen
to the MPSS stack and this Phi stuff is very infantile at the moment so minimal (decent) documentation, does anyone know what current package provides PMI for the Xeon Phi? Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

[OMPI users] OpenIB Cannot Allocate Memory error

2014-02-27 Thread Brock Palen
e descriptors to shared receive queue 2 (0 from 105) [nyx5641.engin.umich.edu:30080] 4868 more processes have sent help message help-mpi-btl-openib.txt / mem-reg-fail [nyx5641.engin.umich.edu:30080] 557 more processes have sent help message help-mpi-btl-openib.txt / mem-reg-fail Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 signature.asc Description: Message signed with OpenPGP using GPGMail

Re: [hwloc-users] Using hwloc to map GPU layout on system

2014-02-14 Thread Brock Palen
On Feb 7, 2014, at 9:45 AM, Brice Goglin <brice.gog...@inria.fr> wrote: > Le 06/02/2014 21:31, Brock Palen a écrit : >> Actually that did turn out to help. The nvml# devices appear to be numbered >> in the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices

Re: [OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-02-14 Thread Brock Palen
of the packages: http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-4_0_42.txt Thanks Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jan 31, 2014, at 9:06 AM, Mike Dubman <

Re: [hwloc-users] Using hwloc to map GPU layout on system

2014-02-06 Thread Brock Palen
e cuda and nvml devices in order. I don't know if some values are deterministic though. Could I ignore the CoProc line and just use the: GPU L#3 "nvml2" GPU L#5 "nvml3" GPU L#7 "nvml0" GPU L#9 "nvml1&

[OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-01-30 Thread Brock Palen
@IBMAD_1.3' libibmad is installed, but the symbol smp_mkey_set is not defined in it. IBMAD_1.3 is though. Any thoughts on what may cause this? As far as I know our MOFED is from Mellanox and should match up fine to their release of FCA. So this has me scratching my head. Thanks Brock Palen

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
BAH, The error persisted when doing the test to /tmp/ (local disk) I rebuilt the library with the same compiler and all is well now. Sorry for the false alarm. Thanks for the help and ideas Jeff. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
I never saw any replies on this. Has anyone else been able to produce this sort of error? It is 100% reproducible for me. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jan 9, 2014, at 11:46 AM, Brock Palen <bro...@umich.

[OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-09 Thread Brock Palen
--with-psm \ --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \ --with-mxm=$MXM \ --with-fca=$FCA \ --disable-dlopen --enable-shared \ $COMPILERS Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985

Re: [OMPI users] FCA collectives disabled by default

2013-04-03 Thread Brock Palen
That would do it. Thanks! Now to make even the normal ones work Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Apr 3, 2013, at 10:31 AM, Ralph Castain <r...@open-mpi.org> wrote: > Looking at the source code, it is because th

[OMPI users] FCA collectives disabled by default

2013-04-02 Thread Brock Palen
" (current value: <0>, data source: default value) MCA coll: parameter "coll_fca_enable_alltoallv" (current value: <0>, data source: default value) MCA coll: parameter "coll_fca_enable_alltoallw" (current value: <0>, data source: default value) Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified

2013-01-24 Thread Brock Palen
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote: > or do i just need to compile two versions, one with IB and one without? You should not need to, we have OMPI compiled for openib/psm and run that same install on psm/tcp and verbs(openib) based gear. All the nodes assigned to your job

[OMPI users] IBV_EVENT_QP_ACCESS_ERR

2013-01-23 Thread Brock Palen
problems within the fabric; please contact your system administrator. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
>> You sound like our vendors, "what is your app" > > ;-) I used to be one. > > Ideally OMPI should do the switch between MXM/RC/XRC internally in the > transport layer. Unfortunately, > we don't have such smart selection logic. Hopefully IB vendors will fix some > day. I actually looked

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
at the FAQ page it states that MXM was used in the past only for >128 ranks, but is in 1.6 used for rank counts of any size. I think we will do some testing, we never even heard of MXM before, Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Jan 22, 2

Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
w00t :-) Thanks Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote: > HmmmI'll see what I can do about the error message. I don't think there > is much in 1.6 I can do, but in 1.7 I could ge

Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
was looking for a node that had a bad socket or wrong part. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote: > I'm afraid these are both known problems in the 1.6.2 release. I believe we > fixed nper

[OMPI users] 1.6.2 affinity failures

2012-12-19 Thread Brock Palen
processors, where N > M). Double check that you have enough unique processors for all the MPI processes that you are launching on this host. You job will now abort. -- Brock Palen www.umich.edu/~brockp CAEN Advanced Comput

Re: [OMPI users] Romio and OpenMPI builds

2012-12-07 Thread Brock Palen
Thanks! So it looks like most OpenMPI builds out there are running with ROMIOs that are oblivious to any optimizations for what they are running on. I have added this to our build notes so we get it in next time. Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro

Re: [OMPI users] Romio and OpenMPI builds

2012-12-06 Thread Brock Palen
ave Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients. So that list should be good for our site. Would this be a good recommendation for us to include in all our MPI builds? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Dec 3, 2012, at 7:1

[OMPI users] Romio and OpenMPI builds

2012-12-03 Thread Brock Palen
was built with when I built it? Can I make ROMIO go into 'verbose' mode and have it print what it is setting all its values to? Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

[OMPI users] Java MPI Bindings in 1.6.x

2012-11-28 Thread Brock Palen
? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

[hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brock Palen
of the code does use close to 12 cores as expected. If I circumvent our batch system and the cgroups, a normal mpirun ./stream does start 12 processes that consume a full 100% core. Thoughts? This is really odd linux scheduler behavior. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing

Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-26 Thread Brock Palen
Thanks and super cool. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote: > On Aug 24, 2012, at 10:45 AM, Brock Palen wrote: > >>> Right now we should be just warning if we can't regis

Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
On Aug 24, 2012, at 10:38 AM, Jeff Squyres wrote: > On Aug 24, 2012, at 10:28 AM, Brock Palen wrote: > >> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with >> 1.6.0 with low registered memory. From reading the release notes rather >>

[OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
is for MPI to blow up saying "can't allocate registered memory, fatal, contact your admin", rather than fall back to send/receive and just be slower. Am I reading the release notes correctly? Is there a tunable setting to blow up rather than fallback? Brock Palen www.umich.edu/~brockp CAE

Re: [hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
Yep very odd, Looks like torque wrote a wrapper then for some hwloc functions. BTW working with cgroups/cpusets in our resource manager hwloc-info --pid is _wonderful_ I think I am good to go. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
Google is giving me this url: www.open-mpi.org/projects/hwloc//doc/v1.5/a2.php When i searched for hwloc_bitmap_displaylist() (for which I can find nothing nor a manpage :-) ) Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Aug 10, 2012, at 4

[hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
http://www.open-mpi.org/projects/projects/hwloc/doc/ Oh noooss!!! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
I think so, sorry if I gave you the impression that Rmpi changed, Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote: > Guess I'm confused - your original note indicated that something had chan

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
Ok will see, Rmpi we had working with 1.4 and it has not been updated after 2010, so this kinda stinks. I will keep digging into it, thanks for the help. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
Ralph, Rmpi wraps everything up, so I tried setting them with export OMPI_plm_base_verbose=5 export OMPI_dpm_base_verbose=5 and I get no extra messages even on helloworld example simple MPI-1.0 code. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

[OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
] to [[48116,1],0]:16, can't find route [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2ae2ad17d0df] Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Brock Palen
. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote: > Could you repeat your tests with 1.4.5 and/or 1.5.5? > > > On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote: > >> Hi,

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
Will do, Right now I have asked the user to try rebuilding with the newest openmpi just to be safe. Interesting behavior rank0 the ib counters (using collctl) never gets a packet in, only packets out. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
tcp with this code? Can we disable the psm mtl and use the verbs emulation on qlogic? While the qlogic verbs isn't that great it is still much faster in my tests than tcp. Is there a particular reason to pick tcp? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
hypre_ParCSRCommHandleDestroy() at ?:? PMPI_Waitall() at ?:? ompi_request_default_wait_all() at ?:? opal_progress() at ?:? Stack trace(s) for thread: 2 - [0-63] (64 processes) ----- start_thread() at ?:? ips_ptl_pollint

[OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
a readable list of every rank's posted sends? And then query a waiting MPI_Waitall() of a running job to get what sends/recvs it is waiting on? Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
This should be fixed, there was a bad upload, the server had a different copy than my machine. The fixed version is in place. Feel free to grab it again. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres

[OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
For those interested in MPI-IO, and ROMIO Jeff and I did an interview Rajeev and Rob: http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985

[OMPI users] numactl with torque cpusets

2011-11-09 Thread Brock Palen
Question, If we are using torque with TM with cpusets enabled for pinning should we not enable numactl? Would they conflict with each other? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-07-27 Thread Brock Palen
that enabling different openib_flags of 313 fixes the issue, albeit with lower bw for some message sizes. Has there been any progress on this issue? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On May 18, 2011, at 10:25 AM, Brock Palen wrote: > Well I h

[OMPI users] openmpi-1.4.3 and pgi-11.6 segfault

2011-06-21 Thread Brock Palen
-recursive] Error 1 Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-18 Thread Brock Palen
being able to reproduce it. Any thoughts? Am I overlooking something? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On May 17, 2011, at 2:18 PM, Brock Palen wrote: > Sorry typo 314 not 313, > > Brock Palen > www.umich.edu/~bro

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Sorry typo 314 not 313, Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On May 17, 2011, at 2:02 PM, Brock Palen wrote: > Thanks, I thought of looking at ompi_info after I sent that note sigh. > > SEND_INPLACE appears to help pe

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
and code. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On May 16, 2011, at 11:49 AM, George Bosilca wrote: > Here is the output of the "ompi_info --param btl openib": > > MCA btl: parameter "btl_openib_fla

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-16 Thread Brock Palen
RASH progress pass their lockup points, I will have a user test this, Is this an ok option to put in our environment? What does 305 mean? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 > > Thanks, > > Samuel Gutierrez > Los Alamos Nationa

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-13 Thread Brock Palen
t find a relevant-looking one. >> >> https://svn.open-mpi.org/trac/ompi/ticket/2714 > > Thanks. In case it's useful info, it hangs for me with 1.5.3 & np=32 on > connectx with more than one collective I can't recall. Extra data point, that ticket said it ran with mpi_preconne

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
I am pretty sure MTL's and BTL's are very different, but just as a note, This users code (Crash) hangs at MPI_Allreduce() in Openib But runs on: tcp psm (an mtl, different hardware) Putting it out there if it does have any bearing. Otherwise ignore. Brock Palen www.umich.edu/~brockp Center

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote: > On May 11, 2011, at 3:21 PM, Dave Love wrote: > >> We can reproduce it with IMB. We could provide access, but we'd have to >> negotiate with the owners of the relevant nodes to give you interactive >> access to them. Maybe Brock's would be

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Brock Palen
ll our ib0 interfaces to have IP's on a 172. network. This allowed the use of rdmacm to work and get latencies that we would expect. That said we are still getting hangs. I can very reliably reproduce it using IMB with a specific core count on a specific test case. Just an update. Has an

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-05 Thread Brock Palen
-1 librdmacm-devel-1.0.11-1 librdmacm-devel-1.0.11-1 librdmacm-utils-1.0.11-1 So all the libraries are installed (I think) is there a way to verify this? Or to have OpenMPI be more verbose what caused rdmacm to fail as an oob option? Brock Palen www.umich.edu/~brockp Center for Advanced

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-28 Thread Brock Palen
ges 2 total processes killed (some possibly by mpirun during cleanup) We were being bit by a number of codes hanging in collectives, and was resolved by using rdmacm. We tried setting this as default till the two bugs in bugzilla are resolved as a work around. Then we hit this problem on our

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-27 Thread Brock Palen
: 1 CPCs attempted: rdmacm -- Again I think this is expected on this older hardware. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Apr 22, 2011, at 10:23 AM

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-22 Thread Brock Palen
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote: > > On Apr 21, 2011, at 4:41 PM, Brock Palen wrote: > >> Given that part of our cluster is TCP only, openib wouldn't even startup on >> those hosts > > That is correct - it would have no impact on those hosts >

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Brock Palen
Given that part of our cluster is TCP only, openib wouldn't even startup on those hosts and this would be ignored on hosts with IB adaptors? Just checking thanks! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Apr 21, 2011, at 6:21 PM
