Re: [OMPI devel] Drastic change in ORTE behavior between trunk and 1.5

2011-12-14 Thread Ralph Castain
On Dec 14, 2011, at 6:51 PM, George Bosilca wrote: > To be honest, I'm totally lost in the naming scheme, which confused me about the RFC you're referring to. We had an MCA parameter to start a VM, so I thought VM was some kind of special virtualized environment and not the entire ORTE

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread Ralph Castain
On Dec 14, 2011, at 6:44 PM, George Bosilca wrote: > A comment in the commit suggests that the symbols were not linked into orterun if they were not accessed there. I guess this was the trick to make sure MPIR_Breakpoint is in there. > > Now that you've pointed me to this commit, I have to

Re: [OMPI devel] Drastic change in ORTE behavior between trunk and 1.5

2011-12-14 Thread George Bosilca
To be honest, I'm totally lost in the naming scheme, which confused me about the RFC you're referring to. We had an MCA parameter to start a VM, so I thought VM was some kind of special virtualized environment and not the entire ORTE. Based on the behavior of the trunk and the RFC you referred

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread George Bosilca
A comment in the commit suggests that the symbols were not linked into orterun if they were not accessed there. I guess this was the trick to make sure MPIR_Breakpoint is in there. Now that you've pointed me to this commit, I have to disagree with it. Why the MPI debugging symbols have been delete

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread Ralph Castain
Yes - we were having problems making symbols in orterun visible for the "stat" debugger when built dynamically. The symbols are actually instantiated in the debugger base, but they need to be "seen" in orterun prior to us calling orte_init. So, we had to explicitly reference them. It was workin
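The "explicitly reference them" trick Ralph describes can be sketched in C. This is a minimal illustration under stated assumptions, not the actual orterun.c code: `debugger_hook` is a hypothetical stand-in for MPIR_Breakpoint (which, per the thread, is really instantiated in the ORTE debugger base), and `hook_ref` is a hypothetical name playing the role of `foo`. Storing the function's address in a global gives this translation unit an explicit reference to the symbol, which is one common way to make sure the linker pulls in the object that defines it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a debugger hook such as MPIR_Breakpoint.
   In a real build the definition would live in a separate library;
   it is defined here only so the sketch compiles on its own. */
void *debugger_hook(void)
{
    return NULL;
}

/* Taking the hook's address in a global initializer creates an explicit
   reference to the symbol from this translation unit. The volatile
   qualifier keeps the compiler from folding reads of the pointer away. */
void *(*volatile hook_ref)(void) = debugger_hook;

/* Hypothetical helper: report whether the hook is still referenced. */
int hook_is_referenced(void)
{
    return hook_ref != NULL;
}
```

The sketch only shows why such a reference exists; it does not model the side effect Nathan reports, where this extra reference appears to interfere with TotalView breaking on MPIR_Breakpoint.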

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread Jeff Squyres
Looks like that line came over in https://svn.open-mpi.org/trac/ompi/changeset/24561, which was bringing over the debugger ORTE framework from the trunk (https://svn.open-mpi.org/trac/ompi/ticket/2688). Ralph -- do you remember why that line is there? On Dec 14, 2011, at 7:21 PM, Nathan Hjelm

Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-14 Thread Christopher Samuel
On 15/12/11 08:33, Ralph Castain wrote: > That param was intended to catch user-level mistakes whereby the user specified a tmpdir location via the tmpdir_base MCA param that the system admin wanted to protect. It was not intended for someone to

[OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread Nathan Hjelm
There still seems to be an issue with using mpirun --debug with TotalView. For some reason, TotalView is not breaking on MPIR_Breakpoint. Removing the foo = MPIR_Breakpoint line from orterun.c fixes this issue. Is there any reason I shouldn't remove that line? Any other debuggers that might bre

Re: [OMPI devel] Invalid free (btl_openib_endpoint.c, 448) in v1.5

2011-12-14 Thread Christopher Yeoh
On Tue, 13 Dec 2011 20:27:00 -0500, Jeff Squyres wrote: > On Dec 13, 2011, at 7:59 PM, Christopher Yeoh wrote: >> Sorry, late to the discussion. This is a spurious warning caused by passing the NULL pointer to the opal free function, which is actually OK. It was fixed by #2884 - this is
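Christopher's point rests on a standard C guarantee: calling free() with a null pointer does nothing. A minimal sketch, assuming a hypothetical wrapper `release_buffer` (it is not the opal free function discussed in the thread), shows why passing NULL through such a wrapper is safe, so a warning about it can be spurious:

```c
#include <stdlib.h>

/* Hypothetical wrapper, for illustration only; it is NOT the opal free
   function mentioned in the thread. */
void release_buffer(void **buf)
{
    /* ISO C defines free(NULL) as a no-op, so calling this with
       *buf == NULL is safe without an explicit check. */
    free(*buf);
    /* Reset the caller's pointer so releasing it again only frees NULL. */
    *buf = NULL;
}
```

Usage: after `release_buffer(&p)`, `p` is NULL, and a second `release_buffer(&p)` simply calls `free(NULL)`.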

Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-14 Thread Ralph Castain
Well, I actually have to eat my words here. This code is alive and well. However, I don't think it does what you wanted or perhaps expected. That param was intended to catch user-level mistakes whereby the user specified a tmpdir location via the tmpdir_base MCA param that the system admin wante

Re: [OMPI devel] OMPI 1.4.5rc1 tested: gm

2011-12-14 Thread Jeff Squyres
Thanks for all the testing, Paul! On Dec 14, 2011, at 12:37 AM, Paul H. Hargrove wrote: > On one of the same systems ("System 2") that I used to check compilation against Quadrics Elan, I have multiple versions of the Myrinet GM headers/libs. > > System 2: Linux/x86 >> $ cat /etc/redhat-release >> Red

Re: [OMPI devel] Retrying a MPI_SEND

2011-12-14 Thread Hugo Daniel Meyer
Hello George and all. Sorry for the late answer, but I was doing some tracing to see where MPI_ERROR is set. I took a look at ompi_request_default_wait and tried to see what happens with the request. Well, I've noticed that all requests that are not immediately completed go to ompi_request_wait_completio

Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-14 Thread Ralph Castain
This is amusing - reviewing the code quickly, it appears that the supporting code for orte_no_session_dir was mistakenly removed at some point. I'll restore that functionality. Thanks for pointing it out! On Dec 12, 2011, at 11:10 PM, Christopher Samuel wrote:

Re: [OMPI devel] Drastic change in ORTE behavior between trunk and 1.5

2011-12-14 Thread Ralph Castain
On Dec 13, 2011, at 9:10 PM, George Bosilca wrote: > I noticed today a drastic change in how ORTE deals with the hostfile between trunk and 1.5. > > 1. 1.5 and prior used the hostfile as a suggestion, a placeholder where to pick the requested number of daemons during the launch. The current t

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread Jeff Squyres
I took the liberty of GK ratcheting that CMR through, in the interest of expediency... On Dec 14, 2011, at 8:15 AM, Shiqing Fan wrote: > I see the real problem now: the .windows file is not added to the tarball. > > On 2011-12-14 1:48 PM, George Bosilca wrote: >> Shiqing, >> >> This file see

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread Shiqing Fan
I see the real problem now: the .windows file is not added to the tarball. On 2011-12-14 1:48 PM, George Bosilca wrote: Shiqing, This file seems to be there. $ pwd /home/bosilca/unstable/1.5/ompi $ svn info opal/mca/shmem/windows/.windows Path: opal/mca/shmem/windows/.windows Name: .windows

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread Shiqing Fan
Hi George, Right, I was testing RC1 which has this problem. But now it shouldn't matter. Thanks, Shiqing On 2011-12-14 1:48 PM, George Bosilca wrote: Shiqing, This file seems to be there. $ pwd /home/bosilca/unstable/1.5/ompi $ svn info opal/mca/shmem/windows/.windows Path: opal/mca/shme

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread George Bosilca
Shiqing, This file seems to be there. $ pwd /home/bosilca/unstable/1.5/ompi $ svn info opal/mca/shmem/windows/.windows Path: opal/mca/shmem/windows/.windows Name: .windows URL: https://svn.open-mpi.org/svn/ompi/branches/v1.5/opal/mca/shmem/windows/.windows Repository Root: https://svn.open-mp

Re: [OMPI devel] 1.5.5rc1 tested: VT check failures on *BSD (with patch).

2011-12-14 Thread Matthias Jurenz
Thanks for the hint, Paul. This build issue is fixed by CMR #2938. Matthias On Wednesday 14 December 2011 07:44:48 Paul H. Hargrove wrote: > OK, Jeff probably wants to choke me for all these emails, but here comes another... > > I am now configuring my 5 BSD systems with "--without-hwloc --d

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread Shiqing Fan
Hi George, A .windows file still seems to be missing in opal/mca/shmem/windows/. Could you also svn add it (from the patch in the shmem ticket)? It is not a source file, but rather a configuration file required by CMake. Probably this change doesn't need another rc. :-) Thanks a lot. Regards, Shiqing

Re: [OMPI devel] 1.5.5rc1 is out

2011-12-14 Thread Brice Goglin
And a hwloc problem with the very old sched_setaffinity on Red Hat 8; we're looking at it. Brice On 14/12/2011 11:14, Paul H. Hargrove wrote: > Summary of my 1.5.5rc1 testing findings: > > + generated config.h in tarball breaks hwloc on non-linux platforms: > http://www.open-mpi.org/community/list

Re: [OMPI devel] 1.5.5rc1 is out

2011-12-14 Thread Paul H. Hargrove
Summary of my 1.5.5rc1 testing findings: + generated config.h in tarball breaks hwloc on non-linux platforms: http://www.open-mpi.org/community/lists/devel/2011/12/10106.php + multiply defined symbols problem on MacOS 10.4 (PPC only): http://www.open-mpi.org/community/lists/devel/2011/12/10103.p

[OMPI devel] 1.5.5rc1: all my hwloc-related failures

2011-12-14 Thread Paul H. Hargrove
I have been working w/ Brice off-list and we have found the root cause of ALL those problems I've reported with Linux-specific hwloc symbols on non-Linux systems. Somehow the 1.5.5rc1 tarball contains a GENERATED file from a Linux system! $ find openmpi-1.5.5rc1 -name autogen | xargs ls openm

[OMPI devel] 1.5.5rc1 tested: Solaris 11 hwloc link failure

2011-12-14 Thread Paul H. Hargrove
Grumble. This is getting old. Add Solaris 11 on x86-64 to the list of platforms where OMPI is incorrectly trying to link Linux-specific hwloc symbols, even though I can build a stand-alone hwloc w/o problems: $ uname -a SunOS pcp-j-20 5.11 snv_151a i86pc i386 i86pc Solaris $ gcc --version | h

Re: [OMPI devel] 1.5.5rc1 tested: MacOS 10.4 x86-64 hwloc build failure

2011-12-14 Thread Paul H. Hargrove
On 12/13/2011 11:50 PM, Brice Goglin wrote: On 14/12/2011 08:29, Paul H. Hargrove wrote: I've attempted the build on MacOS 10.4 (Tiger) on x86-64 and hit the same hwloc issue I've encountered on {Free,Open,Net}BSD. The build fails with CCLD opal_wrapper /usr/bin/ld: Undefined symbols:

Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")

2011-12-14 Thread Paul H. Hargrove
I've attempted to reproduce, on an x86-64 system, the failure reported below for MacOS 10.4 on PPC. First, I've realized that while I reported "make check" as the source of the problem, it occurs at "make". Regardless of that mistake in my reporting, I was unable to reproduce the problem, makin

Re: [OMPI devel] 1.5.5rc1 tested: MacOS 10.4 x86-64 hwloc build failure

2011-12-14 Thread Brice Goglin
On 14/12/2011 08:29, Paul H. Hargrove wrote: > I've attempted the build on MacOS 10.4 (Tiger) on x86-64 and hit the same hwloc issue I've encountered on {Free,Open,Net}BSD. > The build fails with >> CCLD opal_wrapper >> /usr/bin/ld: Undefined symbols: >> _opal_hwloc122_hwloc_backend_sysfs_e

Re: [OMPI devel] 1.5.5rc1 tested: REGRESSION on FreeBSD

2011-12-14 Thread Brice Goglin
On 14/12/2011 08:01, Paul H. Hargrove wrote: > I cannot even *build* OpenMPI on {Free,Open,Net}BSD systems unless I configure with --without-hwloc. > Thus I cannot agree w/ Brice's suggestion that I ignore this warning. Please try building hwloc (1.2.2 if you want the same one as OMPI current

[OMPI devel] 1.5.5rc1 tested: MacOS 10.4 x86-64 hwloc build failure

2011-12-14 Thread Paul H. Hargrove
I've attempted the build on MacOS 10.4 (Tiger) on x86-64 and hit the same hwloc issue I've encountered on {Free,Open,Net}BSD. The build fails with CCLD opal_wrapper /usr/bin/ld: Undefined symbols: _opal_hwloc122_hwloc_backend_sysfs_exit _opal_hwloc122_hwloc_backend_sysfs_init _opal_hwloc122_h

[OMPI devel] 1.5.5rc1 tested: elan

2011-12-14 Thread Paul H. Hargrove
I can "make all install clean" ompi-1.5.5rc1 on the following 2 systems which have Quadrics Elan headers/libs. System 1: Linux/x86-64 $ cat /etc/redhat-release CentOS release 4.2 (Final) $ uname -a Linux [hostname] 2.6.9-22.EL #1 Thu Feb 23 16:23:18 EST 2006 x86_64 x86_64 x86_64 GNU/Linux $ g

Re: [OMPI devel] 1.5.5rc1 tested: REGRESSION on FreeBSD

2011-12-14 Thread Paul H. Hargrove
On 12/13/2011 10:53 PM, Brice Goglin wrote: On 14/12/2011 07:17, Paul H. Hargrove wrote: My OpenBSD and NetBSD testers have the same behavior, but now I see that I was warned... On all the affected systems I found the following (modulo the system tuple) in the configure output: checkin

Re: [OMPI devel] 1.5.5rc1 tested: hwloc build failure on Red Hat Linux 8

2011-12-14 Thread Brice Goglin
On 14/12/2011 07:12, Paul H. Hargrove wrote: > I cannot build hwloc in 1.5.5rc1 on the following system: > > System 2: Linux/x86 >> $ cat /etc/redhat-release >> Red Hat Linux release 8.0 (Psyche) >> $ uname -a >> Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10 EDT 2009 >> i686 i6

Re: [OMPI devel] 1.5.5rc1 tested: REGRESSION on FreeBSD

2011-12-14 Thread Brice Goglin
On 14/12/2011 07:17, Paul H. Hargrove wrote: > My OpenBSD and NetBSD testers have the same behavior, but now I see that I was warned... > > On all the affected systems I found the following (modulo the system tuple) in the configure output: >> checking which OS support to include... Unsup

[OMPI devel] 1.5.5rc1 tested: VT check failures on *BSD (with patch).

2011-12-14 Thread Paul H. Hargrove
OK, Jeff probably wants to choke me for all these emails, but here comes another... I am now configuring my 5 BSD systems with "--without-hwloc --disable-io-romio". The systems (all using /usr/bin/gcc) are: FreeBSD-8.2-RELEASE on amd64: gcc (GCC) 4.2.1 20070719 [FreeBSD] FreeBSD-7.2-RELEA

Re: [OMPI devel] 1.5.5rc1 tested: REGRESSION on FreeBSD

2011-12-14 Thread Paul H. Hargrove
My OpenBSD and NetBSD testers have the same behavior, but now I see that I was warned... On all the affected systems I found the following (modulo the system tuple) in the configure output: checking which OS support to include... Unsupported! (x86_64-unknown-openbsd5.0) configure: WARNING:

[OMPI devel] 1.5.5rc1 tested: hwloc build failure on Red Hat Linux 8

2011-12-14 Thread Paul H. Hargrove
I cannot build hwloc in 1.5.5rc1 on the following system: System 2: Linux/x86 $ cat /etc/redhat-release Red Hat Linux release 8.0 (Psyche) $ uname -a Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10 EDT 2009 i686 i686 i386 GNU/Linux $ gcc --version | head -1 gcc (GCC) 3.4.0

[OMPI devel] OMPI 1.4.5rc1 tested: gm

2011-12-14 Thread Paul H. Hargrove
On one of the same systems ("System 2") that I used to check compilation against Quadrics Elan, I have multiple versions of the Myrinet GM headers/libs. System 2: Linux/x86 $ cat /etc/redhat-release Red Hat Linux release 8.0 (Psyche) $ uname -a Linux [hostname] 2.4.21-60.ELsmp #1 SMP Fri Aug 28 06:45:10

[OMPI devel] 1.5.5rc1 tested: REGRESSION on FreeBSD

2011-12-14 Thread Paul H. Hargrove
I am seeing build failures on the following: FreeBSD-8.2-RELEASE on amd64 FreeBSD-7.2-RELEASE on amd64 FreeBSD-6.3-RELEASE on amd64 All three fail with the same error: CCLD opal_wrapper ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_hwloc122_hwloc_backend_sysfs

[OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")

2011-12-14 Thread Paul H. Hargrove
Using the 1.5.5rc1 tarball, I've repeated tests on the following platforms for which I recently reported 1.4.5rc1 results: MacOS 10.5 (Leopard) on PPC: powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5488) MacOS 10.4 (Tiger) on PPC: powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1