Hi,
first of all, thank you very much for your help.
1st patch:
> Can you apply the following patch to a trunk tarball and see if it works
> for you?
2nd patch:
> Found the problem. Was accessing a boolean variable using intval. That
> is a bug that has gone unnoticed on all platforms but thankfull
I notice Absoft's MTT runs are failing due to the change in
bind-to-core-by-default:
http://mtt.open-mpi.org/index.php?do_redir=2136
I asked Tony, who runs the Absoft MTT runs; he confirms that this particular
machine has 1 socket with 2 cores (and we're running -np 4 on this machine).
1. T
On 19 Dec 2013, at 13:59, Jeff Squyres (jsquyres) wrote:
>
> - if we oversubscribe, (possibly) warn about the performance loss of
> oversubscription, and don't bind
> - don't warn about lack of memory binding
>
> Thoughts?
+1, I hit this myself today. I typically run on a VM and oversubscrib
On 12/19/13 6:59 AM, "Jeff Squyres (jsquyres)" wrote:
>3. Finally, we're giving a warning saying:
>
>-
>WARNING: a request was made to bind a process. While the system
>supports binding the process itself, at least one node does NOT
>support binding memory to the process location.
>-
>
>F
Dear all,
please find attached a (trivial) patch to MPI_Dims_create(). When
computing the prime factors of nnodes, it is sufficient to check for
primes less than or equal to sqrt(nnodes).
This was not so much of a problem in the past, but now that Tier 0
systems are capable of running O(10^6) MPI proc
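
For readers following the thread, here is a minimal sketch of the bound described above. It is not the attached patch or the Open MPI source; the helper name prime_factors and its signature are made up for illustration. It only shows why trial division can stop at sqrt(nnodes): once every smaller factor has been divided out, any remainder greater than 1 must itself be prime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from Open MPI): stores the prime factors of n,
 * with multiplicity and in ascending order, in out[] and returns how many
 * factors n has. At most cap entries are written. */
static size_t prime_factors(unsigned long n, unsigned long *out, size_t cap)
{
    size_t count = 0;

    assert(n >= 1);

    /* "d <= n / d" is the overflow-safe form of "d * d <= n", i.e. only
     * candidates up to sqrt(n) are tested. */
    for (unsigned long d = 2; d <= n / d; ++d) {
        while (n % d == 0) {
            if (count < cap) {
                out[count] = d;
            }
            ++count;
            n /= d;
        }
    }

    /* Whatever is left above 1 has no divisor <= its square root, so it is
     * a single prime factor. */
    if (n > 1) {
        if (count < cap) {
            out[count] = n;
        }
        ++count;
    }
    return count;
}
```

With this bound the loop runs on the order of sqrt(nnodes) iterations in the worst case, instead of on the order of nnodes.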
Someone who understands the mpi debugging handles code:
The opal_progress_recursion_depth_counter and opal_progress_thread_counter
are both only used internally in opal_progress (for bookkeeping, but
never any decisions) and are declared in ompi_mpihandles_dll.c, but then
don't appear to be used.
On Dec 19, 2013, at 6:27 AM, Barrett, Brian W wrote:
> On 12/19/13 6:59 AM, "Jeff Squyres (jsquyres)" wrote:
>
>> 3. Finally, we're giving a warning saying:
>>
>> -
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node
On 12/19/13 8:43 AM, "Ralph Castain" wrote:
>
>On Dec 19, 2013, at 6:27 AM, Barrett, Brian W wrote:
>
>> On 12/19/13 6:59 AM, "Jeff Squyres (jsquyres)"
>>wrote:
>>
>>> 3. Finally, we're giving a warning saying:
>>>
>>> -
>>> WARNING: a request was made to bind a process. While the system
Siegmar --
So it looks like the net problem is fixed; good. I'll commit and CMR that.
For the DDT test, can you give us access to this machine? It might help speed
debugging a lot. (I'll let Nathan reply about the var problem)
If not, can you provide the following information about the DDT t
Okay, I think I have these things fixed in r29978 on the trunk - please give it
a spin and confirm so we can move it to 1.7.4
On Dec 19, 2013, at 7:54 AM, Barrett, Brian W wrote:
> On 12/19/13 8:43 AM, "Ralph Castain" wrote:
>
>>
>> On Dec 19, 2013, at 6:27 AM, Barrett, Brian W wrote:
>>
I think there's no problem with removing them from the dll code -- that stuff
doesn't affect MPI application ABI.
On Dec 19, 2013, at 9:42 AM, Barrett, Brian W wrote:
> Someone who understands the mpi debugging handles code:
>
> The opal_progress_recursion_depth_counter and opal_progress_thre
On Dec 19, 2013, at 10:54 AM, Barrett, Brian W wrote:
>> Just to help me understand a bit better - you are saying that the node
>> supports process binding, but not memory binding? I don't see how the
>> error appears otherwise, but want to ensure I understand the code path.
>
> That appears to
That worked for me.
Brian
On 12/19/13 9:32 AM, "Ralph Castain" wrote:
>
>
>
>Okay, I think I have these things fixed in r29978 on the trunk - please
>give it a spin and confirm so we can move it to 1.7.4
>
>
>
>On Dec 19, 2013, at 7:54 AM, Barrett, Brian W wrote:
>
>
>On 12/19/13 8:43 AM, "Ral
Nathan -
Any chance you can remove the two counters this afternoon?
Brian
On 12/19/13 10:01 AM, "Jeff Squyres (jsquyres)" wrote:
>I think there's no problem with removing them from the dll code -- that
>stuff doesn't affect MPI application ABI.
>
>
>On Dec 19, 2013, at 9:42 AM, Barrett, Brian
Andreas --
Thanks for the patch. Can I ask two things?
1. Can you separate the patch into two: one with the code change, and another
with the whitespace update? It will help the readability of the logs to see
the exact code change, rather than bury it in a syntax update.
2. You added a copyr
Yes. I will do that once I finish preparing the ORNL collectives for the trunk.
Will be 8pm at the latest.
-Nathan
From: devel [devel-boun...@open-mpi.org] on behalf of Barrett, Brian W
[bwba...@sandia.gov]
Sent: Thursday, December 19, 2013 10:24 AM
To: O
Thanks for the review. I am re-spinning the patches and sending the new
version in a few moments.
On Wed, Dec 18, 2013 at 06:56:47AM -0800, Ralph Castain wrote:
> In the case of the send, there really isn't any problem with just replacing
> things - the non-blocking change won't impact anything,
From: Adrian Reber
This is the second try to replace the usage of blocking send and
recv in the C/R code with the non-blocking versions. The new code
compiles (in contrast to the old code) but does not work yet.
This is the first step to get the C/R code working again. Right
now it only compiles.
From: Adrian Reber
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.
Changes from V1:
* #ifdef out the code (so it is preserved for later
From: Adrian Reber
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.
Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED
Changes fr
+1 from me
On Dec 19, 2013, at 12:54 PM, Adrian Reber wrote:
> From: Adrian Reber
>
> This patch changes all send/send_buffer occurrences in the C/R code
> to send_nb/send_buffer_nb.
> The new code compiles but does not work.
>
> Changes from V1:
> * #ifdef out the code (so it is preserved f
Looks okay to me. On the places where you need to block while waiting for an
answer, you can use OMPI_WAIT_FOR_COMPLETION - this will spin on opal_progress
until the condition is met. We use it elsewhere for similar purposes.
See ompi/mca/rte/rte.h for the definition
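
To illustrate the spin-until-done pattern mentioned above, here is a small self-contained sketch. It is not the definition from ompi/mca/rte/rte.h. The names fake_progress, WAIT_WHILE_ACTIVE and wait_for_reply are invented for this example, and the "progress engine" is a stub that completes a pretend receive after three polls.

```c
#include <assert.h>
#include <stdbool.h>

static int  pending_polls = 3;     /* polls left until the fake receive completes */
static bool recv_active   = true;  /* cleared when the reply has "arrived" */

/* Stand-in for a progress-engine call (an assumption, not opal_progress):
 * each call advances outstanding work by one step. */
static void fake_progress(void)
{
    if (pending_polls > 0 && --pending_polls == 0) {
        recv_active = false;  /* what a non-blocking receive callback would do */
    }
}

/* Hypothetical macro: keep driving progress until the flag is cleared. */
#define WAIT_WHILE_ACTIVE(flag)      \
    do {                             \
        while (flag) {               \
            fake_progress();         \
        }                            \
    } while (0)

/* Blocks until the reply is in and returns how many polls that took. */
static int wait_for_reply(void)
{
    int before = pending_polls;

    WAIT_WHILE_ACTIVE(recv_active);
    assert(!recv_active);
    return before - pending_polls;
}
```

The point of the pattern is that the caller blocks logically while the progress engine keeps running, so the callback of a non-blocking receive can still fire and clear the flag.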
On Dec 19, 2013, at 12:54
Hi folks
Given the amount of changes/fixes pushed into the 1.7.4rc's this week, it seems
best that we delay that release until after the holiday. Accordingly, the
revised release plan looks like this:
1.7.4rc2 - this weekend
1.7.4 - Jan 10th
1.7.5 feature freeze (hard deadline) - Jan 24th
1.
I see the failure below when building 1.7.4rc1 on FreeBSD-9 (amd64).
It looks to be just a missing header, probably sys/stat.h.
$ gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]
Only configure option passed was --prefix=...
-Paul
Making all in mca/sharedfp/sm
CC sharedfp_sm.l
When building 1.7.4rc1 on OpenBSD-5 and NetBSD-6 (both amd64) I see what
appears to be the same three errors ("make" output at end of this email)
on both platforms.
All three syntax errors appear to be collisions on the symbol if_mtu:
-bash-4.2$ cat -n openmpi-1.7.4rc1/opal/util/if.h | grep -w
In 1.7.4rc1's README support is still claimed for Solaris 11 on x86_64 with
Sun Studio (12.2 and 12.3):
- Oracle Solaris 10 and 11, 32 and 64 bit (SPARC, i386, x86_64),
with Oracle Solaris Studio 12.2 and 12.3
However, I get a build failure when configured with:
CC=cc CFLAGS=-m64 --w
I've confirmed that the ifr_hwaddr problem also occurs with this system's
/usr/bin/gcc:
Making all in mca/if/posix_ipv4
make[2]: Entering directory
`/shared/OMPI/openmpi-1.7.4rc1-solaris11-x64-ib-gcc452/BLD/opal/mca/if/posix_ipv4'
CC if_posix.lo
/shared/OMPI/openmpi-1.7.4rc1-solaris11-x64-
Paul --
Does this patch fix it for you?
Index: opal/mca/if/posix_ipv4/configure.m4
===
--- opal/mca/if/posix_ipv4/configure.m4 (revision 29997)
+++ opal/mca/if/posix_ipv4/configure.m4 (working copy)
@@ -42,8 +42,10 @@
)
Jeff,
The patch looks fine to my eyes, but I cannot test it:
1) Not sure if email botched whitespace or what, but the patch didn't apply
to if_posix.c.
2) Even if it did, I don't have sufficiently new autoconf on that system to
"use" the configure.m4 part of the patch.
Any chance of a patched-an
Try http://www.open-mpi.org/~jsquyres/unofficial/.
Should have both "if" fixes in it.
On Dec 19, 2013, at 7:12 PM, Paul Hargrove wrote:
> Jeff,
>
> The patch looks fine to my eyes, but I cannot test it:
>
> 1) Not sure if email botched whitespace or what, but the patch didn't apply
> to if_
On Dec 19, 2013, at 6:27 PM, Paul Hargrove wrote:
> When building 1.7.4rc1 on OpenBSD-5 and NetBSD-6 (both amd64) I see what
> appears to be the same three errors ("make" output at end of this email) on
> both platforms.
>
> All three syntax errors appear to be collisions on the symbol if_mt
Fixed and cmr'd
thanks!
On Dec 19, 2013, at 3:10 PM, Paul Hargrove wrote:
> I see the failure below when building 1.7.4rc1 on FreeBSD-9 (amd64).
> It looks to be just a missing header, probably sys/stat.h.
>
> $ gcc --version
> gcc (GCC) 4.2.1 20070831 patched [FreeBSD]
>
> Only configure opt
Jeff,
The unofficial "rc2forpaul" gets past the (disgusting) if_mtu problem on
both platforms.
On NetBSD-6 the build completes ("make install" fails, but I'll report that
separately).
However, on OpenBSD-5 we now encounter another failure about 20 files later:
CC sys_limits.lo
/home/pha
Jeff,
Solaris 11 / x86_64 build gets farther than before, but fails with the
following:
make[2]: Entering directory
`/shared/OMPI/openmpi-1.7.4rc2forpaul-solaris11-x64-ib-gcc452/BLD/ompi/mca/btl/usnic'
CC btl_usnic_module.lo
In file included from
/shared/OMPI/openmpi-1.7.4rc2forpaul-solari
Testing with Solaris 10 on SPARC, I was expecting to encounter the bus
error reported previously by Siegmar Gross. Instead I see the following
hwloc-related abort:
$ env
PATH=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin:$PATH
LD_LIBRARY_PATH_64=/home/hargrove/OMPI/o
Jeff,
I didn't actually get very far after fixing __always_inline.
In fact, the build still fails on the *same* line, but for a different
(valid) reason:
fls() is declared in /usr/include/string.h
Making all in mca/btl/usnic
make[2]: Entering directory
`/shared/OMPI/openmpi-1.7.4rc2forpaul-so
Attached is the output from "make install" of 1.7.4rc1 + Jeff's fix for the
symbol conflict on "if_mtu".
There appear to be at least 2 issues.
1) There are lots of (not fatal) messages about ldconfig not existing, but
according to the NetBSD lists that utility went away with the conversion
from a.
I added protections for all the RLIMIT values, just in case. Thanks!
Ralph
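
As a rough illustration of what "protections" can look like, here is a sketch that guards each RLIMIT_* constant with the preprocessor, so a platform that lacks one still compiles. This is not the committed fix; the helper lookup_rlimit and the limit names it accepts are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper (not the Open MPI code): maps a limit name to its
 * RLIMIT_* constant. Returns 0 on success, or -1 if the name is unknown or
 * the constant is not defined on this platform. */
static int lookup_rlimit(const char *name, int *resource)
{
    assert(name != NULL && resource != NULL);

#ifdef RLIMIT_CORE
    if (strcmp(name, "core") == 0) {
        *resource = RLIMIT_CORE;
        return 0;
    }
#endif
#ifdef RLIMIT_NOFILE
    if (strcmp(name, "nofile") == 0) {
        *resource = RLIMIT_NOFILE;
        return 0;
    }
#endif
#ifdef RLIMIT_NPROC
    if (strcmp(name, "nproc") == 0) {
        *resource = RLIMIT_NPROC;
        return 0;
    }
#endif
#ifdef RLIMIT_AS
    if (strcmp(name, "as") == 0) {
        *resource = RLIMIT_AS;
        return 0;
    }
#endif
    return -1;
}
```

A caller would then pass the returned value to getrlimit() or setrlimit() and treat -1 as "not supported here" rather than as a build error.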
On Dec 19, 2013, at 6:25 PM, Paul Hargrove wrote:
> Jeff,
>
> The unofficial "rc2forpaul" gets past the (disgusting) if_mtu problem on both
> platforms.
>
> On NetBSD-6 the build completes ("make install" fails, but I'
I believe this one has already been fixed and is in the nightly (1.7.4rc2) -
for now, you can just set "--bind-to none" on the cmd line to get past it
On Dec 19, 2013, at 6:42 PM, Paul Hargrove wrote:
> Testing with Solaris 10 on SPARC, I was expecting to encounter the bus error
> reported pr
Ralph,
I can confirm "--bind-to none" worked to eliminate the error, but the test
now appears to hang :-(
Since you say the binding is probably fixed for rc2, I'll see if the latest
nightly tarball works better by default.
-Paul
On Thu, Dec 19, 2013 at 7:19 PM, Ralph Castain wrote:
> I believe
Probably nobody cares, but I'll report this for completeness.
In trying to understand the "make install" failure on NetBSD-6 I ran
"autogen.sh".
The versions detected:
Searching for autoconf
Found autoconf version 2.69; checking version...
Found version component 2 -- need 2
FYI:
My Solaris-11/x86-64/gcc-4.5.2 build completes with the following three
changes:
+ Jeff's fix for if_posix.c
+ changing __always_inline to __opal_attribute_always_inline__
+ fixing the fls() conflict by renaming OMPI's to "my_fls()" (just a lazy
choice).
-Paul
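
For context on the third change, here is a sketch of a privately named "find last set" helper that cannot collide with a system fls() prototype. Only the name my_fls comes from the message above. The body is an assumption that follows the BSD fls() convention (1-based index of the highest set bit, 0 for a zero argument) and is not the usnic BTL code.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative implementation: returns the 1-based position of the most
 * significant set bit in x, or 0 if no bit is set. */
static int my_fls(unsigned int x)
{
    int pos = 0;

    while (x != 0) {
        ++pos;
        x >>= 1;
    }
    assert(pos <= (int)(sizeof(unsigned int) * CHAR_BIT));
    return pos;
}
```

Giving the helper a project-specific name avoids the clash regardless of which system header happens to declare fls().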
On Thu, Dec 19, 2013 at 6:47