Paul,
About the second point:
mmap is called with the MAP_FIXED flag. Before the fix, the
required address was not aligned to the page size, and hence
mmap failed.
The mmap failure was immediately handled, but for some reason
I have not fully investigated yet, this failure was not correctly
I am seeing the following on OpenBSD/amd64 with "make V=1":
Making all in tools/wrappers
/bin/sh ../../../libtool --tag=CC --mode=link gcc -std=gnu99 -g
-finline-functions -fno-strict-aliasing -pthread -export-dynamic -o
opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lutil -lm
I have confirmed that George's latest version works on both SPARC ABIs.
ARMv7 and three MIPS ABIs still pending...
-Paul
On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote:
> Another version of the atomic patch. Paul has tested it on a bunch of
> platforms. At this
Regarding review of the coll/ml fix:
While the fix Gilles worked out overnight proved sufficient on
Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns:
1) As I already voiced on the list, I am concerned with the portability of
_SC_PAGESIZE vs. _SC_PAGE_SIZE (vs. getpagesize()).
2)
MIPS32, MIPS64 and ARMv7 tests are also pending.
-Paul
On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote:
> Another version of the atomic patch. Paul has tested it on a bunch of
> platforms. At this point we have confirmation from all architectures except
> SPARC (v8+
Another version of the atomic patch. Paul has tested it on a bunch of
platforms. At this point we have confirmation from all architectures except
SPARC (v8+ and v9).
George.
atomics.patch
On Jul 31, 2014, at 19:13 , George Bosilca wrote:
>
I guess (r32401).
George.
On Fri, Aug 1, 2014 at 12:32 PM, Ralph Castain wrote:
> I found the problem - the issue is that assert on the convertor. MPI apps
> are setting that convertor, but not non-MPI apps, and so the field is NULL.
> Can we remove that assert?
>
>
> On
Thanks George!
On Aug 1, 2014, at 9:36 AM, svn-commit-mai...@open-mpi.org wrote:
> Author: bosilca (George Bosilca)
> Date: 2014-08-01 12:36:23 EDT (Fri, 01 Aug 2014)
> New Revision: 32401
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32401
>
> Log:
> No more assert in the proc
I found the problem - the issue is that assert on the convertor. MPI apps are
setting that convertor, but not non-MPI apps, and so the field is NULL. Can we
remove that assert?
On Aug 1, 2014, at 9:30 AM, George Bosilca wrote:
> I missed the fact that the app doesn't
I missed the fact that the app doesn't force it. But if this is indeed the
case, then it is extremely weird that you are seeing someone else releasing
your proc.
Regarding the destruction of the proc, the OPAL layer only does it in a single
place, when the local proc is set (opal_proc_local_set).
On Aug 1, 2014, at 8:27 AM, George Bosilca wrote:
> This commit brings two things. One is the renaming suggested by Gilles. The
> second one is forcing the ORTE process down on the OPAL. This doesn't fit the
> current design of the BTL move. The current design assumes
Sorry, finally got through all this ompi email and see this problem was fixed.
-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard Jr.,
Howard
Sent: Friday, August 01, 2014 8:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] openmpi-1.8.2rc2
Hi Jeff,
Finally got info yesterday about where the newer PGI compilers are hiding out
at LANL.
I'll check this out today.
Howard
-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Tuesday, July 29, 2014 5:24 PM
To: Open MPI
Oh, should point out: I didn't deal with the potential btl/tcp issue you noted
- I defer that to George
On Aug 1, 2014, at 7:56 AM, Ralph Castain wrote:
> Hi Gilles
>
> I'm not sure if we have a problem or not - we'll have to wait and see, I
> guess. So far, I'm not
Hi Gilles
I'm not sure if we have a problem or not - we'll have to wait and see, I guess.
So far, I'm not seeing any problems on x86 archs, but that's to be expected and
I don't have access to anything else.
I fixed the issues you noted plus a few others I found. I imagine we'll
discover more
Okay, I fixed those two and will release rc4 once the coll/ml fix has been
reviewed. Thanks
On Aug 1, 2014, at 2:46 AM, Mike Dubman wrote:
> Also, latest commit into openib (origin/v1.8
> https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
>
>
Thanks Paul - added and cmr'd
On Jul 31, 2014, at 11:23 PM, Paul Hargrove wrote:
>
> /home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36:
> error: use of undeclared identifier 'S_IRUSR'
> fd =
One last point:
In orte_process_name_t, jobid and vpid have types orte_jobid_t and
orte_vpid_t, which really are uint32_t.
In orte/util/proc.c, the function pointers opal_process_name_vpid and
opal_process_name_jobid return an int32_t.
Should they return a uint32_t instead?
/* and then
George,
I compiled trunk with your patch for SPARCV9/Linux/GCC.
I see the following warnings/errors.
In file included from opal/include/opal/sys/atomic.h:175,
from opal/asm/asm.c:21:
Also, latest commit into openib (origin/v1.8
https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
*11:45:01* + timeout -s SIGSEGV 3m
/scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun
-np 8 -mca pml ob1 -mca btl self,openib
Paul,
I just committed r32393 (and made a CMR for v1.8).
Can you please give it a try?
In the meantime, I received your email...
sysconf is called directly (i.e. not #ifdef-protected) in several other
places:
$ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v
autom4te |grep
Note that the Solaris unresolved alloca problem George fixed in r32388 is
still present in 1.8.2rc3.
I have manually confirmed that the same patch resolves the problem in
1.8.2rc3.
-Paul
On Thu, Jul 31, 2014 at 9:44 PM, Ralph Castain wrote:
> Usual place - this is a
Hmm, maybe this has nothing to do with big-endian.
Below is a backtrace from ring_c on an IA64 platform (definitely
little-endian) that looks very similar to me.
It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
So, I wonder if that might be related.
-Paul
$ mpirun
Gilles's findings are consistent with mine which showed the SEGVs to be in
the coll/ml code.
I've built with --enable-debug and so below is a backtrace (well, two
actually) that might be helpful.
Unfortunately the output of the two ranks did get slightly entangled.
-Paul
$ ../INST/bin/mpirun
Paul and Ralph,
for what it's worth:
a) I faced the very same issue on my (slow) QEMU-emulated ppc64 VM
b) I was able to run very basic programs when passing --mca coll ^ml to
mpirun
Cheers,
Gilles
On 2014/08/01 12:30, Ralph Castain wrote:
> Yes, I fear this will require some effort to
/home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36:
error: use of undeclared identifier 'S_IRUSR'
fd = open(myfile, O_CREAT, S_IRUSR);
^
To fix this it was sufficient to add the following 3
Usual place - this is a last-chance check, so please hit it. Main change from
rc2 is the repairs to the Fortran binding config logic
http://www.open-mpi.org/software/ompi/v1.8/
Ralph
George's patch worked for me.
Now of course since this is a big-endian system things are still busted on
trunk, but ring_c is now hung instead of failing at load time.
-Paul
On Thu, Jul 31, 2014 at 9:30 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Paul,
>
> George just
Paul,
George just made a good point: you should test with his patch first.
If it still does not work, could you try mixing the GNU and Sun compilers?
configure ... CC=/usr/sfw/bin/gcc CXX=/usr/sfw/bin/g++ FC=
Cheers,
Gilles
On 2014/08/01 13:19, Paul Hargrove wrote:
> Gilles,
>
> This test was
Gilles,
This test was using the Solaris Studio Compilers version 12.3.
/usr/bin/gcc on this system is "gccfss" which Open MPI does NOT support.
There is also a gcc-3.3.2 in /usr/local/bin and gcc-3.4.3 in /usr/sfw/bin
Neither includes usable Fortran compilers, which is why the Studio
compilers
I am afraid the suggestion on the mailing list only addressed half of the
problem. Indeed, alloca is used in two files (isend and irecv), while the
suggested patch only fixed the one in isend.
George.
On Fri, Aug 1, 2014 at 12:12 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org>
Paul,
As Ralph pointed out, this issue was reported last month on the user mailing
list.
#include did not help:
http://www.open-mpi.org/community/lists/users/2014/07/24883.php
I will try to reproduce and fix this issue on a Solaris 10 (but x86) VM.
BTW, are you using the GNU compiler?
In general I am only set up to build from tarballs, not svn.
However, I can (and will) apply this change manually w/o difficulty.
I will report back when I've had a chance to try that.
I already have many builds in-flight to test George's atomics patch and am
in danger of confusing myself if I am
Yes, I fear this will require some effort to chase all the breakage down given
that (to my knowledge, at least) we lack PPC machines in the devel group.
On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote:
> On the path to verifying George's atomics patch, I have started just
FWIW: we had Siegmar try that and it didn't solve the problem. Paul?
On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote:
> Author: bosilca (George Bosilca)
> Date: 2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014)
> New Revision: 32388
> URL:
Anything you can suggest to resolve that problem would be most appreciated,
Paul. It's been reported before, but we have no idea what it is looking for.
On Jul 31, 2014, at 8:15 PM, Paul Hargrove wrote:
>
> $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
>
A missing include. Should be fixed by r32388.
Thanks,
George.
On Thu, Jul 31, 2014 at 11:15 PM, Paul Hargrove wrote:
>
> $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
> ld.so.1: ring_c: fatal: relocation error: file
>
$ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
ld.so.1: ring_c: fatal: relocation error: file
/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v8plus/INST/lib/openmpi/mca_pml_ob1.so:
symbol alloca: referenced symbol not found
This platform has worked in the past.
I will be