Good suggestion, Paul - I have committed it in r32407 and added it to cmr #4826
Thanks!
Ralph
On Aug 1, 2014, at 1:12 AM, Paul Hargrove wrote:
> Gilles,
>
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
On Fri, Aug 1, 2014 at 1:19 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Paul,
>
> i just committed r32393 (and made a CMR for v1.8)
>
> can you please give it a try ?
>
I am not equipped to build from svn on most of my test platforms.
However, I applied your one-line change
Paul,
i just committed r32393 (and made a CMR for v1.8)
can you please give it a try ?
in the meantime, i received your email ...
sysconf is called directly (i.e. not #ifdef protected) in several other
places :
$ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v
autom4te |grep PA
Gilles,
At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
#ifdef HAVE_GETPAGESIZE
pagesize = getpagesize();
#else
pagesize = 4096;
#endif
While other places in the code use sysconf(), but not always consistently.
And on some systems _SC_PAGESIZE is spelled
Paul,
you are absolutely right !
in ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
cm->lmngr_alignment is hard coded to 4096
as a proof of concept, i hard coded it to 65536 and now coll/ml works
just fine
i will now write a patch that uses sysconf(_SC_PAGESIZE) instead
Cheers,
Gilles
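
For anyone following the page-size discussion, here is a minimal sketch of
the approach described above: query the page size through sysconf() and
keep 4096 only as a last-resort fallback. The helper names
portable_page_size() and round_up_to_page() are hypothetical and are not
taken from the Open MPI tree. The _SC_PAGE_SIZE branch is an assumption
about platforms that only provide the alternate POSIX spelling. This is
not the patch that was committed.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from the Open MPI tree): return the system
 * page size, preferring sysconf() and falling back to 4096 only when
 * no query is available or the query fails. */
static size_t portable_page_size(void)
{
    long sz = -1;
#if defined(_SC_PAGESIZE)
    sz = sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    /* Assumption: alternate POSIX spelling of the same query. */
    sz = sysconf(_SC_PAGE_SIZE);
#endif
    return (sz > 0) ? (size_t)sz : (size_t)4096;
}

/* Round len up to the next multiple of the page size, so that an
 * alignment value is never smaller than what the kernel really uses
 * (for example 64K on the systems mentioned in this thread). */
static size_t round_up_to_page(size_t len)
{
    size_t page = portable_page_size();
    assert(page > 0);
    return ((len + page - 1) / page) * page;
}
```

With something like this, an alignment field such as cm->lmngr_alignment
could be initialized from portable_page_size() instead of a hard-coded
constant, so it would follow the 64K pages reported on those systems.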
On 2014/08
Hmm, maybe this has nothing to do with big-endian.
Below is a backtrace from ring_c on an IA64 platform (definitely
little-endian) that looks very similar to me.
It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
So, I wonder if that might be related.
-Paul
$ mpirun -mca
Gilles's findings are consistent with mine which showed the SEGVs to be in
the coll/ml code.
I've built with --enable-debug and so below is a backtrace (well, two
actually) that might be helpful.
Unfortunately the output of the two ranks did get slightly entangled.
-Paul
$ ../INST/bin/mpirun -mca
Paul and Ralph,
for what it's worth :
a) i faced the very same issue on my (slow) qemu emulated ppc64 vm
b) i was able to run very basic programs when passing --mca coll ^ml to
mpirun
Cheers,
Gilles
On 2014/08/01 12:30, Ralph Castain wrote:
> Yes, I fear this will require some effort to cha
Yes, I fear this will require some effort to chase all the breakage down given
that (to my knowledge, at least) we lack PPC machines in the devel group.
On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote:
> On the path to verifying George's atomics patch, I have started just by
> verifying that
Works okay with a fresh checkout, so something in my tree must have been hosed.
On Jul 25, 2014, at 8:51 AM, Ralph Castain wrote:
> It seems to be only happening on my Mac, not Linux, but I'll try with a fresh
> checkout
>
> On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres)
> wrote:
>
>
It seems to be only happening on my Mac, not Linux, but I'll try with a fresh
checkout
On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres) wrote:
> I'm unable to replicate... perhaps you have a stale install tree?
>
>
> On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote:
>
>> Hey folks
>>
>
I'm unable to replicate... perhaps you have a stale install tree?
On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote:
> Hey folks
>
> Something in the last day or so appears to have broken the trunk's ability to
> run --with-devel-headers. It looks like the headers are being installed
> correc
Looks to me like the warning message says it all - the problem is in
openib.
The reason we took this action was to force the problems to the surface
across the code base so that people would address them. We've tried before
to just ask people to set the right flags to enable async progress and fi
tried with vader - same crash
*14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env
*14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
*14:14:22* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/works
will do and update shortly.
On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Mike,
>
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which later causes the timeout sending a SIGSEGV)
>
> Chee
Mike,
by the way, i pushed r32081.
that might not be needed in your environment, but i get a crash without it
in mine.
Cheers,
Gilles
On 2014/06/25 15:11, Gilles Gouaillardet wrote:
> could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> it seems the sm module causes a hang
> (which lat
We should have given more of a "heads up" here. We recognize that the trunk
may well become unstable as we can't test all the variations, and clearly
some timing issues are going to arise with this change. Our hope is that we
can iron them out quickly. If not, then we'll revert and try again.
You
Mike,
could you try again with
OMPI_MCA_btl=vader,self,openib
it seems the sm module causes a hang
(which later causes the timeout sending a SIGSEGV)
Cheers,
Gilles
On 2014/06/25 14:22, Mike Dubman wrote:
> Hi,
> The following commit broke trunk in jenkins:
>
Per the OMPI developer confe
Yes, we know - been fixed.
On Aug 30, 2012, at 7:50 AM, Eugene Loh wrote:
> Trunk broken? Last night, Oracle's MTT trunk runs all came up empty handed.
> E.g.,
>
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: (
On 06-Jul-11 2:21 AM, Ralph Castain wrote:
> Never mind - this seems to have been another svn-related artifact. I started
> fresh and it didn't show up.
I made some changes in an m4 file, so I think that autogen + configure + make
should have fixed the problem. But never mind, if it works with fresh
Never mind - this seems to have been another svn-related artifact. I started
fresh and it didn't show up.
On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote:
> I'm getting this when trying to build the trunk on a system with openib:
>
> In file included from btl_openib_ini.h:16,
>
Thanks George!
It wouldn't compile for me on my Leopard or on any of our Linux
clusters, nor on the IU odin Linux cluster. Not sure why - all were
with different versions of gcc.
Thanks again
Ralph
On Jan 28, 2009, at 2:48 PM, George Bosilca wrote:
Seems more like a compiler problem. A st
Seems more like a compiler problem. A static inline function defined
in the header file but never used is the source of the problem. It did
compile for me with the gcc from Leopard and 4.3.1 on Linux. I'll
commit the fix asap.
george.
On Jan 28, 2009, at 14:26 , Ralph Castain wrote:
Rat
Rats - once I fixed my area, it again broke on Linux at this same spot
in convertor.
Sorry for the confusion
Ralph
On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote:
Actually, check that - it seems to be building under Linux (my
build broke in some other area where I am working, but not he
Actually, check that - it seems to be building under Linux (my build
broke in some other area where I am working, but not here).
However, it does not build on the Mac.
Any suggestions?
Ralph
On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote:
Hi folks
I believe a recent commit has broken t
Just an update: I have fixed this problem. However, I will hold off checking
it into the trunk until tomorrow. It will come in with the MPI-2 repairs to
avoid code conflicts.
Ralph
> Since this appears to have gone unnoticed, it may not be a big deal.
> However, I have found that multi-node oper