Re: [OMPI devel] Trunk broken for PPC64?

2014-08-02 Thread Ralph Castain
Good suggestion, Paul - I have committed it in r32407 and added it to cmr #4826 Thanks! Ralph On Aug 1, 2014, at 1:12 AM, Paul Hargrove wrote: > Gilles, > > At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following: > > #ifdef HAVE_GETPAGESIZE > pagesize = getpagesize();

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
On Fri, Aug 1, 2014 at 1:19 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Paul, > > I just committed r32393 (and made a CMR for v1.8) > > can you please give it a try? > I am not equipped to build from svn on most of my test platforms. However, I applied your one-line change

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul, I just committed r32393 (and made a CMR for v1.8). Can you please give it a try? In the meantime, I received your email ... sysconf is called directly (i.e. not #ifdef-protected) in several other places: $ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v autom4te |grep PA

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles, At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following: #ifdef HAVE_GETPAGESIZE pagesize = getpagesize(); #else pagesize = 4096; #endif While other places in the code use sysconf(), but not always consistently. And on some systems _SC_PAGESIZE is spelled
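
A minimal sketch of the kind of single, consistent page-size query Paul's message argues for. The helper name query_page_size is hypothetical and not taken from the Open MPI tree. The snippet above is cut off before it names the alternate spelling, so the _SC_PAGE_SIZE branch is an assumption based on the spelling some systems provide. HAVE_GETPAGESIZE is the configure macro quoted in the message.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from the Open MPI source): return the runtime
 * page size, preferring sysconf() and only then falling back. */
static size_t query_page_size(void)
{
    long sz = -1;

#if defined(_SC_PAGESIZE)
    sz = sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    /* Assumed alternate spelling; the quoted message is truncated here. */
    sz = sysconf(_SC_PAGE_SIZE);
#elif defined(HAVE_GETPAGESIZE)
    sz = getpagesize();
#endif

    /* Last-resort guess; this is wrong on systems with 64K pages. */
    return (sz > 0) ? (size_t)sz : 4096;
}
```

Routing every caller through one helper like this would keep the 4096 fallback in a single place instead of scattering it across components.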

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul, you are absolutely right! In ompi/mca/coll/ml/coll_ml_lmngr.c at line 53, cm->lmngr_alignment is hard coded to 4096. As a proof of concept, I hard coded it to 65536 and now coll/ml works just fine. I will now write a patch that uses sysconf(_SC_PAGESIZE) instead. Cheers, Gilles On 2014/08
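
To illustrate the fix Gilles describes, here is a hedged sketch of deriving an alignment from the runtime page size instead of a hard-coded 4096. The function alloc_page_aligned is hypothetical, and posix_memalign is used only as a stand-in allocator; this is not the coll/ml list manager's actual code.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate len bytes aligned to the page size reported
 * at run time (64K on the PPC64 and IA64 systems mentioned in this thread). */
static void *alloc_page_aligned(size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *block = NULL;

    if (pagesize <= 0) {
        return NULL; /* page size could not be determined */
    }
    /* posix_memalign returns 0 on success and leaves errno untouched. */
    if (posix_memalign(&block, (size_t)pagesize, len) != 0) {
        return NULL;
    }
    return block;
}
```

A block obtained this way is released with free(). The design point is that the alignment follows the platform rather than an assumption that happens to hold on most x86 systems.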

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Hmm, maybe this has nothing to do with big-endian. Below is a backtrace from ring_c on an IA64 platform (definitely little-endian) that looks very similar to me. It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems. So, I wonder if that might be related. -Paul $ mpirun -mca

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles's findings are consistent with mine which showed the SEGVs to be in the coll/ml code. I've built with --enable-debug and so below is a backtrace (well, two actually) that might be helpful. Unfortunately the output of the two ranks did get slightly entangled. -Paul $ ../INST/bin/mpirun -mca

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph, for what it's worth: a) I faced the very same issue on my (slow) qemu-emulated ppc64 VM; b) I was able to run very basic programs when passing --mca coll ^ml to mpirun. Cheers, Gilles On 2014/08/01 12:30, Ralph Castain wrote: > Yes, I fear this will require some effort to cha

Re: [OMPI devel] Trunk broken for PPC64?

2014-07-31 Thread Ralph Castain
Yes, I fear this will require some effort to chase all the breakage down given that (to my knowledge, at least) we lack PPC machines in the devel group. On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote: > On the path to verifying George's atomics patch, I have started just by > verifying that

[OMPI devel] Trunk broken for PPC64?

2014-07-31 Thread Paul Hargrove
On the path to verifying George's atomics patch, I have started just by verifying that I can still build the UNPATCHED trunk on each of the platforms I listed. I have tried two PPC64/Linux systems so far and am seeing the same problem on both. Though I can pass "make check" both platforms SEGV on

Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Ralph Castain
Works okay with a fresh checkout, so something in my tree must have been hosed. On Jul 25, 2014, at 8:51 AM, Ralph Castain wrote: > It seems to be only happening on my Mac, not Linux, but I'll try with a fresh > checkout > > On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres) > wrote: > >

Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Ralph Castain
It seems to be only happening on my Mac, not Linux, but I'll try with a fresh checkout On Jul 25, 2014, at 8:51 AM, Jeff Squyres (jsquyres) wrote: > I'm unable to replicate... perhaps you have a stale install tree? > > > On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote: > >> Hey folks >> >

Re: [OMPI devel] Trunk broken for --with-devel-headers?

2014-07-25 Thread Jeff Squyres (jsquyres)
I'm unable to replicate... perhaps you have a stale install tree? On Jul 24, 2014, at 6:36 PM, Ralph Castain wrote: > Hey folks > > Something in the last day or so appears to have broken the trunk's ability to > run --with-devel-headers. It looks like the headers are being installed > correc

[OMPI devel] Trunk broken for --with-devel-headers?

2014-07-24 Thread Ralph Castain
Hey folks Something in the last day or so appears to have broken the trunk's ability to run --with-devel-headers. It looks like the headers are being installed correctly, but mpicc fails to compile a program that uses them - the include passes, but the linker fails: Undefined symbols for archi

Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
Looks to me like the warning message says it all - the problem is in openib. The reason we took this action was to force the problems to the surface across the code base so that people would address them. We've tried before to just ask people to set the right flags to enable async progress and fi

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
tried with vader - same crash *14:14:22* [vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env *14:14:22* [vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages *14:14:22* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/works

Re: [OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
will do and update shortly. On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Mike, > > could you try again with > > OMPI_MCA_btl=vader,self,openib > > it seems the sm module causes a hang > (which later causes the timeout sending a SIGSEGV) > > Chee

Re: [OMPI devel] trunk broken

2014-06-25 Thread Gilles Gouaillardet
Mike, by the way, I pushed r32081. That might not be needed in your environment, but I get a crash without it in mine. Cheers, Gilles On 2014/06/25 15:11, Gilles Gouaillardet wrote: > could you try again with > > OMPI_MCA_btl=vader,self,openib > > it seems the sm module causes a hang > (which lat

Re: [OMPI devel] trunk broken

2014-06-25 Thread Ralph Castain
We should have given more of a "heads up" here. We recognize that the trunk may well become unstable as we can't test all the variations, and clearly some timing issues are going to arise with this change. Our hope is that we can iron them out quickly. If not, then we'll revert and try again. You

Re: [OMPI devel] trunk broken

2014-06-25 Thread Gilles Gouaillardet
Mike, could you try again with OMPI_MCA_btl=vader,self,openib - it seems the sm module causes a hang (which later causes the timeout sending a SIGSEGV). Cheers, Gilles On 2014/06/25 14:22, Mike Dubman wrote: > Hi, > The following commit broke trunk in jenkins: > Per the OMPI developer confe

[OMPI devel] trunk broken

2014-06-25 Thread Mike Dubman
Hi, The following commit broke trunk in jenkins: >>>Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS *22:15:09* + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib *22:15:09* + OMPI_MCA_scoll_fca_enable=1 *22

[OMPI devel] trunk - broken logic for oshmem:bindings:fort

2014-01-09 Thread Paul Hargrove
Building the trunk on FreeBSD-9/x86-64, and using gmake to work around the non-portable examples/Makefile, I *still* encounter issues with shmemfort when running "gmake" in the examples subdirectory: $ gmake mpicc -g hello_c.c -o hello_c mpicc -g ring_c.c -o ring_c mpicc -g connectivi

[OMPI devel] Trunk broken on NERSC's Cray XE6

2013-01-25 Thread Paul Hargrove
Following up as I promised... My results on NERSC's small Cray XE6 (the test/dev rack "Grace", rather than the full-sized "Hopper") match those I get on the Cray XC30 (Edison), and don't follow those Ralph reports for LANL's XE6. An attempt to build/link hello_c.c results in unresolved symbols fr

Re: [OMPI devel] trunk broken?

2012-08-30 Thread Ralph Castain
Yes, we know - been fixed. On Aug 30, 2012, at 7:50 AM, Eugene Loh wrote: > Trunk broken? Last night, Oracle's MTT trunk runs all came up empty handed. > E.g., > > *** Process received signal *** > Signal: Segmentation fault (11) > Signal code: Address not mapped (1) > Failing at address: (

[OMPI devel] trunk broken?

2012-08-30 Thread Eugene Loh
Trunk broken? Last night, Oracle's MTT trunk runs all came up empty handed. E.g., *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: (nil) [ 0] [0xe600] [ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3] [ 2] /lib/libc.so.6(__st

Re: [OMPI devel] Trunk broken?

2011-07-06 Thread Yevgeny Kliteynik
On 06-Jul-11 2:21 AM, Ralph Castain wrote: > Never mind - this seems to have been another svn-related artifact. I started > fresh and it didn't show up. I made some changes in an m4 file, so I think that autogen + configure + make should have fixed the problem. But never mind, if it works with fresh

Re: [OMPI devel] Trunk broken?

2011-07-05 Thread Ralph Castain
Never mind - this seems to have been another svn-related artifact. I started fresh and it didn't show up. On Jul 5, 2011, at 12:46 PM, Ralph Castain wrote: > I'm getting this when trying to build the trunk on a system with openib: > > In file included from btl_openib_ini.h:16, >

[OMPI devel] Trunk broken?

2011-07-05 Thread Ralph Castain
I'm getting this when trying to build the trunk on a system with openib: In file included from btl_openib_ini.h:16, from btl_openib.c:47: btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined In file included from btl_openib_component.c:80: btl_openib.h:219:6: warnin

Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Thanks George! It wouldn't compile for me on my Leopard or on any of our Linux clusters, nor on the IU odin Linux cluster. Not sure why - all were with different versions of gcc. Thanks again Ralph On Jan 28, 2009, at 2:48 PM, George Bosilca wrote: Seems more like a compiler problem. A st

Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread George Bosilca
Seems more like a compiler problem. A static inline function defined in the header file but never used is the source of the problem. It did compile for me with the gcc from Leopard and 4.3.1 on Linux. I'll commit the fix asap. george. On Jan 28, 2009, at 14:26 , Ralph Castain wrote: Rat

Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Rats - once I fixed my area, it again broke on Linux at this same spot in convertor. Sorry for the confusion Ralph On Jan 28, 2009, at 12:25 PM, Ralph Castain wrote: Actually, check that - it seems to be building under Linux (my build broke in some other area where I am working, but not he

Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Actually, check that - it seems to be building under Linux (my build broke in some other area where I am working, but not here). However, it does not build on the Mac. Any suggestions? Ralph On Jan 28, 2009, at 12:19 PM, Ralph Castain wrote: Hi folks I believe a recent commit has broken t

[OMPI devel] Trunk broken at r20375

2009-01-28 Thread Ralph Castain
Hi folks I believe a recent commit has broken the trunk - I am unable to compile it on either Linux or Mac: In file included from convertor_raw.c:24: ../../ompi/datatype/datatype_pack.h: In function ‘pack_predefined_data’: ../../ompi/datatype/datatype_pack.h:41: error: implicit declaration of

Re: [OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph Castain
Just an update: I have fixed this problem. However, I will hold off checking it into the trunk until tomorrow. It will come in with the MPI-2 repairs to avoid code conflicts. Ralph > Since this appears to have gone unnoticed, it may not be a big deal. > However, I have found that multi-node oper

[OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph H Castain
Since this appears to have gone unnoticed, it may not be a big deal. However, I have found that multi-node operations are broken if you invoke the linear or direct routed modules. Things work fine with the default binomial routed module. I will be working to fix this - just a heads up. Ralph