IIRC, the first 16 or so messages over the openib btl use the send/recv
API as opposed to rdma, which is significantly faster. I am not sure
how 1.5.3 and multi-rail affect this, but I believe the preconnect
short circuits when one cuts over to use rdma for eager messages.
--td
On 10
Interestingly enough it worked for me for a while and then after many
runs I started seeing the below too.
--td
On 7/26/2012 11:07 AM, Ralph Castain wrote:
Hmmm...it was working for me, but I'll recheck. Thanks!
On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:
r26868 seems to have some is
On 7/10/2012 1:57 PM, TERRY DONTJE wrote:
On 7/10/2012 12:50 PM, Jeff Squyres wrote:
I'm getting these compiler warnings on the SVN trunk HEAD (r26776):
btl_openib.c: In function 'mca_btl_openib_put':
btl_openib.c:1652: warning: assignment makes integer from pointer
On 7/10/2012 12:50 PM, Jeff Squyres wrote:
I'm getting these compiler warnings on the SVN trunk HEAD (r26776):
btl_openib.c: In function 'mca_btl_openib_put':
btl_openib.c:1652: warning: assignment makes integer from pointer without a cast
btl_openib.c: In function 'mca_btl_openib_get':
btl_op
On 7/5/2012 5:47 PM, Shamis, Pavel wrote:
I mentioned on the call that for Mellanox devices (+OFA verbs) this resource is
really cheap. Do you run mellanox hca + OFA verbs ?
(I'll reply because I know Terry is offline for the rest of the day)
Yes, he does.
I asked because SUN used to have o
With Jeff's latest changes to how we set up the cq_size I am now seeing
error messages saying that my machine's memlocked limits are too low. I
am concerned that it might be something else because my max'd locked
memory is unlimited on my machine.
So if I do a run of -np 2 across two separate
So is ofacm another replacement for ibcm and rdmacm?
--td
On 7/2/2012 11:20 AM, Nathan Hjelm wrote:
Nice! Are we moving this to 1.7 as well?
-Nathan
On Mon, Jul 02, 2012 at 11:20:12AM -0400, svn-commit-mai...@open-mpi.org wrote:
Author: pasha (Pavel Shamis)
Date: 2012-07-02 11:20:12 EDT (Mon
On 6/25/2012 10:12 AM, Jeff Squyres wrote:
On Jun 25, 2012, at 5:44 AM, TERRY DONTJE wrote:
Hmmm, I guess I could see the thinking of tying ofud and openib btls
configuring together. However it seems inconsistent to me that one btl doesn't
allow you to control configuring it in o
On 6/23/2012 6:32 AM, Jeff Squyres wrote:
On Jun 22, 2012, at 11:26 PM, TERRY DONTJE wrote:
4. The behavior of --with[out]-verbs is as was described in a prior mail:
- if --with-verbs is specified, all 3 verbs-based components must succeed
- if --without-verbs is specified, all 4 verbs
On 6/22/2012 3:36 PM, Jeff Squyres wrote:
To update everyone: there was much more discussion about this off-list. :-)
We decided to do the following:
1. The name --with-verbs seems better than --with-openfabrics, if for no other reason
than the name "openfabrics" encompasses more things tha
It looks like compilation of 32 bit platforms is failing due to a
missing field. It looks to me that for some reason r26626 deleted
hdr_segkey in ompi/mca/osc/rdma/osc_rdma_header.h which is used in the
macro OMPI_OSC_RDMA_RDMA_INFO_HDR_NTOH and HTON. Is there a reason that
hdr_segkey was rem
On 6/21/2012 8:52 AM, Jeff Squyres wrote:
On Jun 21, 2012, at 8:40 AM, TERRY DONTJE wrote:
So you specify --with-ofed and you get mca_btl_openib generated? ICK!!! I
think that will just make things more confusing. I am against this unless you
change the btl name.
We already have this
On 6/21/2012 6:38 AM, Jeff Squyres wrote:
On Jun 21, 2012, at 6:11 AM, TERRY DONTJE wrote:
As far as I understand, it is not a reason to rename it. The OFED-lovin components
should look at $with_openib.
I agree with Pasha that the reason you give for renaming openib btl seems
orthogonal to
On 6/20/2012 5:02 PM, Jeff Squyres wrote:
On Jun 20, 2012, at 4:25 PM, Shamis, Pavel wrote:
I hate it ...
As far as I understand, it is not a reason to rename it. The OFED-lovin components
should look at $with_openib.
I agree with Pasha that the reason you give for renaming openib btl seems
or
/18/2012 7:06 AM, TERRY DONTJE wrote:
I've run into an issue compiling openib's Dynamic SL support on a RH
6.2 based system with the Oracle Studio compilers.
Turns out if I compile btl_openib_connect_sl.c with the Oracle Studio
compilers with the "-g" option the compiler
I've run into an issue compiling openib's Dynamic SL support on a RH
6.2 based system with the Oracle Studio compilers.
Turns out if I compile btl_openib_connect_sl.c with the Oracle Studio
compilers with the "-g" option the compiler compiles some static inline
functions in ib_types.h standal
+#endif
+#endif
+
Void_t*
_int_malloc(mstate av, size_t bytes)
{
-- End of Patch --
On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote:
I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv
in opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with
icc
+#pragma GCC optimization_level 1
+#endif
+#endif
+
Void_t*
_int_malloc(mstate av, size_t bytes)
{
-- End of Patch --
On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote:
I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv
in opal_memory_ptmalloc2_int_free when OMPI
I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in
opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc
12.1.0 for 64 bit on linux. Just wondering if anyone has seen anything
similar to this with a different version of icc. Other non-Intel
compilers seem t
n.
I don't believe we are setting the env-var which is why I think we have
a regression. It also seems very suspicious to me that both Oracle and
IU are seeing the same condition in MTT. I'll look into this more on
Monday.
--td
On Apr 13, 2012, at 4:32 PM, TERRY DONTJE wrot
I could see that if fewer than N processes exit with a non-zero exit code,
the ORTE may choose not to abort the job. However, if all N processes
have exited or aborted I expect everything to clean up and mpirun to
exit. It does not do that at the moment which I think is what is
causing most of th
gister a malloc_hook. Anyways, per
my comment to 3071 I am
going to back out r26255.
--td
Brian
On 4/13/12 9:32 AM, "TERRY DONTJE" wrote:
I am thinking MEMORY_LINUX_PTMALLOC2 is probably the right define to
key off of but this is really going to look gross ifdef&
re be other OSes that could run into the same problem? Or what
happens if PTMALLOC2 is not used (does that happen)?
--td
On 4/13/2012 10:45 AM, TERRY DONTJE wrote:
r26255 is forcing the use of __malloc_hook which is implemented in
opal/mca/memory/linux however that is not compiled in the li
r26255 is forcing the use of __malloc_hook which is implemented in
opal/mca/memory/linux however that is not compiled in the library when
built on Solaris thus causing a referenced symbol not found when libmpi
tries to load the openib btl.
I am looking how to fix this now but if someone has a
, 2012, at 4:44 AM, TERRY DONTJE wrote:
Thanks Ralph, the comm_join issue seems to be fixed but the other issues
mentioned still seem to persist. I'll look at this later today
unless someone else decides to fix them :-).
--td
On 4/9/2012 6:45 PM, Ralph Castain wrote:
Should all be fixe
Thanks Ralph, the comm_join issue seems to be fixed but the other issues
mentioned still seem to persist. I'll look at this later today unless
someone else decides to fix them :-).
--td
On 4/9/2012 6:45 PM, Ralph Castain wrote:
Should all be fixed now.
On Apr 9, 2012, at 7:17 AM, TERRY D
After looking at Oracle's MTT results there seem to be one or more
regressions between r26240 and r26249 detected by the ibm and intel test
suites. An example of this is the failures in the comm_join, final and
loop_spawn tests of the ibm test suite as seen in
http://www.open-mpi.org/mtt/index.p
+1 here too.
--td
On 4/6/2012 11:19 PM, Barrett, Brian W wrote:
Agreed.
Brian
On Apr 6, 2012, at 7:31 PM, Ralph Castain wrote:
+1 for SJ - much easier to be someplace with a major airport.
On Apr 5, 2012, at 7:54 AM, Gutierrez, Samuel K wrote:
My vote is for San Jose.
Sam
Have you tried to compile and run a simple MPI program with your
installed Open MPI? If that works then you need to figure out what is
being done by the Makefile when it is "testing if installed package can
be loaded" and try and reproduce the issue manually.
BTW, I normally configure my OMPI
entire Fortran bindings will be replaced in about 2 weeks, and the problem
doesn't occur on my mpi3-fortran bitbucket.
On Apr 5, 2012, at 7:03 AM, TERRY DONTJE wrote:
I noticed both IU and Oracle are seeing failures on the trunk with Intel test
MPI_Keyval3_f. This was with r26237 and the
I noticed both IU and Oracle are seeing failures on the trunk with Intel
test MPI_Keyval3_f. This was with r26237 and the last successful MTT
run of this test was r26232. I looked at the log and nothing popped out
at me. I'll try and narrow this down a little further but that won't be
until
Jeff was right in the recollection that this was mainly to test out that
accessing the fields in a structure was going to work in the debugger
plugin. If you remove some fields in ompi_win_t you can just remove the
corresponding GAP_CHECK line in the test. If you are removing fields in
the mi
Sorry, the cc line below is for the Solaris Studio compilers; if you have gcc,
replace "-G" with "-shared".
thanks,
--td
On 3/21/2012 11:32 AM, TERRY DONTJE wrote:
I ran into a problem on a Suse 10.1 system and was wondering if anyone
has a version of Suse newer than 10.1 that c
I ran into a problem on a Suse 10.1 system and was wondering if anyone
has a version of Suse newer than 10.1 that can try the following test
and send me the results.
-testpci
cat
On 2/22/2012 8:53 PM, Jeffrey Squyres wrote:
Terry / Eugene --
Can you comment?
Sorry I cannot.
--td
On Feb 22, 2012, at 3:16 PM, Paul H. Hargrove wrote:
I think I have the beginning of a fix for this issue.
I had not even noticed earlier that the error in event.h is from the C++ compil
I actually think the systems tested line for Solaris should read:
- Oracle Solaris 10 and 11, 32 and 64 bit (SPARC, i386, x86_64), with
Oracle Solaris Studio 12.2 and 12.3
--td
On 2/22/2012 8:55 PM, Paul H. Hargrove wrote:
Folks at Oracle should decide, but I suspect "Solaris 10" should be
On 2/21/2012 5:55 AM, Jeff Squyres (jsquyres) wrote:
That is truly bizarre "make" behavior.
Heads up that in the upcoming fortran revamp, we *only* use FC. I.E.,
there's only mpifort wrapper compiler (mpif77 and mpif90 still exist,
but only as sym links to mpifort, signifying that mpifort is
On 2/10/2012 11:50 AM, Jeff Squyres wrote:
This is an open question to OMPI developers...
It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen
is activated. This IP interface is only used to communicate with the local Xen
instance(s); it is not used to communicate
On 2/7/2012 9:57 PM, Paul H. Hargrove wrote:
On 2/7/2012 2:37 PM, Paul H. Hargrove wrote:
+ "make check" fails atomics tests using GCCFSS-4.0.4 compilers on
Solaris10/SPARC
Originally reported in:
http://www.open-mpi.org/community/lists/devel/2012/01/10234.php
This is a matter of the Sun/O
On 2/7/2012 4:25 PM, Paul H. Hargrove wrote:
On 2/7/2012 8:59 AM, Jeff Squyres wrote:
This fixes all known issues.
Well, not quite...
I've SUCCESSFULLY retested 44 out of the 55 cpu/os/compiler/abi
combinations currently on my list.
I expect 9 more by the end of the day (the older/slower
Good catch, for some reason the CMR of the patch I attached to ticket
#2977 didn't apply the CXX part. I've reopened the ticket asking Jeff
to apply that part of the patch :-).
Thanks,
--td
On 1/30/2012 6:17 PM, Paul H. Hargrove wrote:
I don't plan to rerun the dozens of different platforms
On 1/29/2012 7:40 PM, Paul Hargrove wrote:
I can additionally report success w/ ILP32 builds with both SS12.2 and
12.3 compilers on x86-64 and sun4v systems running Solaris and
x86-64/Linux:
solaris-10 Generic_137111-07/sun4v (*FLAGS="-m32 -xarch=sparc" for
v8plus ABI)
solaris-11 snv_1
This is awesome Paul, thanks a lot! I'll put some verbiage into the
README and submit a CMR.
--td
On 1/26/2012 2:49 AM, Paul H. Hargrove wrote:
I am pleased to report that w/ help from Terry I can now build nearly
everything w/ the Solaris Studio 12.2 and 12.3 compilers.
Upon comparing our
On 1/19/2012 5:22 PM, Paul H. Hargrove wrote:
Minor documentation nit, which might apply to the 1.5 branch as well
(didn't check).
README says:
- Open MPI does not support the Sparc v8 CPU target, which is the
default on Sun Solaris. The v8plus (32 bit) or v9 (64 bit)
targets must be us
Paul is more than likely right. The nightly runs Oracle does
using MTT and tarballs do not run autogen.sh (which I believe is not
expected anyway, right?). All other builds we do using autogen.* are
from an svn workspace.
--td
On 12/20/2011 8:21 PM, Paul H. Hargrove wrote:
Not too b
time, we changed the default for RM-given
allocations to be no-oversubscribe. So your MTTs may well fail if they
weren't updated as all those tests oversubscribe the nodes, and are
running in RM environments.
On Dec 15, 2011, at 8:37 AM, TERRY DONTJE wrote:
Last night MTT test r
Last night's MTT test results for 1.7a1r25652 from IU and Oracle are
showing failures during some of the spawn tests; see
http://www.open-mpi.org/mtt/index.php?do_redir=2036.
Essentially, the tests are failing with the message:
All nodes which are allocated for this job are already filled.
I wonde
On 11/23/2011 1:45 PM, Lukas Razik wrote:
TERRY DONTJE wrote
Can you build OMPI as a 32 bit library and see if that works any better?
So you mean I shall leave the whole OFED stack as 64 bit and build only openmpi
as 32 bit?
I believe the OFED user libraries will need to be 32 bit also or
On 11/23/2011 11:05 AM, Lukas Razik wrote:
TERRY DONTJE wrote:
Nuts!!! Ok I am going to have to think about this a little more. Do you have
the ability to configure and remake your ompi install? I might want to have you
add some stuff to help me track this down some more if you can
On 11/23/2011 10:11 AM, Lukas Razik wrote:
TERRY DONTJE wrote:
Can you try running the benchmark with coalescing off? To do that
add the following option to your mpirun line "-mca
btl_openib_use_message_coalescing 0".
I've tried this:
# /usr/mpi/gcc/openmpi-1.4.4
On 11/23/2011 9:57 AM, Lukas Razik wrote:
TERRY DONTJE wrote:
On 11/22/2011 6:59 PM, Lukas Razik wrote:
Roland Dreier wrote:
On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik
wrote:
#0 0xf8010229ba9c in mca_pml_ob1_send_request_start_copy
(sendreq=0xb23200, bml_btl=0xb29050, size=0
On 11/22/2011 6:59 PM, Lukas Razik wrote:
Roland Dreier wrote:
On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik wrote:
#0 0xf8010229ba9c in mca_pml_ob1_send_request_start_copy
(sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
551 hdr->hdr_match.hdr_ctx =
s
So with the aliasing scheme the code for openib would still be under
ompi/mca/btl/openib but you could access it with -mca btl ofrc? Ok, so
when an error happens in the openib btl how does it identify itself?
Does it use openib or ofrc? This seems like there could be some user
confusion by adop
On 11/22/2011 5:49 AM, TERRY DONTJE wrote:
The error you are seeing is usually indicative of some code operating
on memory that isn't aligned properly for a SPARC instruction being
used. The address that is causing the failure is odd aligned which is
more than likely the culprit. I
The error you are seeing is usually indicative of some code operating on
memory that isn't aligned properly for a SPARC instruction being used.
The address that is causing the failure is odd aligned which is more
than likely the culprit. If you have a core dump and can disassemble
the code th
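As an illustration of the alignment point above, here is a minimal sketch in C of reading and writing a 32-bit value at an address that may be odd aligned. The helper names (load_u32_unaligned, store_u32_unaligned, is_aligned) are hypothetical and are not taken from the Open MPI sources; the sketch only shows the general technique of going through memcpy instead of dereferencing a cast pointer, which is what typically traps on SPARC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not
 * be 4-byte aligned. memcpy lets the compiler emit loads that are legal
 * for the target, instead of a single aligned load that may trap. */
uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: write a 32-bit value to a possibly unaligned address. */
void store_u32_unaligned(void *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical diagnostic: non-zero when p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A check such as is_aligned(ptr, sizeof(uint32_t)) placed just before the faulting access is one way to confirm whether an odd address is really what reaches that instruction.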
On 11/17/2011 9:54 AM, Ralph Castain wrote:
On Nov 17, 2011, at 7:45 AM, TERRY DONTJE wrote:
I could possibly buy your argument Ralph if this was a one off BTL
that only Nathan (and his employer) is going to use. I am assuming
though this is a more general protocol for a vendor specific
vote that counts is Nathan's - it's his btl, and we
have never forcibly made someone rename their component. I would
suggest we not set that precedent. I'm comfortable with whatever he
decides to call it.
On Nov 17, 2011, at 7:00 AM, TERRY DONTJE wrote:
+1
Isn't ther
+1
Isn't there precedent with the other BTLs to name them based on the
messaging protocol they are supporting instead of some movie character
(tcp, openib, shmem, portals, ...).
--td
On 11/17/2011 8:11 AM, Jeff Squyres wrote:
After having to explain to someone at SC for the umpteenth time t
On 11/15/2011 10:16 PM, Jeff Squyres wrote:
On Nov 14, 2011, at 10:17 PM, Eugene Loh wrote:
I tried building v1.5. r25469 builds for me, r25470 does not. This is
Friday's hwloc putback of CMR 2866. I'm on Solaris11/x86. The problem is
basically:
Doh!
Making all in tools/ompi_info
C
On 11/1/2011 7:48 PM, Jeff Squyres wrote:
So this was slightly different than the opinion that was discussed on the call
today, which was 2. The rationale for #2 was to punish developers, but if such
a bug did make it through to production, users wouldn't be annoyed with
show_help messages
Strange - it ran fine for me on multiple tests. I'll check to see if something
strange got into the mix and recommit.
Not sure it is the same issue but it looks like all my MTT tests on the
trunk r25308 are timing out.
--td
On Oct 17, 2011, at 8:51 PM, George Bosilca wrote:
This commit put
BTW, I am working on a patch for this. Just want to validate there are
no other loose ends. I remember there were a couple oddities about this
issue.
--td
Never mind; I just read your text more carefully - 2887 caused the problem.
Sent from my phone. No type good.
On Oct 18, 2011, at 6:19
Terry -
Did #2887 fix this already?
No it broke it.
--td
Sent from my phone. No type good.
On Oct 18, 2011, at 6:19 AM, "Open MPI" wrote:
#2888: base.h inclusion breaks Solaris build
+
Reporter: tdd | Owner: tdd
Type: defect
to see all help / error messages
As you can see it is identical to the output in your test.
george.
On Aug 18, 2011, at 12:29 , TERRY DONTJE wrote:
Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same
except I don't see the "readv failed.." me
george.
On Aug 18, 2011, at 12:29 , TERRY DONTJE wrote:
Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same
except I don't see the "readv failed.." message.
Have you tried to run this code yourself? It is pretty simple and fails with
one n
Just ran MPI_Errhandler_fatal_c with r25063 and it still fails.
Everything is the same except I don't see the "readv failed.." message.
Have you tried to run this code yourself? It is pretty simple and
fails with one node using np=4.
--td
On 8/18/2011 10:57 AM, Wesley Bland wrote:
I just
I am seeing the intel test suite tests MPI_Errhandler_fatal_c and
MPI_Errhandler_fatal_f fail with an oob failure quite a bit. I have not
seen these tests failing under MTT until the epoch code was added. So I
have a suspicion the epoch code might be at fault. Could someone
familiar with the ep
with Oracle IB people.
Other question is do Oracle folks care about IB QoS and torus/mesh
topologies w.r.t. OMPI, because otherwise the dynamic SL is irrelevant.
It is not an extreme priority of ours but we would like to support it.
--td
-- YK
On Jul 14, 2011, at 7:24 AM, Terry Dontje wrote
On 7/14/2011 9:30 AM, Yevgeny Kliteynik wrote:
On 14-Jul-11 4:21 PM, Paul H. Hargrove wrote:
On 7/13/2011 11:42 PM, Yevgeny Kliteynik wrote:
[adding Terry]
On 14-Jul-11 2:49 AM, Eugene Loh wrote:
On 7/13/2011 4:31 PM, Paul H. Hargrove wrote:
On 7/13/2011 4:20 PM, Yevgeny Kliteynik wrote:
I do but my machine room's power is down so I don't have access to it
right now. I will grope around once it comes up to see what it has. I
also have sent email to our IB team for some direction.
--td
On 7/14/2011 2:42 AM, Yevgeny Kliteynik wrote:
[adding Terry]
On 14-Jul-11 2:49 AM, Eugen
Trying to uplevel this a bit so I can figure out which of these paths
makes sense to me. Is the only reason we want to make init and finalize
asymmetric rather than symmetric to support an abort case?
Forgive me Ralph, I know you had posted this in one of the emails but I
wanted to
, Jeff Squyres wrote:
On Mar 16, 2011, at 6:50 AM, Terry Dontje wrote:
K. When Ralph and I removed that code, it was on the educated guess that no one
was using it (because it hasn't compiled right in a while). If we were wrong,
it can be put back, but someone will need to update it and Ral
On 03/16/2011 06:34 AM, Terry Dontje wrote:
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look
at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes
ar 16, 2011, at 6:32 AM, "Terry Dontje" wrote:
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look
at http://www.ope
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes I did, and I will be talking with my group
Note things have been building up until
now.
--td
I am not seeing this with vanilla RHEL5.
On Mar 15, 2011, at 1:39 PM, Terry Dontje wrote:
While compiling btl_openib_connect_oob.c I am getting identifier redeclared:
ib_gid_t. Looks like infiniband/mad.h defines this and then iba/types.h
It looks to me like r24507 is what changed in btl_openib_connect_oob.c
to include the two header files that are conflicting with each other.
--td
On 03/15/2011 01:39 PM, Terry Dontje wrote:
While compiling btl_openib_connect_oob.c I am getting identifier
redeclared: ib_gid_t. Looks like
While compiling btl_openib_connect_oob.c I am getting identifier
redeclared: ib_gid_t. Looks like infiniband/mad.h defines this and then
iba/types.h tries to redefine it.
I am on Linux compiling with gcc. Is anyone else seeing the same issue
or am I possibly dealing with some old s/w?
--
Or
Hopefully we'll find out tomorrow but I think I vaguely remember an
issue with the Studio compilers and this type of initialization style.
--td
On 01/19/2011 05:22 PM, Nathan Hjelm wrote:
Done. I added the module orte/mca/debugger/dummy and I will remove it
tomorrow.
-Nathan
HPC-3, LANL
On
On 01/18/2011 07:48 AM, Jeff Squyres wrote:
> IBCM is broken and disabled (has been for a long time).
>
> Did you mean RDMACM?
>
>
No I think I meant OMPI oob.
sorry,
--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Techn
Could the issue have anything to do with how OMPI implements lazy
connections with IBCM? Does setting the mca parameter
mpi_preconnect_all to 1 change things?
--td
On 01/16/2011 04:12 AM, Doron Shoham wrote:
Hi,
The gather hangs only in linear_sync algorithm but works with
basic_linear a
After further inspection I saw that events is being set to POLLIN only.
Is that supposed to mask out any other bits from being set (like
POLLRDNORM)?
--td
On 12/21/2010 10:35 AM, Terry Dontje wrote:
We're doing some testing with openib btl on a system with Solaris. It
looks like Solari
We're doing some testing with openib btl on a system with Solaris. It
looks like Solaris can return POLLIN|POLLRDNORM in revents from a poll
call. I looked at the manpages for Linux and it reads like Linux could
possibly do this too. However the code in btl_openib_async_thread that
checks fo
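The POLLIN|POLLRDNORM question above can be sketched in C as follows. This is only an illustration under assumptions: the function names (fd_is_readable, fd_is_readable_strict, wait_readable) are hypothetical and this is not the code in btl_openib_async_thread. It contrasts a bitmask test on revents with an equality test, which stops matching as soon as the OS reports an extra bit alongside POLLIN.

```c
#define _XOPEN_SOURCE 700   /* make POLLRDNORM visible under strict modes */
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Tolerant check: true whenever POLLIN is set, even if the OS also
 * reports other bits such as POLLRDNORM in revents. */
bool fd_is_readable(short revents)
{
    return (revents & POLLIN) != 0;
}

/* Fragile check: true only when POLLIN is the sole bit set, so a
 * revents value of POLLIN|POLLRDNORM is not recognized. */
bool fd_is_readable_strict(short revents)
{
    return revents == POLLIN;
}

/* Hypothetical helper: wait for input on a single descriptor.
 * Returns 1 if readable, 0 on timeout or other events, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0) {
        return rc;
    }
    return fd_is_readable(pfd.revents) ? 1 : 0;
}
```

Setting events to POLLIN only selects which conditions are requested; the sketch assumes, as the message above describes for Solaris, that revents can still carry additional bits, which is why the bitmask form is the safer check.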
ov 30, 2010, at 9:36 AM, Terry Dontje wrote:
On 11/30/2010 09:00 AM, Jeff Squyres wrote:
On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:
Can you make a v1.7 milestone on Trac, so I can move some of my tickets?
Done.
I have a question about Josh's recent ticket moves. One of them mention
On 11/30/2010 09:00 AM, Jeff Squyres wrote:
On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:
Can you make a v1.7 milestone on Trac, so I can move some of my tickets?
Done.
I have a question about Josh's recent ticket moves. One of them
mentions 1.5 is stabilizing quickly. Josh, can you clarify
A few comments:
1. Have you guys considered using hwloc for level 4-7 detection?
2. Is L2 related to L2 cache? If not, then is there some other term you
could use?
3. What do you see if the process is bound to multiple cores/hyperthreads?
4. What do you see if the process is not bound to any
Hmmm, it looks like you are right so my original change probably is the
right thing then.
--td
On 11/08/2010 08:13 AM, Jeff Squyres wrote:
It doesn't look like is needed at all in libevent207.h. Should it
just be removed?
On Nov 8, 2010, at 6:18 AM, Terry Dontje wrote:
In light o
In light of the push event changes upstream to libevent the changes to
libevent207.h probably should be modified to look like event.h. That is
wrap the include with some ifdef for C++. I did not do this
in the original fix because everything pulling it in was also pulling in
opal_config.h an
On 10/11/2010 06:11 AM, Jeff Squyres wrote:
On Oct 10, 2010, at 7:49 AM, Terry Dontje wrote:
At first glance this sounds like a sane approach but didn't we start with this
same approach with 1.5.0? I know it was kind of required to do it for 1.5.0
but we did go off track with delivery
At first glance this sounds like a sane approach but didn't we start
with this same approach with 1.5.0? I know it was kind of required to
do it for 1.5.0 but we did go off track with delivery. I believe to be
successful at making a deadline for 1.5.1 we need to consider the
following. Do w
ich actually
has a Linux version).
--td
Steve Wise wrote:
Yes it does. With mpi_preconnect_mpi set to 1, NP64 doesn't stall. So
it's not the algorithm in and of itself, but rather some interplay
between the algorithm and connection setup I guess.
On 9/17/2010 5:24 AM, Terry Dontje wrote:
Does
Does setting the mca parameter mpi_preconnect_mpi to 1 help at all? This
might help determine whether it is the actual connection setup
between processes that is out of sync, as opposed to something in the
actual gather algorithm.
--td
Steve Wise wrote:
Here's a clue: ompi_coll_tuned_ga
Sorry Rich, I didn't realize there was a graph attached at the end of
message. In other words my comments are not applicable because I really
didn't know you were asking about the graph. I agree it would be nice
to know what the graph was plotting.
--td
Terry Dontje wrote:
Graha
, "Samuel K. Gutierrez" wrote:
Hi Terry,
On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote:
I've done some minor testing on Linux looking at resident and shared memory
sizes for np=4, 8 and 16 jobs. I could not see any appreciable differences in
sizes in the proce
Terry,
On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote:
I've done some minor testing on Linux looking at resident and shared
memory sizes for np=4, 8 and 16 jobs. I could not see any
appreciable differences in sizes in the process between sysv, posix
or mmap usage in the SM btl.
So
Samuel K. Gutierrez wrote:
If I'm not mistaken, the warning is only issued if the backing file
is stored on one of the following file systems: Lustre, NFS, Panasas, and
GPFS (see: opal_path_nfs in opal/util/path.c). Based on the
performance numbers that Sylvain provided on June 9th of this year
I've done some minor testing on Linux looking at resident and shared
memory sizes for np=4, 8 and 16 jobs. I could not see any appreciable
differences in sizes in the process between sysv, posix or mmap usage in
the SM btl.
So I am still somewhat nonplussed about making this the default. It
Graham, Richard L. wrote:
Why do we need an RFC for this sort of component ? Seems self contained.
Probably don't, just giving a heads up.
--td
Rich
On 8/3/10 6:59 AM, "Terry Dontje" wrote:
WHAT: Add new Solaris sysinfo component
WHY: To allow OPAL access to chip
WHAT: Add new Solaris sysinfo component
WHY: To allow OPAL access to chip type and model information when
running on Solaris OS.
WHERE: opal/mca/sysinfo/solaris
WHEN: for 1.5.1
TIMEOUT: Aug 10, 2010
-
MORE DETAILS:
Jeff Squyres wrote:
Just chatted with Ralph about this on the phone and he came up with a slightly
better compromise...
He points out that we really don't need *all* of the hwloc API (there's a
bajillion tiny little accessor functions). We could provide a steady,
OPAL/ORTE/OMPI-specific API