Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-07-14 Thread Bryan Lally
Developers, I was about to test 1.3.3rc2, then I saw that 1.3.3 had also escaped. I tried it, and voila! It solves the issue I reported in May, below. Thanks for all the work that went into this. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-19 Thread Bryan Lally
Bryan Lally wrote: Here's what we've found. It wasn't the platform file as such. I've since built with ./configure and some standard, obvious command line switches. What's then required is to edit the platform configuration file, /etc/openmpi-mca-params.conf, and add: coll_sync_priority = …
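[Editor's note: the archived preview cuts off before the parameter's value. For orientation only, /etc/openmpi-mca-params.conf holds one "name = value" MCA parameter per line; a minimal sketch of the kind of entry being described, with the priority value as a placeholder (the value Bryan actually used is truncated above):

    # /etc/openmpi-mca-params.conf -- system-wide MCA parameters for Open MPI.
    # One "name = value" pair per line; lines starting with '#' are comments.
    # The value 100 below is a placeholder for illustration, not the
    # truncated value from the original post.
    coll_sync_priority = 100

The same parameter can also be set per-run on the command line, e.g. mpirun --mca coll_sync_priority 100 ./a.out.]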

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-19 Thread Bryan Lally
Jeff Squyres wrote: On May 18, 2009, at 11:49 PM, Bryan Lally wrote: Ralph sent me a platform file and a corresponding .conf file. I built ompi from openmpi-1.3.3a1r21223.tar.gz, with these files. I've been running my normal tests and have been unable to hang a job yet. I've run enough that …

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-19 Thread Jeff Squyres
On May 18, 2009, at 11:49 PM, Bryan Lally wrote: Ralph sent me a platform file and a corresponding .conf file. I built ompi from openmpi-1.3.3a1r21223.tar.gz, with these files. I've been running my normal tests and have been unable to hang a job yet. I've run enough that I don't expect to see …

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-18 Thread Bryan Lally
Eugene Loh wrote: Ralph Castain wrote: Hi Bryan I have seen similar issues on LANL clusters when message sizes were fairly large. How big are your buffers when you call Allreduce? Can you send us your Allreduce call params (e.g., the reduce operation, datatype, num elements)? If you don't want to send that to the list, you can …

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-18 Thread Eugene Loh
Ralph Castain wrote: Hi Bryan I have seen similar issues on LANL clusters when message sizes were fairly large. How big are your buffers when you call Allreduce? Can you send us your Allreduce call params (e.g., the reduce operation, datatype, num elements)? If you don't want to send that to the list, you can …

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-11 Thread Ralph Castain
Hi Bryan I have seen similar issues on LANL clusters when message sizes were fairly large. How big are your buffers when you call Allreduce? Can you send us your Allreduce call params (e.g., the reduce operation, datatype, num elements)? If you don't want to send that to the list, you can …
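[Editor's note: for readers skimming the digest, a minimal C sketch (not code from the thread) labeling which MPI_Allreduce arguments carry the information Ralph is asking for: the element count, the datatype, and the reduce operation.

    /* Minimal sketch, not from the thread: labels the MPI_Allreduce
     * arguments being requested (num elements, datatype, reduce op). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        double local[4]  = {1.0, 2.0, 3.0, 4.0};
        double global[4] = {0.0};

        MPI_Init(&argc, &argv);

        /* Arguments 3-5 are the "call params" in question:
         * num elements = 4, datatype = MPI_DOUBLE, op = MPI_SUM. */
        MPI_Allreduce(local, global, 4, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        printf("sum of element 0 across ranks: %f\n", global[0]);
        MPI_Finalize();
        return 0;
    }

The buffer size Ralph asks about follows from these same arguments: count times the datatype's size, here 4 * sizeof(double) = 32 bytes per rank.]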

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-11 Thread Bryan Lally
Eugene Loh wrote: Another user reports something somewhat similar at http://www.open-mpi.org/community/lists/users/2009/04/9154.php . That problem seems to be associated with GCC 4.4.0. What compiler are you using? OMPI was built with gcc v4.3.0, which is what's packaged with Fedora 9. …

Re: [OMPI devel] possible bug in 1.3.2 sm transport

2009-05-11 Thread Eugene Loh
Bryan Lally wrote: I think I've run across a race condition in your latest release. Since my demonstrator is somewhat large and cumbersome, I'd like to know if you already know about this issue before we start the process of providing code and details. Basics: openmpi 1.3.2, Fedora 9, 2 x86 …

[OMPI devel] possible bug in 1.3.2 sm transport

2009-05-11 Thread Bryan Lally
Developers, This is my first post to the openmpi developers list. I think I've run across a race condition in your latest release. Since my demonstrator is somewhat large and cumbersome, I'd like to know if you already know about this issue before we start the process of providing code and details …