[OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
Hi, I have been experiencing hanging tests (latency) since r19010. Best regards, Lenny.

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
Is this related to r1378? On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: Hi, I experience hanging of tests ( latency ) since r19010 Best Regards Lenny.

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote: Is this related to r1378? Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket. On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: Hi, I experience hanging of tests ( latency ) since r19010 Best Regards Lenny.

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
I believe it is. On 7/28/08, Jeff Squyres wrote: > > On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote: > > Is this related to r1378? >> > > Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket. > > > On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: >> >> Hi, >>> >>> I experience hang

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
It could also be something new. Brad and I noted on Fri that IB was locking up as soon as we tried any cross-node communications. Hadn't seen that before, and at least I haven't explored it further - planned to do so today. On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote: I believe i

[OMPI devel] Funny warning message

2008-07-28 Thread Ralph Castain
Just got this warning today while trying to test IB connections. Last I checked, 32 was indeed smaller than 192... -- WARNING: rd_win specification is non optimal. For maximum performance it is advisable to configure rd_w

[OMPI devel] 1.3 build failing on MacOSX

2008-07-28 Thread Greg Watson
I'm getting the following when I try and build 1.3 from SVN: gcc -DHAVE_CONFIG_H -I. -I../../adio/include -DOMPI_BUILDING=1 -I/Users/greg/Documents/workspaces/ptp_head/ompi/ompi/mca/io/romio/romio/../../../../.. -I/Users/greg/Documents/workspaces/ptp_head/ompi/ompi/mca/io/romio/romio/../..

Re: [OMPI devel] 1.3 build failing on MacOSX

2008-07-28 Thread Jeff Squyres
Blast. Looks like a problem with the new ROMIO I brought in last week. I'll fix shortly; thanks for the heads-up. On Jul 28, 2008, at 9:36 AM, Greg Watson wrote: I'm getting the following when I try and build 1.3 from SVN: gcc -DHAVE_CONFIG_H -I. -I../../adio/include -DOMPI_BUILDING=1 -I/

[OMPI devel] MCA base changes

2008-07-28 Thread Jeff Squyres
With the update on #1400, I think we're ready to push the MCA base changes to the SVN trunk. Speak now if you object, or forever hold your peace. The most notable parts of this commit:
- add "register" function to mca_base_component_t
- converted coll:basic and paffinity:linux and paffini

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
I checked this out some more and I believe it is related to ticket #1378. We lock up if SM is included in the BTLs, which is what I had done in my test. If I ^sm, I can run fine. On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote: It could also be something new. Brad and I noted on Fri that IB

Re: [OMPI devel] Funny warning message

2008-07-28 Thread Lenny Verkhovsky
It seems that an error slipped into the help file.
Index: ompi/mca/btl/openib/help-mpi-btl-openib.txt
===
--- ompi/mca/btl/openib/help-mpi-btl-openib.txt (revision 19054)
+++ ompi/mca/btl/openib/help-mpi-btl-openib.txt (working copy)
@@

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
I failed to run both on different nodes and on the same node via self,openib. On 7/28/08, Ralph Castain wrote: > > I checked this out some more and I believe it is ticket #1378 related. We > lock up if SM is included in the BTL's, which is what I had done on my test. > If I ^sm, I can run fine. > > On

Re: [OMPI devel] Funny warning message

2008-07-28 Thread Adrian Knoth
On Mon, Jul 28, 2008 at 05:14:29PM +0300, Lenny Verkhovsky wrote:
> -advisable to configure rd_win smaller then (rd_num - rd_low), but currently
> +advisable to configure rd_win bigger then (rd_num - rd_low), but currently
                                          ^
                                          a
-- Cluster and Metacomputi

[OMPI devel] RFC: MCA DSO filename

2008-07-28 Thread Jeff Squyres
WHAT: Rename MCA DSO filenames from "mca_<framework>_<component>.so" to "libmca_<framework>_<component>.so" (backwards compatibility can be preserved if we want it; see below)
WHY: Allows simplifying component Makefile.am's
WHEN: No real rush; just wanted to get the idea out there (does *not* need to be before v1.3; more explanation
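As a rough illustration of the proposed rename, the sketch below builds both filename forms from a framework name and a component name. It assumes the pattern is "mca_<framework>_<component>.so" (as in mca_btl_openib.so); the function names are hypothetical and are not part of the MCA base.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: current-style DSO filename, e.g. "mca_btl_openib.so".
 * Returns what snprintf returns (the length the full name needs). */
int mca_dso_name_old(char *buf, size_t len,
                     const char *framework, const char *component)
{
    return snprintf(buf, len, "mca_%s_%s.so", framework, component);
}

/* Hypothetical helper: proposed filename with the "lib" prefix,
 * e.g. "libmca_btl_openib.so". */
int mca_dso_name_new(char *buf, size_t len,
                     const char *framework, const char *component)
{
    return snprintf(buf, len, "libmca_%s_%s.so", framework, component);
}
```

The only difference between the two forms is the "lib" prefix, so a loader that wanted to stay backwards compatible could in principle probe for both names; whether and how that would be done is not shown in the message above.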

Re: [OMPI devel] Funny warning message

2008-07-28 Thread Jeff Squyres
I think Lenny is pointing out that "smaller" got changed to "bigger", too. :-) Looking at the test in the code (btl_openib_component.c): if ((rd_num - rd_low) > rd_win) { orte_show_help("help-mpi-btl-openib.txt", "non optimal rd_win", tru
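To make the direction of the comparison concrete, here is a minimal sketch of the quoted condition as a standalone predicate. The function name is hypothetical; only the expression (rd_num - rd_low) > rd_win comes from the quoted code.

```c
#include <stdbool.h>

/* Mirrors the condition quoted from btl_openib_component.c:
 * the warning is shown when (rd_num - rd_low) exceeds rd_win.
 * The function name is illustrative, not from the Open MPI source. */
bool rd_win_is_non_optimal(int rd_num, int rd_low, int rd_win)
{
    return (rd_num - rd_low) > rd_win;
}
```

With illustrative values rd_num = 256 and rd_low = 64 (chosen only to give the difference of 192 mentioned earlier in the thread) and rd_win = 32, the predicate is true. Under this condition the warning fires when rd_win is too small, which is consistent with the help text advising a bigger rd_win.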

Re: [OMPI devel] Funny warning message

2008-07-28 Thread Ralph Castain
On Jul 28, 2008, at 8:22 AM, Jeff Squyres wrote: I think Lenny is pointing out that "smaller" got changed to "bigger", too. :-) Looking at the test in the code (btl_openib_component.c): if ((rd_num - rd_low) > rd_win) { orte_show_help("help-mpi-btl-openib.txt", "n

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Brad Benton
My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tests using --mca btl openib,self hang in all cases. --brad 2008/7/28 Lenny Verkhovsky > I failed to run on different nodes or on the same node via self,openib > > > > On 7/28/08, Ralph Castain wrote: >> >> I c

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
Interesting - you are quite correct and I should have been more precise. I ran with -mca btl openib and it worked. So having just openib seems to be okay. On Jul 28, 2008, at 8:37 AM, Brad Benton wrote: My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tes

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
FWIW, all my MTT runs are hanging as well. On Jul 28, 2008, at 10:37 AM, Brad Benton wrote: My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tests using --mca btl openib,self hang in all cases. --brad 2008/7/28 Lenny Verkhovsky I failed to run on diffe

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
Only openib works for me too, but Glebs once told me that it's illegal and that I always need to use the self btl. On 7/28/08, Jeff Squyres wrote: > > FWIW, all my MTT runs are hanging as well. > > > On Jul 28, 2008, at 10:37 AM, Brad Benton wrote: > > My experience is the same as Lenny's. I've teste

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote: only openib works for me too, but Glebs said to me once that it's illegal and I always need to use self btl. Don't know - could be true. But if that is true, then we should check to see if that condition is met and error out - with a

Re: [OMPI devel] 1.3 build failing on MacOSX

2008-07-28 Thread Jeff Squyres
Looking into it a bit more, the situation is a little convoluted. I've filed https://svn.open-mpi.org/trac/ompi/ticket/1419; followups will occur there. On Jul 28, 2008, at 9:42 AM, Jeff Squyres wrote: Blast. Looks like a problem with the new ROMIO I brought in last week. I'll fix sho

[OMPI devel] Change in slot_list specification

2008-07-28 Thread Ralph Castain
Just an FYI for those of you working with slot_lists. Lenny, Jeff and I have changed the mca param associated with how you specify the slot list you want the rank_file mapper to use. This was done to avoid the possibility of ORTE processes such as mpirun and orted accidentally binding thems

[OMPI devel] Change in hostfile behavior

2008-07-28 Thread Ralph Castain
Per an earlier telecon, I have modified the hostfile behavior slightly to allow hostfiles to subdivide allocations. Briefly: given an allocation, we allow users to specify --hostfile on a per-app_context basis. In this mode, the hostfile info is used to filter the nodes that will be used fo
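The filtering idea can be sketched as a simple intersection: keep only the allocated nodes that also appear in the app_context's hostfile. This is an illustration under that assumption, not the ORTE implementation; the function name and signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not ORTE code: keep only those allocated nodes
 * that also appear in the hostfile list. Pointers to the kept names are
 * written to `out` (which must have room for n_alloc entries), in
 * allocation order. Returns the number of nodes kept. */
size_t filter_allocation(const char *const *alloc, size_t n_alloc,
                         const char *const *hostfile, size_t n_host,
                         const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_alloc; i++) {
        for (size_t j = 0; j < n_host; j++) {
            if (strcmp(alloc[i], hostfile[j]) == 0) {
                out[kept++] = alloc[i];
                break; /* each allocated node is kept at most once */
            }
        }
    }
    return kept;
}
```

In this sketch a hostfile entry that is not part of the allocation is silently ignored; how the real code treats that case is not stated in the message above.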

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread George Bosilca
I'm a little bit lost here. You're stating that openib,self doesn't work while openib does? In other words, that adding self to the BTL leads to deadlocks? george. PS: Btw, it is not supposed to work at all, except in the case where openib handles internal messages (where the source and de

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
I just re-tested to confirm, and that is correct.
-mca btl openib        works
-mca btl openib,self   hangs
-mca btl openib,sm     works
On Jul 28, 2008, at 9:49 AM, George Bosilca wrote: I'm a little bit lost here. You're stating that openib,self doesn't w

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread George Bosilca
Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test; I will take a look this evening. Thanks, george. On Jul 28, 2008, at 5:56 PM, Ralph Castain wro

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test, I will take a look this evening. FWIW, my manual

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
My test wasn't a benchmark - I was just testing with a little program that calls mpi_init, mpi_barrier, and mpi_finalize. A test with just mpi_init/finalize works fine, so it looks like we simply hang when trying to communicate. This also only happens on multi-node operations. On Jul 28,

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 11:05 AM, Ralph Castain wrote: only openib works for me too, but Glebs said to me once that it's illegal and I always need to use self btl. Don't know - could be true. But if that is true, then we should check to see if that condition is met and error out - with an

Re: [OMPI devel] Change in hostfile behavior

2008-07-28 Thread Tim Mattox
My only concern is how will this interact with PLPA. Say two Open MPI jobs each use "half" the cores (slots) on a particular node... how would they be able to bind themselves to a disjoint set of cores? I'm not asking you to solve this Ralph, I'm just pointing it out so we can maybe warn users th

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Terry Dontje
Jeff Squyres wrote: On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test, I will take a look this evening.

Re: [OMPI devel] Change in hostfile behavior

2008-07-28 Thread Ralph Castain
Actually, this is true today regardless of this change. If two separate mpirun invocations share a node and attempt to use paffinity, they will conflict with each other. The problem isn't caused by the hostfile sub-allocation. The problem is that the two mpiruns have no knowledge of each ot
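A small sketch of why two independent jobs can collide. It assumes, purely for illustration, a naive policy in which each job binds its local processes to cores 0..n-1 without knowing about any other job; the actual paffinity policy may differ, and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical naive policy: a job with nprocs local processes claims
 * cores 0..nprocs-1, with no knowledge of other jobs on the node.
 * Returns a bitmask with one bit set per claimed core. */
uint64_t naive_core_mask(unsigned nprocs)
{
    return (nprocs >= 64) ? UINT64_MAX : ((UINT64_C(1) << nprocs) - 1);
}

/* Cores claimed by both jobs; a non-zero result means the bindings collide. */
uint64_t core_overlap(uint64_t mask_a, uint64_t mask_b)
{
    return mask_a & mask_b;
}
```

Under that assumed policy, two jobs that each use half of a 4-core node both end up with mask 0x3: they share cores 0 and 1 while cores 2 and 3 stay idle. Avoiding this requires some shared knowledge between the two mpiruns, which is the point made in the message above.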

[OMPI devel] parallel debugger attach

2008-07-28 Thread Jeff Squyres
I think I fixed the parallel debugger attach stuff in an hg -- can interested parties test it out at their own sites before I bring it back to the SVN trunk? It should be working for both Allinea DDT and TotalView. HG: http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/debugger-stuff/ Ti

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Brad Benton
On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje wrote: > Jeff Squyres wrote: > >> On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: >> >> Interesting. The self is only used for local communications. I don't >>> expect that any benchmark executes such communications, but apparently I was >>> wron

[OMPI devel] MCA_BTL_BASE_VERSION_1_0_1 and MCA_BTL_BASE_VERSION_1_0_0

2008-07-28 Thread Jeff Squyres
Since the trunk has now been bumped to MCA v2.0, and all frameworks have also been bumped to v2.0, are these two #defines relevant anymore: MCA_BTL_BASE_VERSION_1_0_1 MCA_BTL_BASE_VERSION_1_0_0 I know there was at least one BTL being developed at an organization that may not have kept up wit