Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Ralph Castain
Excellent! Thanks Josh - both for the original work/commit and for the quick fix! Ralph On 5/6/08 3:58 PM, "Josh Hursey" wrote: > Sorry about that. Looking back at the filem logic it seems that I > returned success even if select failed (and just use the 'none' > passthrough component). I comm

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Josh Hursey
Sorry about that. Looking back at the filem logic it seems that I returned success even if select failed (and just use the 'none' passthrough component). I committed a patch in r18389 that fixes this problem. This commit now has a warning that prints on the filem verbose stream so if a us

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Ralph H Castain
Hmmm... well, I hit a problem (of course!). I have mca-no-build on the filem framework on my Mac. If I just mpirun -n 3 ./hello, I get the following error: -- It looks like orte_init failed for some reason; your parallel proce

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Josh Hursey
This has been committed in r18381. Please let me know if you have any problems with this commit. Cheers, Josh On May 5, 2008, at 10:41 AM, Josh Hursey wrote: Awesome. The branch is updated to the latest trunk head. I encourage folks to check out this repository and make sure that it builds on

[OMPI devel] [RFC] mca_base_open() NULL

2008-05-06 Thread Josh Hursey
What: Add an MCA-NULL option to open no components in mca_base_open() Why: Sometimes we do not want to open or select any components of a framework. Where: patch attached for current trunk. When: Needs further discussion. Timeout: Unknown. [May 13, 2008 (After teleconf)?] Short Version: --

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Jeff Squyres
In addition to Steve's comments, we discussed this on the call today and decided that the patch is fine. Jon and I will discuss further because this is the first instance in which calling some form of "disconnect" on one side causes events to occur on the other side without the involvement from t

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Steve Wise
Jeff Squyres wrote: On May 5, 2008, at 6:27 PM, Steve Wise wrote: I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happen

Re: [OMPI devel] NO IP address found

2008-05-06 Thread Jeff Squyres
I think the larger issue, though, is whether rdmacm will work properly for the LMC>0 case over IB, right? The fact that it shouldn't be displaying this error message now because RDMA CM is not the default is one issue, but it's not the *real* issue... On May 6, 2008, at 11:00 AM, Jon Mas

Re: [OMPI devel] NO IP address found

2008-05-06 Thread Jon Mason
On Tuesday 06 May 2008 09:41:53 am Jeff Squyres wrote: > I actually don't know what the RDMA CM requires for the LMC>0 case -- > does it require a unique IP address for every LID? It requires a unique IP address for every hca/port in use by rdmacm. I see the bug in rdmacm (since I don't believe

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Brian W. Barrett
On Tue, 6 May 2008, Jeff Squyres wrote: On May 5, 2008, at 6:27 PM, Steve Wise wrote: There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iW

Re: [OMPI devel] NO IP address found

2008-05-06 Thread Jeff Squyres
I actually don't know what the RDMA CM requires for the LMC>0 case -- does it require a unique IP address for every LID? On May 6, 2008, at 5:09 AM, Lenny Verkhovsky wrote: Hi, running BW benchmark with btl_openib_max_lmc >= 2 causes a warning (MPI from the TRUNK) #mpirun --bynod

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Jeff Squyres
On May 5, 2008, at 6:27 PM, Steve Wise wrote: I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happening because the remote n

Re: [OMPI devel] Intel MPI Benchmark(IMB) using OpenMPI - Segmentation-fault error message.

2008-05-06 Thread Jeff Squyres
On May 1, 2008, at 10:43 AM, Lenny Verkhovsky wrote: (a) I did modify make_mpich makefile present in IMB-3.1/src folder giving the path for openmpi. Here I am using same mpirun as built from openmpi(v-1.2.5) also did mention in PATH & LD_LIBRARY_PATH. That should be fine. (b) What is the c

[OMPI devel] NO IP address found

2008-05-06 Thread Lenny Verkhovsky
Hi, running BW benchmark with btl_openib_max_lmc >= 2 causes a warning (MPI from the TRUNK) #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 2 ./mpi_p_LMC -t bw -s 40 BW (40) (size min max avg) 40 321.493757 342.972837 329.493715 #mpirun