Re: [OMPI devel] SM backing file size

2008-11-12 Thread Pak Lui
d for controlling file size has morphed into something I don't recognize. Can someone more familiar with that subsystem point me to one or more params that will allow us to control the size of that file? It is swamping our systems and causing OMPI to segfault. Thanks Ralph -- Regards, -

Re: [OMPI devel] get_iwarp_subnet_id in openib btl

2008-05-21 Thread Pak Lui
Yup. It works. Thanks! With r18470 it works even better! Jon Mason wrote: On Tue, May 20, 2008 at 03:44:41PM -0400, Pak Lui wrote: Hi Jon, This is CentOS 4.6 on Ranger. Sorry I didn't mention it. So what should I do? login3% more /etc/*release* :: /etc/redhat-re

Re: [OMPI devel] get_iwarp_subnet_id in openib btl

2008-05-20 Thread Pak Lui
Mason wrote: On Tue, May 20, 2008 at 02:48:49PM -0400, Pak Lui wrote: Hi, I am not familiar with get_iwarp_subnet_id and I am not sure why it is causing trunk to barf. I think I am using ofed 1.2.5. See attached for That is in the 1.3 tree, not 1.2. There was a bug in Solaris that was

[OMPI devel] get_iwarp_subnet_id in openib btl

2008-05-20 Thread Pak Lui
444 make[1]: Leaving directory `/work/00951/paklui/ompi-trunk7/config-data1/ompi' 10445 make: *** [install-recursive] Error 1 "make.install.log.0" 10445L, 2050037C 10445,1 Bot -- - Pak Lui pak@sun.com config.log.bz2 Description: application/bzip

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-08 Thread Pak Lui
Thanks very much Josh! Will try it out soon. Josh Hursey wrote: Sorry about that. I didn't test that type of option. It should be working in r18418. Let me know if you see any more issues. -- Josh On May 8, 2008, at 6:04 PM, Pak Lui wrote: I think I have a problem but I am not sure. I

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-08 Thread Pak Lui
__ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui
49200 | 49201 | int 49202 | main () 49203 | { 49204 | void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0); 49205 | ; 49206 | return 0; 49207 | } 49208 configure:123650: result: no Pak Lui wrote: For sanity sake I also checked the LD_LIBRARY_PATH, doesn't seem to be

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui
suggestion to replace OMPI_COMPILE_IFELSE to OMPI_LINK_IFELSE. Will let you know. Pak Lui wrote: Jeff Squyres wrote: Jon / Steve -- can you comment? I tested with OFED 1.2.5 (which is what I assume you meant) and got: checking for rdma_get_peer_addr... no Because that function is not defin

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui
e the AC_COMPILE_IFELSE in config/ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little confused as to why it would compile successfully if the symbol rdma_get_peer_addr is not declared anywhere (which it shouldn't be in OFED 1.2 or 1.2.5, AFAIK)... On May 3, 2008, at 10:56 AM, Pa

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-03 Thread Pak Lui
happened during configure. On May 2, 2008, at 7:09 PM, Pak Lui wrote: Hi Jeff, It seems that the cpc3 merge causes my Ranger build to break. I believe it is using OFED 1.2 but I don't know how to check. It passes the ompi_check_openib.m4 that you added in for the rdma_get_p

Re: [OMPI devel] Mercurial demo OMPI repository

2008-04-29 Thread Pak Lui
_ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui
ed in this regard for several years. On 4/8/08 11:36 AM, "Pak Lui" wrote: First, can your user executable create a signal handler to catch the SIGUSR2 so that it does not exit? By default on Solaris it is going to exit, unless you catch the signal and have the process do nothing. from signal(
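The suggestion above (catch SIGUSR2 so the default action does not terminate the process) can be sketched roughly as follows. This is only a minimal illustration using POSIX sigaction(); the function and variable names are made up for the example and do not come from the thread or from Open MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Illustrative flag (hypothetical name): set to 1 once SIGUSR2 is delivered. */
static volatile sig_atomic_t sigusr2_flag = 0;

/* Handler that only records the signal; the process keeps running. */
static void on_sigusr2(int signo)
{
    (void)signo;
    sigusr2_flag = 1;
}

/* Replace the default SIGUSR2 action (terminate) with the handler above.
 * Returns 0 on success, -1 on failure, as sigaction() does. */
int install_sigusr2_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted system calls where possible */
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Returns 1 if SIGUSR2 has been received since startup, 0 otherwise. */
int sigusr2_seen(void)
{
    return sigusr2_flag != 0;
}
```

A user program would call install_sigusr2_handler() early in main(), before the signal can arrive; after that, a SIGUSR2 sent to the process sets the flag instead of ending the process.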

Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui
http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] rch-step2 branch errors

2008-02-21 Thread Pak Lui
s Best Regards, Lenny. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Pak Lui
iling list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

[OMPI devel] cpu stride and offset for processor binding

2008-02-05 Thread Pak Lui
's coming down the pipe that may allow me to specify cpuids in a sequence, or do we already have some feature like that that I didn't know about? I looked around but I don't see anything like this. Thanks in advance for any comments. -- - Pak Lui pak@sun.com

Re: [OMPI devel] processor affinity -- OpenMPI/batch system integration

2008-01-11 Thread Pak Lui
___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16200

2007-09-25 Thread Pak Lui
* Comm_size */ char name[64]; /* the name if it has one */ } mqs_communicator; ___ svn-full mailing list svn-f...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn-full -- - Pak Lui pak@sun.com

Re: [OMPI devel] Message Queue debugging support for 1.2.4

2007-09-19 Thread Pak Lui
___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-25 Thread Pak Lui
sad...@gmx.net wrote: Pak Lui wrote: sad...@gmx.net wrote: Sorry for the late reply, but I haven't had access to the machine at the weekend. I don't really know what this means. People have explained "loose" vs. "tight" integration to me before, but since I

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-25 Thread Pak Lui
rt time and also solve the privileged socket limitation for launching parallel jobs. It will be in the upcoming release. -- - Pak Lui pak@sun.com

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-25 Thread Pak Lui
ide sge_execd not setting the limits correctly. thanks for your great help :) ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] (loose) SGE Integration fails, why?

2007-06-22 Thread Pak Lui
The workaround was to put the setting into the ~/.tcshrc. So if SGE is not setting other resource limits correctly or doesn't provide the option, you may have to work around it in the ~/.tcshrc or a similar settings file for your shell. Otherwise it'll probably fall back to using the system default. -- - Pak Lui pak@sun.com

Re: [OMPI devel] large virtual memory consumption on smp nodes and gridengine problems

2007-06-11 Thread Pak Lui
-mpi.org/mailman/listinfo.cgi/devel -- - Pak Lui pak@sun.com

Re: [OMPI devel] [GE users] OpenMPI 1.2 integration and dedicated MPI networks

2006-10-20 Thread Pak Lui
r the gigE interface. See if that works instead. -mca btl_tcp_if_exclude lo,all_my_non_gigE_if Orion Poplawski wrote: Pak Lui wrote: Hi Orion and Reuti, Let me see if I can understand the issue by breaking them down first: (1) First, I am curious to know why you would need to create a PE_HOSTFIL

Re: [OMPI devel] [GE users] OpenMPI 1.2 integration and dedicated MPI networks

2006-10-20 Thread Pak Lui
", &tok); arch = strtok_r(NULL, " \n", &tok); ... node->node_name = strdup(ptr); node->node_arch = strdup(arch); Perhaps it can be modified so that it uses the queue name hostname when doing SGE/qrsh calls, but the first hostname when doing MPI communication. Not really sure what the intent of the two fields in SGE's pe_hostfile is, or if OpenMPI can handle the idea of two hostnames for different purposes. -- Thanks, - Pak Lui pak@sun.com
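The idea floated above, keeping the first hostname for MPI traffic and the hostname embedded in the queue field for SGE/qrsh calls, might look something like the sketch below. It assumes each pe_hostfile line has the form "host slots queue@host ..." separated by whitespace; that layout, the struct, and every name here are assumptions made for illustration and are not taken from the Open MPI source.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one pe_hostfile line (not an Open MPI type). */
struct pe_host {
    char *mpi_host;    /* first field: hostname used for MPI communication */
    int   slots;       /* second field: number of slots */
    char *launch_host; /* host part of "queue@host", used for qrsh calls */
};

/* Parse one line in place (strtok_r modifies the buffer).
 * Returns 0 on success, -1 if fields are missing or allocation fails. */
int parse_pe_hostfile_line(char *line, struct pe_host *out)
{
    char *tok = NULL;
    char *host  = strtok_r(line, " \n", &tok);
    char *slots = strtok_r(NULL, " \n", &tok);
    char *queue = strtok_r(NULL, " \n", &tok);

    out->mpi_host = NULL;
    out->launch_host = NULL;
    out->slots = 0;
    if (host == NULL || slots == NULL || queue == NULL) {
        return -1;
    }

    /* Assumption: the queue field looks like "all.q@hostname". If there is
     * no '@', fall back to the first hostname for launching as well. */
    char *at = strchr(queue, '@');
    const char *launch = (at != NULL && at[1] != '\0') ? at + 1 : host;

    out->mpi_host = strdup(host);
    out->launch_host = strdup(launch);
    out->slots = atoi(slots);
    if (out->mpi_host == NULL || out->launch_host == NULL) {
        free(out->mpi_host);
        free(out->launch_host);
        out->mpi_host = NULL;
        out->launch_host = NULL;
        return -1;
    }
    return 0;
}

/* Release the strings allocated by parse_pe_hostfile_line(). */
void pe_host_free(struct pe_host *h)
{
    free(h->mpi_host);
    free(h->launch_host);
    h->mpi_host = NULL;
    h->launch_host = NULL;
}
```

Whether the two hostnames in a real pe_hostfile are meant to be used this way is exactly the open question in the message, so this only shows how the two values could be kept apart once parsed.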

Re: [OMPI devel] MPI2 Client-Server routines OpenMPI BUG!!!

2006-09-07 Thread Pak Lui
n/listinfo.cgi/devel -- Thanks, - Pak Lui pak@sun.com

Re: [OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-02 Thread Pak Lui
ave to look at it, but this may not really be feasible. Ralph Jeff Squyres (jsquyres) wrote: The main reason that it doesn't work is because we didn't do anything to make it work. :-) Specifically, mpirun is not intercepting SIGSTOP and passing it on to the remote nodes

[OMPI devel] SIGSTOP and SIGCONT on orted

2006-06-01 Thread Pak Lui
t send the signal to orted on the remote node, but only to 'mpirun'. I am trying to see how to work around this. -- Thanks, - Pak Lui pak@sun.com

Re: [OMPI devel] MPI_Comm_spawn[_multiple] and orted

2006-05-31 Thread Pak Lui
just how big an issue this is, relative release dates, other commitments, etc. etc. Ralph Pak Lui wrote: Ralph Castain wrote: First, the fact that an orted already exists on a node is not sufficient to allow us to use it again for another application. The orted must be persistent or else we d

Re: [OMPI devel] MPI_Comm_spawn[_multiple] and orted

2006-05-31 Thread Pak Lui
We may be able to resolve this in a fairly straightforward manner - I think a lot of the necessary tools are already in the system, we just need to "hook them up" appropriately for SGE. yup, that's the goal. Ralph Pak Lui wrote: Hi, When I run a spawn program over

Re: [OMPI devel] MPI_Comm_spawn[_multiple] and orted

2006-05-31 Thread Pak Lui
d for this limitation due to SGE slots. We could try to track and set some top limit for the number of times that qrsh can exec, before the spawn program uses up all the available SGE slots and errors out. Ralph Pak Lui wrote: Hi, When I run a spawn program over rsh/ssh, I notice that e

[OMPI devel] MPI_Comm_spawn[_multiple] and orted

2006-05-31 Thread Pak Lui
ound I can see aren't pretty though. So I welcome your questions, comments or suggestions on this. -- Thanks, - Pak Lui pak@sun.com