d for controlling file size has morphed into
something I don't recognize.
Can someone more familiar with that subsystem point me to one or more
params that will allow us to control the size of that file? It is
swamping our systems and causing OMPI to segfault.
Thanks
Ralph
--
Regards,
-
Yup. It works. Thanks! With r18470 it works even better!
Jon Mason wrote:
On Tue, May 20, 2008 at 03:44:41PM -0400, Pak Lui wrote:
Hi Jon,
This is CentOS 4.6 on Ranger. Sorry I didn't mention it. So what should
I do?
login3% more /etc/*release*
::
/etc/redhat-re
Mason wrote:
On Tue, May 20, 2008 at 02:48:49PM -0400, Pak Lui wrote:
Hi,
I am not familiar with get_iwarp_subnet_id and I am not sure why it is
causing trunk to barf. I think I am using ofed 1.2.5. See attached for
That is in the 1.3 tree, not 1.2. There was a bug in Solaris that was
make[1]: Leaving directory
`/work/00951/paklui/ompi-trunk7/config-data1/ompi'
make: *** [install-recursive] Error 1
"make.install.log.0" 10445L, 2050037C 10445,1
Bot
--
- Pak Lui
pak@sun.com
config.log.bz2
Description: application/bzip
Thanks very much Josh! Will try it out soon.
Josh Hursey wrote:
Sorry about that. I didn't test that type of option. It should be
working in r18418. Let me know if you see any more issues.
-- Josh
On May 8, 2008, at 6:04 PM, Pak Lui wrote:
I think I have a problem but I am not sure. I
__
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
- Pak Lui
pak@sun.com
|
| int
| main ()
| {
| void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
| ;
| return 0;
| }
configure:123650: result: no
Pak Lui wrote:
For sanity's sake I also checked the LD_LIBRARY_PATH, doesn't seem to be
suggestion to replace OMPI_COMPILE_IFELSE with
OMPI_LINK_IFELSE. Will let you know.
Pak Lui wrote:
Jeff Squyres wrote:
Jon / Steve -- can you comment?
I tested with OFED 1.2.5 (which is what I assume you meant) and got:
checking for rdma_get_peer_addr... no
Because that function is not defin
e the AC_COMPILE_IFELSE in config/
ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little confused as
to why it would compile successfully if the symbol rdma_get_peer_addr
is not declared anywhere (which it shouldn't be in OFED 1.2 or 1.2.5,
AFAIK)...
On May 3, 2008, at 10:56 AM, Pa
happened during
configure.
On May 2, 2008, at 7:09 PM, Pak Lui wrote:
Hi Jeff,
It seems that the cpc3 merge causes my Ranger build to break. I
believe it is using OFED 1.2 but I don't know how to check. It
passes the ompi_check_openib.m4 that you added in for the
rdma_get_p
--
- Pak Lui
pak@sun.com
ed in this regard for several years.
On 4/8/08 11:36 AM, "Pak Lui" wrote:
First, can your user executable install a signal handler that catches
SIGUSR2 so that it does not exit? By default on Solaris it is going to exit,
unless you catch the signal and have the process do nothing.
from signal(
--
- Pak Lui
pak@sun.com
s
Best Regards,
Lenny.
--
- Pak Lui
pak@sun.com
's coming down the pipe that may
allow me to specify cpuids in a sequence, or do we already have some
feature like that that I didn't know about? I looked around but I don't
see anything like this.
Thanks in advance for any comments.
--
- Pak Lui
pak@sun.com
--
- Pak Lui
pak@sun.com
* Comm_size */
char name[64]; /* the name if it has one */
} mqs_communicator;
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
--
- Pak Lui
pak@sun.com
--
- Pak Lui
pak@sun.com
sad...@gmx.net wrote:
Pak Lui schrieb:
sad...@gmx.net wrote:
Sorry for the late reply, but I haven't had access to the machine at the weekend.
I don't really know what this means. People have explained "loose"
vs. "tight" integration to me before, but since I
rt time and also solve the
privileged socket limitation for launching parallel jobs. It will be in
the upcoming release.
--
- Pak Lui
pak@sun.com
ide
sge_execd not setting the limits correctly.
thanks for your great help :)
--
- Pak Lui
pak@sun.com
The workaround was to put the setting
into the ~/.tcshrc. So if SGE is not setting other resource limits
correctly or doesn't provide the option, you may have to work around it in
the ~/.tcshrc or a similar settings file for your shell. Otherwise it'll
probably fall back to the system default.
--
- Pak Lui
pak@sun.com
--
- Pak Lui
pak@sun.com
r the gigE interface. See if that works instead.
-mca btl_tcp_if_exclude lo,all_my_non_gigE_if
Orion Poplawski wrote:
Pak Lui wrote:
Hi Orion and Reuti,
Let me see if I can understand the issue by breaking them down first:
(1) First, I am curious to know why you would need to create a
PE_HOSTFIL
", &tok);
arch = strtok_r(NULL, " \n", &tok);
...
node->node_name = strdup(ptr);
node->node_arch = strdup(arch);
Perhaps it can be modified so that it uses the queue name hostname when doing
SGE/qrsh calls, but the first hostname when doing MPI communication.
Not really sure what the intent of the two fields in SGE's pe_hostfile
is, or if OpenMPI can handle the idea of two hostnames for different
purposes.
--
Thanks,
- Pak Lui
pak@sun.com
--
Thanks,
- Pak Lui
pak@sun.com
ave to look at it, but this may
not really be feasible.
Ralph
Jeff Squyres (jsquyres) wrote:
The main reason that it doesn't work is because we didn't do anything
to make it work. :-)
Specifically, mpirun is not intercepting SIGSTOP and passing it on to
the remote nodes
t send the signal to orted on the
remote node, but only to 'mpirun'. I am trying to see how to work around
this.
--
Thanks,
- Pak Lui
pak@sun.com
just how big
an issue this is, relative release dates, other commitments, etc. etc.
Ralph
Pak Lui wrote:
Ralph Castain wrote:
First, the fact that an orted already exists on a node is not
sufficient to allow us to use it again for another application. The
orted must be persistent or else we d
We may be able to resolve this in a fairly
straightforward manner - I think a lot of the necessary tools are
already in the system, we just need to "hook them up" appropriately for SGE.
yup, that's the goal.
Ralph
Pak Lui wrote:
Hi,
When I run a spawn program over
d
for this limitation due to SGE slots. We could try to track and set some
top limit for the number of times that qrsh can exec, before the spawn
program uses up all the available SGE slots and errors out.
Ralph
Pak Lui wrote:
Hi,
When I run a spawn program over rsh/ssh, I notice that e
ound I can see aren't pretty though. So I welcome
your questions, comments or suggestions on this.
--
Thanks,
- Pak Lui
pak@sun.com