Re: [OMPI users] job running out of memory

2014-11-18 Thread Rushton Martin
I've seen several suggestions for "home-brew" systems, usually modifying the paging mechanism. However, there is one commercial solution I have seen advertised: https://numascale.com/index.html. I've never used them and have no idea if they are any good, or as good as they claim; you'll have to

Re: [OMPI users] Permission denied, please try again.

2012-08-01 Thread Rushton Martin
That looks like a login issue to compute-02-02, -00 and -03. Can you ssh to them? From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Syed Ahsan Ali Sent: 01 August 2012 08:45 To: Open MPI Users Subject: [OMPI users] Permission denied, please try again. Dear
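
A quick sanity check is a batch-mode ssh loop over the suspect nodes (a sketch; the node names are the ones from this thread, so adjust to your cluster):

    $ for n in compute-02-00 compute-02-02 compute-02-03; do
    >   ssh -o BatchMode=yes -o ConnectTimeout=5 $n true && echo "$n ok" || echo "$n FAILED"
    > done

BatchMode=yes makes ssh fail outright instead of prompting for a password, which is exactly the failure mode mpirun trips over.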

Re: [OMPI users] Installation of openmpi-1.4.4

2011-12-22 Thread Rushton Martin
SuSE supports an application, mpi-selector, which is installed when you load the (broken) SuSE version of OpenMPI. According to your choice of MPI implementation it will correctly set PATH, MANPATH and LD_LIBRARY_PATH for you. The initial installation drops the usual pair (.sh, .csh) of scripts into
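
For reference, typical mpi-selector usage looks something like this (a sketch; the registered name openmpi-1.4.4 is an assumption, check the --list output on your system, and changes only take effect at the next login):

    $ mpi-selector --list                      # show the installed MPI stacks
    $ mpi-selector --set openmpi-1.4.4 --user  # select one for the current user
    $ mpi-selector --query                     # confirm the current selection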

Re: [OMPI users] Clean install fails

2011-12-21 Thread Rushton Martin
by hand. On Dec 21, 2011, at 8:13 AM, Rushton Martin wrote: > I agree it looks like it, but YaST ought to be able to work out > dependencies. Anyhow, here is the listing: > > isis:/usr/lib64/mpi/gcc/openmpi # ls -R lib64 > lib64: > libmca_common_sm.so.1 libmpi_f77.

Re: [OMPI users] Clean install fails

2011-12-21 Thread Rushton Martin
/lib64 On Dec 21, 2011, at 6:26 AM, Rushton Martin wrote: > For run-time problems: > > 1. Check the FAQ first. DONE; nothing obvious. > 2. The version of Open MPI that you're using. openmpi-1.4.3 > 3. The config.log file. Installe
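
For anyone following along, the checklist items can be collected in one pass; a sketch, where /path/to/build is a placeholder for wherever configure was run:

    $ ompi_info | head -3               # item 2: the exact Open MPI version
    $ ompi_info --all > ompi_info.txt   # full component and parameter dump
    $ cp /path/to/build/config.log .    # item 3: config.log from the build tree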

Re: [OMPI users] Clean install fails

2011-12-21 Thread Rushton Martin
er 2011 12:51 To: Open MPI Users Cc: Open MPI Users Subject: Re: [OMPI users] Clean install fails Your OMPI install looks busted. Can you send all the info listed in the "support" section on the OMPI web site? Sent from my phone. No type good. On Dec 21, 2011, at 7:37 AM,

[OMPI users] Clean install fails

2011-12-21 Thread Rushton Martin
I've just completed installing OpenSuSE 12.1 on a standalone node. Using the bundled GCC and OpenMPI the user code fails. I've reduced the problem to that below, but without the source I'm not sure what orte_init is after. Using mpirun -np 1 or -np 2 both fail in the same way. Any suggestions?
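
When a fresh install fails in orte_init like this, a few checks usually show whether mpirun is picking up libraries from a different build (a sketch, assuming a 1.4-era layout with libopen-rte/libopen-pal):

    $ which mpirun orterun                 # are these from the tree you expect?
    $ ompi_info | grep -i prefix           # the install prefix ompi_info reports
    $ ldd `which mpirun` | grep open-      # which libopen-rte/libopen-pal resolve
    $ echo $LD_LIBRARY_PATH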

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Rushton Martin
There appears to be some confusion about ramdisks and tmpfs. A ramdisk sets aside a fixed amount of memory for its exclusive use, so a file written to the ramdisk goes first to the page cache, then to the ramdisk, and may exist in both for some time. tmpfs, however, opens the cache itself up to programs, so
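
The difference is easy to demonstrate from a root shell (a sketch; sizes and mount points are arbitrary, and /dev/ram0 assumes the kernel provides ramdisk devices):

    $ mkdir -p /mnt/scratch /mnt/ramdisk
    # tmpfs: cache-backed, grows and shrinks on demand up to its size= cap
    $ mount -t tmpfs -o size=512m tmpfs /mnt/scratch
    $ df -h /mnt/scratch
    # classic ramdisk: a fixed-size block device that needs its own filesystem
    $ mke2fs -q /dev/ram0
    $ mount /dev/ram0 /mnt/ramdisk

tmpfs pages can also be swapped out under memory pressure; a ramdisk's cannot.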

Re: [OMPI users] shm unlinking

2011-04-14 Thread Rushton Martin
-- are you using QLogic IB NICs? That's the only thing named "PSM" in Open MPI. On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote: > A typical file is called > /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430 > > I had assumed that these were from OMPI, but cl

Re: [OMPI users] shm unlinking

2011-04-14 Thread Rushton Martin
On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote: > For your information: we were supplied with a script when we bought > the cluster, but the original script made the assumption that all > processes and shm files belonging to a specific user ought to be > deleted. This is a prob

Re: [OMPI users] shm unlinking

2011-04-14 Thread Rushton Martin
Thanks for the response. In the last 5 minutes I have managed to get some output from the prologue.parallel scripts, it turns out that the Torque Administrator's Manual was wrong, and I was fool enough to believe it! Now that I've got a working model I can start to sort out the mess with psm_shm
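
For anyone fighting the same mess, a safer cleanup than "delete everything the user owns" is to remove only the psm_shm files that no process still holds open. A sketch, assuming Torque's convention of passing the job owner as the second prologue/epilogue argument:

    #!/bin/sh
    user="$2"                         # Torque passes the job owner as argument 2
    for f in /dev/shm/psm_shm.*; do
        [ -e "$f" ] || continue
        owner=`stat -c %U "$f"`       # who owns this segment
        [ "$owner" = "$user" ] || continue
        fuser -s "$f" 2>/dev/null || rm -f "$f"   # remove only if nothing has it open
    done

This way a second job by the same user on the node keeps its live segments.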

[OMPI users] shm unlinking

2011-04-14 Thread Rushton Martin
I notice from the developer archives that there was some discussion about when psm_shm... files ought to be unlinked from /dev/shm: - Subject: Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing From: Barrett, Brian W

Re: [OMPI users] Over committing?

2011-04-14 Thread Rushton Martin
allow us to run jobs reliably to completion when the code involved such issues. On Apr 13, 2011, at 10:07 AM, Rushton Martin wrote: > The 16 cores refers to x3755-m2s. We have a mix of 3550s and 3755s in > the cluster. > > It could be memory, but I think not. The jobs are well within me

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
printing this email. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti Sent: 13 April 2011 16:53 To: Open MPI Users Subject: Re: [OMPI users] Over committing? On 13.04.2011 at 17:09, Rushton Martin wrote: > Version 1.3.2 > >

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
s static, indeed most of the memory is configured as a humungous one-dimensional array. Very old school but what the heck, it works well when it's not hung. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Rushton Martin Sent: Wednesday, Ap

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
2011, at 9:09 AM, Rushton Martin wrote: > Version 1.3.2 > > Consider a job that will run with 28 processes. The user submits it > with: > > $ qsub -l nodes=4:ppn=7 ... > > which reserves 7 cores on (in this case) each of x3550x014 x3550x015 > and > x3550

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
ain Sent: 13 April 2011 15:34 To: Open MPI Users Subject: Re: [OMPI users] Over committing? On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote: > The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2). > Jobs are submitted by Torque/MOAB. When run with up to np=8 there is

[OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
The bulk of our compute nodes are 8-core machines (twin 4-core IBM x3550-m2). Jobs are submitted by Torque/MOAB. When run with up to np=8 there is good performance. Attempting to run with more processors brings problems: specifically, if any one node of a group of nodes has all 8 cores in use, the job
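
If this is oversubscription biting, the knob to try on the 1.3 series is the yield-when-idle MCA parameter, which makes busy-waiting ranks give up the CPU instead of spinning (a sketch; ./my_app is a placeholder):

    $ mpirun --mca mpi_yield_when_idle 1 -np 28 ./my_app

Open MPI sets this itself when it knows a node is oversubscribed, but reserving 7 of 8 cores per node can leave it believing each node is undersubscribed and polling aggressively while another job holds the eighth core.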