They could be from OMPI -- are you using QLogic IB NICs?  That's the only thing 
named "PSM" in Open MPI.
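One quick way to answer the question above is to check whether the local Open MPI build includes a PSM component at all. The sketch below is a minimal example that searches `ompi_info` output for it. The line format it matches (`MCA mtl: psm ...`) is an assumption based on how `ompi_info` usually lists MCA components. Having the component built in does not by itself prove a given job used it. `has_psm_component` and `check_local_install` are hypothetical helper names.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed format: ompi_info prints one line per component, e.g.
#   "MCA mtl: psm (MCA v2.0, API v2.0, Component v1.4.3)"
# The trailing \b stops a different component such as "psm2" from matching.
PSM_LINE = re.compile(r"^\s*MCA\s+mtl:\s+psm\b", re.MULTILINE)


def has_psm_component(ompi_info_output: str) -> bool:
    """Return True if the ompi_info text lists a PSM MTL component."""
    return PSM_LINE.search(ompi_info_output) is not None


def check_local_install() -> Optional[bool]:
    """Run ompi_info from PATH; return None if it cannot be found."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return has_psm_component(result.stdout)


if __name__ == "__main__":
    print("PSM MTL built in:", check_local_install())
```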


On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:

> A typical file is called
> /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
> 
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size, but are typically 42MiB, only 0.2% of our small
> diskless nodes' memory; put a dozen in there, though, and they start to
> be noticed.  lsof shows that all the processes in a particular job have
> the same one open; the other files are associated chronologically with
> failed jobs.
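The `lsof` observation above can be approximated by walking `/proc`. The sketch below is Linux-only and assumes two things: the files follow the `psm_shm.*` naming seen here, and the script runs as root, because processes belonging to other users are otherwise skipped. It checks both open descriptors and memory maps, since a segment may stay mapped after its descriptor is closed. `pids_using` is a hypothetical helper name.

```python
import glob
import os
from typing import Dict, Iterable, Set


def pids_using(paths: Iterable[str], proc_root: str = "/proc") -> Dict[str, Set[int]]:
    """Map each (resolved) path to the PIDs that have it open or mmapped.

    A rough stand-in for `lsof <path>`. Processes we cannot inspect
    (permissions, or exited mid-scan) are skipped silently.
    """
    wanted = {os.path.realpath(p) for p in paths}
    users: Dict[str, Set[int]] = {p: set() for p in wanted}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        pid = int(entry)
        base = os.path.join(proc_root, entry)
        # Open descriptors show up as symlinks in /proc/<pid>/fd.
        try:
            for fd in os.listdir(os.path.join(base, "fd")):
                try:
                    target = os.readlink(os.path.join(base, "fd", fd))
                except OSError:
                    continue
                if target in users:
                    users[target].add(pid)
        except OSError:
            pass
        # Mapped files appear as the sixth field of /proc/<pid>/maps.
        try:
            with open(os.path.join(base, "maps")) as fh:
                for line in fh:
                    fields = line.split(None, 5)
                    if len(fields) == 6 and fields[5].strip() in users:
                        users[fields[5].strip()].add(pid)
        except OSError:
            pass
    return users


if __name__ == "__main__":
    shm_files = glob.glob("/dev/shm/psm_shm.*")
    for path, pids in sorted(pids_using(shm_files).items()):
        status = "in use by PIDs " + ",".join(map(str, sorted(pids))) if pids else "no users (orphan?)"
        print(f"{path}: {status}")
```

A file that no process holds and that lines up in time with a failed job is a likely orphan.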
> 
> HTH
> 
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrush...@qinetiq.com
> www.QinetiQ.com
> QinetiQ - Delivering customer-focused solutions
> 
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
> 
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
> 
>> For your information: we were supplied with a script when we bought
>> the cluster, but it assumed that all processes and shm files belonging
>> to a specific user ought to be deleted.  This is a problem if a user
>> submits jobs that each only half fill a node and the second job starts
>> on the same node as the first one: the first job to finish causes the
>> still-running job to stop dead.  We therefore had to disable all cleanup
>> to allow jobs to run.  Now we are finding that the shm files slowly fill
>> up, and I need to do something; at least now I have a way forward.
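The per-user cleanup described above could be narrowed to a safer criterion: delete a file only when no live process on the node holds it. This is a minimal sketch under several assumptions:

* it runs on Linux with `/proc` available and with root privileges;
* the files follow the `psm_shm.*` naming seen in this thread;
* it applies a hypothetical one-hour age threshold, so it does not race a job that is just starting;
* it defaults to a dry run.

`paths_in_use`, `stale_files` and `cleanup` are hypothetical names, not part of Open MPI or any vendor script.

```python
import fnmatch
import os
import time
from typing import List, Set


def paths_in_use(proc_root: str = "/proc") -> Set[str]:
    """Collect every path that any visible process has open or mmapped."""
    used: Set[str] = set()
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        base = os.path.join(proc_root, pid)
        try:
            for fd in os.listdir(os.path.join(base, "fd")):
                try:
                    used.add(os.readlink(os.path.join(base, "fd", fd)))
                except OSError:
                    pass
        except OSError:
            pass  # process exited or is not ours to inspect
        try:
            with open(os.path.join(base, "maps")) as fh:
                for line in fh:
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        used.add(fields[5].strip())
        except OSError:
            pass
    return used


def stale_files(directory: str, pattern: str, in_use: Set[str],
                min_age_s: float, now: float) -> List[str]:
    """Files matching `pattern` that nobody holds and that are at least min_age_s old."""
    stale = []
    for name in sorted(os.listdir(directory)):
        if not fnmatch.fnmatch(name, pattern):
            continue
        path = os.path.realpath(os.path.join(directory, name))
        try:
            age = now - os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # removed by someone else meanwhile
        if path not in in_use and age >= min_age_s:
            stale.append(path)
    return stale


def cleanup(directory: str = "/dev/shm", pattern: str = "psm_shm.*",
            min_age_s: float = 3600.0, dry_run: bool = True) -> List[str]:
    """Remove (or, by default, just list) orphaned files; return their paths."""
    victims = stale_files(directory, pattern, paths_in_use(), min_age_s, time.time())
    for path in victims:
        print(("would remove " if dry_run else "removing ") + path)
        if not dry_run:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
    return victims
```

Because every process in a job holds the same file, a second job sharing the node keeps its file "in use" and is left alone. That is the case the original per-user script got wrong.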
> 
> Note that Open MPI v1.4.x is most likely using mmap files by default --
> these should be under /tmp/ somewhere.  If they get left around, they can
> cause shared memory to fill up, but they should be unrelated to anything
> in /dev/shm.  If you're seeing /dev/shm fill up, that might be due to
> something else.
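To see where the space is actually going, the sketch below reports usage of `/tmp` and `/dev/shm` and lists the largest files in each. It assumes both are local mount points on the node. The exact names of Open MPI's mmap backing files under `/tmp` are not given here, so the sketch simply lists the biggest files rather than matching a specific name. `largest_files` and `report` are hypothetical helper names.

```python
import heapq
import os
import shutil
import stat
from typing import List, Tuple

MIB = 2 ** 20


def largest_files(root: str, limit: int = 5) -> List[Tuple[int, str]]:
    """Return up to `limit` (size_bytes, path) pairs under root, biggest first."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable
            if stat.S_ISREG(st.st_mode):
                candidates.append((st.st_size, path))
    return heapq.nlargest(limit, candidates)


def report(mounts=("/tmp", "/dev/shm"), limit: int = 5) -> None:
    """Print filesystem usage for each mount point and its largest files."""
    for mount in mounts:
        if not os.path.isdir(mount):
            continue
        usage = shutil.disk_usage(mount)
        print(f"{mount}: {usage.used / MIB:.1f} MiB used of {usage.total / MIB:.1f} MiB")
        for size, path in largest_files(mount, limit):
            print(f"  {size / MIB:8.1f} MiB  {path}")


if __name__ == "__main__":
    report()
```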
> 
> Also, I'm a little confused by your reference to psm_shm... are you
> talking about the QLogic PSM device?  If that does some tomfoolery with
> /dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything
> about what that device does internally).
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com

