Hi,

To raise the maximum number of open files, we have set execd_params via qconf -mconf, and raised the limit at the OS level as well:

    execd_params H_DESCRIPTORS=262144,H_LOCKS=262144,H_MAXPROC=262144

On our execution nodes we can see that SGE sets a soft limit of 65535 on open files, even though we told it to use 262144. After qlogin:

    [root@p2node01 ~]# cat /proc/104694/limits
    Limit                     Soft Limit   Hard Limit   Units
    Max cpu time              unlimited    unlimited    seconds
    Max file size             unlimited    unlimited    bytes
    Max data size             unlimited    unlimited    bytes
    Max stack size            unlimited    unlimited    bytes
    Max core file size        unlimited    unlimited    bytes
    Max resident set          unlimited    unlimited    bytes
    Max processes             262144       262144       processes
    Max open files            65535        262144       files
    Max locked memory         65536        65536        bytes
    Max address space         unlimited    unlimited    bytes
    Max file locks            262144       262144       locks
    Max pending signals       15023        15023        signals
    Max msgqueue size         819200       819200       bytes
    Max nice priority         0            0
    Max realtime priority     0            0
    Max realtime timeout      unlimited    unlimited    us

When running a PE smp job that requests 2 slots, the soft limit is set to 65535 * 2 = 131070. The number of slots seems to act as a multiplier on the soft limit. If we request more than 4 slots, the product exceeds the hard limit and max open files is reset to the default of 1024.

Our workaround is to set H_DESCRIPTORS=9362, because some of our exec nodes have 28 cores: 28 x 9362 = 262136, which stays just under the 262144 limit.

Is there a better way of doing this?

You might ask why we need 200k+ open files. Someone is using software that leaks file handles because it does not fclose properly. Their workaround is a dirty hack in which the job sshes to localhost to bypass the ulimit set by SGE.

Many thanks,
Luis
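
For reference, the arithmetic behind the workaround above can be sketched as follows. This assumes only what the message observes, namely that the per-slot value is multiplied by the number of slots granted; it is not a statement of documented SGE behaviour. The function names are hypothetical and exist only for this illustration.

```python
def per_slot_descriptor_limit(hard_limit: int, max_slots: int) -> int:
    """Largest per-slot value whose product with max_slots stays within hard_limit."""
    if max_slots < 1:
        raise ValueError("max_slots must be at least 1")
    return hard_limit // max_slots


def effective_soft_limit(per_slot: int, slots: int) -> int:
    """Soft limit a job would see if the per-slot value is scaled by its slot count (assumed)."""
    return per_slot * slots


hard = 262144

# Value to configure so that a 28-slot job still fits under the hard limit.
per_slot = per_slot_descriptor_limit(hard, 28)
print(per_slot)                                # 9362
print(effective_soft_limit(per_slot, 28))      # 262136

# With the observed per-slot value of 65535, 4 slots fit but 5 do not.
print(effective_soft_limit(65535, 4) <= hard)  # True
print(effective_soft_limit(65535, 5) <= hard)  # False
```

The integer division leaves a small unused margin (8 descriptors on a 28-slot node), and jobs that request fewer slots get a proportionally smaller limit, which is the main drawback of this approach.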
_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
