Re: [gridengine users] m_mem_free and cgroups

2020-08-12 Thread Daniel Gruber
Just to add to what Ondrej said - there are two different settings in the initial cgroup integration. One allows memory to be over-committed as long as there is no memory pressure in the kernel. But the actual behavior depends on the Linux kernel. For debugging what Grid Engine set you

Re: [gridengine users] bsub -w "started(aJob)"

2016-06-14 Thread Daniel Gruber
No direct support for that in SGE. A job being released from hold (for example when another job starts) does not mean it is executed. Hence you would not have any guarantee that both are running at the same point in time. You could submit the successor before the other one and give it the job id of

Re: [gridengine users] Looking for a solution to integrate SGE or similar with Jenkins/buildbot/similar and Vagrant/Docker/similar

2016-04-25 Thread Daniel Gruber
If you are referring to SGE_EXECD_PORT and SGE_QMASTER_PORT, for example, they are not really Univa specific. They are installation specific. If you installed UGE with self-set ports then they are required (set by the settings.sh file). If you install it with taking out the ports from the services

Re: [gridengine users] Core binding strange behaviour UGE 8.1

2016-02-29 Thread Daniel Gruber
Hi Mikhail, That is indeed strange and the support request is handled properly in the support portal. Things I can imagine: You are using host resources which are requesting cores implicitly when requested (having cores attached with topology masks) or you are running into a rare strtok() issue

Re: [gridengine users] Trying to code C program to process SGE job-status email

2015-10-20 Thread Daniel Gruber
Hi Bill, You changed the global configuration (qconf -mconf or qconf -mconf global). This is most likely overridden by the host local configuration. Try changing it in the host local configuration (qconf -mconf ). You are right, it takes a few seconds for the changes to be propagated but it is

Re: [gridengine users] Qsub flag for changing user

2015-09-14 Thread Daniel Gruber
Hi Joe, Univa Grid Engine 8.3 added such functionality to its APIs (WebService API) so that you can submit on behalf of another user. The intention is to simplify building web portals. But this is restricted to users listed in the new sudoers Grid Engine ACL. We can chat privately about that

Re: [gridengine users] Enforce users to use specific amount of memory/slot

2014-06-30 Thread Daniel Gruber
There is unfortunately no way in SGE to limit main memory. h_rss / s_rss do not work with the rlimit call in Linux kernel versions above 2.4. Hence in Univa Grid Engine we introduced multiple ways to limit main memory. If you have cgroups support turned on then the cgroup takes

Re: [gridengine users] Qlogin and Core binding

2014-06-27 Thread Daniel Gruber
Hi, Please notice the difference between "set linear:1:0,0" and "set linear:1". The first one means - give me one core starting at socket 0 core 0 (which means here obviously you are requesting core 0 on socket 0). The second means that you want one core on the host and the execution daemon takes

Re: [gridengine users] How specify the queue name with the DRMAA api

2013-05-29 Thread Daniel Gruber
Since queue requests are not a part of DRMAA1 you should use DRMAA_NATIVE_SPECIFICATION, which allows you to set (almost) any qsub command line parameter available. You can also use job categories but then you have to configure it in the qtasks file. DRMAA version 2 specifies queueName in the job
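A minimal sketch of the native specification approach, in Python. The helper name `native_spec_for_queue` and the queue name `all.q` are hypothetical; the only Grid Engine detail relied on is the `-q` option of qsub. The sketch only builds the string and does not talk to a cluster.

```python
def native_spec_for_queue(queue_name):
    """Build a native specification string that requests one queue.

    Hypothetical helper for illustration: it only formats the qsub
    option "-q <queue>" and performs a basic sanity check.
    """
    if not queue_name or any(ch.isspace() for ch in queue_name):
        raise ValueError("queue name must be non-empty and contain no whitespace")
    return f"-q {queue_name}"


if __name__ == "__main__":
    # The resulting string is what would be stored in the job template's
    # native specification attribute (DRMAA_NATIVE_SPECIFICATION).
    # Assumption: with the drmaa Python binding this would be
    # jt.nativeSpecification = native_spec_for_queue("all.q")
    print(native_spec_for_queue("all.q"))
```

Because the native specification is passed through as qsub options, the same string can carry other requests as well, at the cost of tying the program to Grid Engine.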

Re: [gridengine users] jsv and MPI core bind questions

2013-03-26 Thread Daniel Gruber
On 26.03.2013 at 17:10, Reuti wrote: Hi, On 26.03.2013 at 12:17, Arnau Bria wrote: I'm migrating a bash jsv script to perl and adding some modifications, but I have some doubts: 1) jsv_correct vs jsv_accept. From man: If the result_type is ACCEPTED the job will be accepted as it

Re: [gridengine users] in tandem qsub running

2012-09-26 Thread Daniel Gruber
The easiest way would be to give a job a name with qsub -N job1 (or use -terse for getting the job id) and then use -hold_jid for the second job. You will find more details in the qsub man page. Of course you can also use DRMAA or, more unusually, an array job with task throttling (-tc 1).
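A minimal sketch of the name-based variant, in Python. It only assembles the two qsub command lines and does not submit anything; the function name and the script names `step1.sh` and `step2.sh` are hypothetical, while `-N` and `-hold_jid` are the qsub options mentioned above.

```python
import shlex


def chained_qsub_commands(first_script, second_script, first_name="job1"):
    """Return argv lists for two qsub calls where the second waits for the first."""
    # Name the first job so that the second one can refer to it.
    first = ["qsub", "-N", first_name, first_script]
    # -hold_jid takes a job name or job id; the second job stays on hold
    # until the referenced job has finished.
    second = ["qsub", "-hold_jid", first_name, second_script]
    return first, second


if __name__ == "__main__":
    for argv in chained_qsub_commands("step1.sh", "step2.sh"):
        print(shlex.join(argv))
```

To actually submit, each list could be handed to `subprocess.run` on a submit host. Using the job id from `-terse` instead of a name avoids clashes when several jobs share the same name.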

Re: [gridengine users] Do not suspend job, kill instead

2012-08-24 Thread Daniel Gruber
You can set arbitrary signals to be sent when suspension is triggered (like SIGKILL). See: man queue_conf, section suspend_method. Daniel On 24.08.2012 at 03:13, Joseph Farran wrote: Howdy. Is there a flag one can set on a job so that it will be killed instead of being suspended for

Re: [gridengine users] Do not suspend job, kill instead

2012-08-24 Thread Daniel Gruber
. On 24.08.2012 at 08:52, Daniel Gruber wrote: You can set arbitrary signals to be sent when suspension is triggered (like SIGKILL). See: man queue_conf, section suspend_method. Daniel On 24.08.2012 at 03:13, Joseph Farran wrote: Howdy. Is there a flag one can set on a job so

Re: [gridengine users] GPU node with pe and complex

2012-08-23 Thread Daniel Gruber
What you could do is create a queue for each GPU you have on a host and assign each of them a queue-exclusive GPU complex. The number of GPU queues then limits the number of GPU jobs. The total number of CPU cores must then be limited differently, by an RQS on a per host basis. Daniel On

Re: [gridengine users] qacct wildcards for parallel environments

2012-07-24 Thread Daniel Gruber
Try qacct -b 120101 -pe without anything. Daniel On 24.07.2012 at 13:52, Nick Holway wrote: Dear all, I'm trying to get some aggregate stats for all our parallel environments using qacct. I'm using qacct -b 120101 -pe \* and I also tried it with the * in double quotes. This

Re: [gridengine users] Understanding Parallel Enviroment ( whole nodes )

2012-06-08 Thread Daniel Gruber
With the allocation rule fillup the scheduler tries to maximize the number of slots which can be collected on any host. The host selection order usually does *not* depend on the number of free slots (although this could be configured). It looks like you have either already some smaller jobs running

Re: [gridengine users] Reservations and parallel environments

2012-05-25 Thread Daniel Gruber
On 25.05.2012 at 12:35, Richard Ems wrote: On 05/25/2012 12:27 PM, Daniel Gruber wrote: Exactly, looks like your runtime estimation for your slot4 jobs is smaller than for your slot12 jobs. Backfilling must be active here. Did you submit both jobs in exactly the same way with s_rt? Try

Re: [gridengine users] final maxvmem of a job

2012-05-19 Thread Daniel Gruber
On 19.05.2012 at 19:16, Farkas, Illes wrote: Hello, Is there a command (or an argument/switch of qsub) that tells the queue manager to write into a file the maximum amount of memory used by one of the jobs during its entire lifetime? To the best of my knowledge, after a job finishes,

Re: [gridengine users] Automatic CPU core binding - JSV script

2012-01-12 Thread Daniel Gruber
While core binding itself should work with such a topology (I never tried it) in 6.2u5, the reporting of the topology string will be wrong. As you might have noticed, string based load values are just reported up to a length of 1024 bytes, that means that with 1000 nodes not the full topology

Re: [gridengine users] Maximum memory for running process?

2011-08-06 Thread Daniel Gruber
On 03.08.2011 at 10:28, William Hay wrote: On 2 August 2011 17:58, Rayson Ho rayray...@gmail.com wrote: It's a bug introduced by another bug fix in SGE 6.2u5, and Oracle was the first to fix the bug in Oracle Grid Engine. Then we added a workaround in SGE 6.2u5p1 in Open Grid Scheduler, and