Just to add to what Ondrej said - there are two different settings implemented
in the initial cgroup integration.
One allows over-committing memory as long as there is no memory pressure in the
kernel. But the actual
behavior depends on the Linux kernel. For debugging what Grid Engine set you
No direct support for that in SGE.
That a job is released from hold (e.g. when another one starts) does not
mean it is executed immediately. Hence you would have no guarantee that
both are running at the same point in time.
You could submit the successor before the other one and give it the job id
of
If you are referring to SGE_EXECD_PORT and SGE_QMASTER_PORT, for example, they
are not really Univa-specific. They are installation-specific. If you installed
UGE with self-set ports, then they are required (set by the settings.sh file).
If you install it taking the ports from the services
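As a hedged sketch, the two setups might look like this (the install path is hypothetical; 6444/6445 are the commonly used default ports):

```shell
# Variant 1: ports fixed at install time and exported by settings.sh
. /opt/sge/default/common/settings.sh    # hypothetical install path
echo "$SGE_QMASTER_PORT $SGE_EXECD_PORT"

# Variant 2: ports resolved through the services database instead,
# e.g. entries in /etc/services:
#   sge_qmaster   6444/tcp
#   sge_execd     6445/tcp
```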
Hi Mikhail
That is indeed strange, and the support request is handled properly in the
support portal. Things I can imagine: you are using host resources which
implicitly request cores when requested (having cores attached with
topology masks), or you are running into a rare strtok() issue
Hi Bill,
You changed the global configuration (qconf -mconf or qconf -mconf global).
This is most likely overridden by the host-local configuration.
Try changing it in the host-local configuration (qconf -mconf <hostname>).
You are right that it takes a few seconds until the changes are propagated, but
it is
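To illustrate (the host name node01 is a placeholder), checking and editing the two configuration levels could look like:

```shell
qconf -sconf           # show the global configuration
qconf -sconf node01    # show the host-local configuration, which overrides it
qconf -mconf node01    # edit the host-local configuration in $EDITOR
```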
Hi Joe,
Univa Grid Engine 8.3 added such functionality to its APIs (WebService API)
so that you can submit on behalf of another user. The intention is to
simplify building web portals. But this is restricted to users listed in the
new sudoers Grid Engine ACL.
We can chat privately about that.
There is unfortunately no way in SGE to limit main memory.
h_rss / s_rss does not work with the rlimit call in Linux kernel versions above
2.4.
Hence in Univa Grid Engine we introduced multiple ways of limiting main
memory. If you have cgroups support turned on then the cgroup takes
Hi,
Please notice the difference between "set linear:1:0,0" and
"set linear:1". The first one means: give me one core starting
at socket 0, core 0 (i.e. you are explicitly
requesting core 0 on socket 0). The second means that
you want one core on the host and the execution daemon
takes
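As qsub command lines (job.sh is a placeholder script), the two variants would be:

```shell
# Explicit placement: one core, starting at socket 0, core 0.
qsub -binding set linear:1:0,0 job.sh

# Free placement: one core anywhere; the execution daemon picks it.
qsub -binding set linear:1 job.sh
```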
Since queue requests are not a part of DRMAA1 you
should use DRMAA_NATIVE_SPECIFICATION, which allows
you to set (almost) any qsub command line parameter
available. You can also use job categories, but then
you have to configure them in the qtasks file.
DRMAA version 2 specifies queueName in the job
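For comparison, whatever string you put into DRMAA_NATIVE_SPECIFICATION is the same string you would pass to qsub directly (the queue name all.q and script name are assumptions):

```shell
# A native specification of "-q all.q" corresponds to this qsub call:
qsub -q all.q job.sh
```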
On 26.03.2013 at 17:10, Reuti wrote:
Hi,
On 26.03.2013 at 12:17, Arnau Bria wrote:
I'm migrating a bash jsv script to perl and adding some
modifications, but I have some doubts:
1) jsv_correct vs jsv_accept. From man:
If the result_type is ACCEPTED the job will be accepted as it
The easiest way would be to give a job a name with qsub -N job1 (or use -terse
to get the job id) and
then use -hold_jid for the second job. You will find more details in the qsub
man page. Of course you
can also use DRMAA or, more unusually, an array job with task throttling (-tc 1).
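A sketch of these approaches (script names are placeholders):

```shell
# Capture the first job's id with -terse, then hold the second on it:
JOB1=$(qsub -terse job1.sh)
qsub -hold_jid "$JOB1" job2.sh

# Or name the first job and hold on the name:
qsub -N job1 job1.sh
qsub -hold_jid job1 job2.sh

# Or a two-task array job with at most one task running at a time:
qsub -t 1-2 -tc 1 jobs.sh
```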
You can set arbitrary signals to be sent when suspension
is triggered (like SIGKILL). See: man queue_conf
section suspend_method
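In the queue configuration this is a one-line change; a sketch (the queue name all.q is an assumption):

```shell
# qconf -mq all.q opens the queue configuration in an editor; set e.g.:
#   suspend_method   SIGKILL
qconf -sq all.q | grep suspend_method   # verify the current setting
```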
Daniel
On 24.08.2012 at 03:13, Joseph Farran wrote:
Howdy.
Is there a flag one can set on a job so that it will be killed instead of
being suspended for
What you could do is create a queue for each GPU you
have on a host and assign each a queue-exclusive GPU complex.
The number of GPU queues then limits the number of
GPU jobs. The total number of CPU cores must then be limited
separately by an RQS on a per-host basis.
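A rough sketch of such a setup (queue, complex, and limit names are assumptions):

```shell
# One queue per GPU, each consuming its own exclusive GPU complex:
#   qconf -sq gpu0.q  ->  complex_values gpu0=1
#   qconf -sq gpu1.q  ->  complex_values gpu1=1

# Separate per-host core cap via a resource quota set (qconf -arqs):
#   { name    cpu_per_host
#     enabled TRUE
#     limit   hosts {*} to slots=16 }   # illustrative core count
```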
Daniel
On
Try qacct -b 120101 -pe without anything.
Daniel
On 24.07.2012 at 13:52, Nick Holway wrote:
Dear all,
I'm trying to get some aggregate stats for all our parallel
environments using qacct. I'm using qacct -b 120101 -pe \* and I
also tried it with the * in double quotes. This
With the allocation rule $fill_up the scheduler tries to maximize the
number of slots which can be collected on any host. The host
selection order usually does *not* depend on the number of free slots
(anyway this could be configured).
It looks like you have either already some smaller jobs running
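For reference, the allocation rule lives in the parallel environment configuration; a sketch (the PE name mpi is an assumption):

```shell
# qconf -sp mpi might show, among other lines:
#   pe_name          mpi
#   slots            999
#   allocation_rule  $fill_up
```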
On 25.05.2012 at 12:35, Richard Ems wrote:
On 05/25/2012 12:27 PM, Daniel Gruber wrote:
Exactly, looks like your runtime estimation for your slot4 jobs
is smaller than for your slot12 jobs. Backfilling must be active
here. Did you submit both jobs in exactly the same way with
s_rt? Try
On 19.05.2012 at 19:16, Farkas, Illes wrote:
Hello,
Is there a command (or an argument/switch of qsub) that tells the queue
manager to write into a file the maximum amount of memory used by one of the
jobs during its entire life time? To the best of my knowledge, after a job
finishes,
While core binding itself should work with such a topology (I never tried it)
in 6.2u5, the reporting of the topology string will be wrong. As you might
have noticed,
string-based load values are only reported up to a length of 1024 bytes,
which means that with 1000 nodes not the full topology
On 03.08.2011 at 10:28, William Hay wrote:
On 2 August 2011 17:58, Rayson Ho rayray...@gmail.com wrote:
It's a bug introduced by another bug fix in SGE 6.2u5, and Oracle was
the first to fix it in Oracle Grid Engine. Then we added a
workaround in SGE 6.2u5p1 in Open Grid Scheduler, and