ing fine all the time. Only after
my suggestion to add h_vmem on the exechost level to avoid oversubscription,
all the jobs crashed, because no memory was available (h_vmem = 0 was then
applied as an automatically set limit).
Essentially: the default value in a complex definition is ignored.
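For illustration, a minimal sketch of the exechost-level setting discussed
above – the host name and memory size are assumptions:
$ qconf -me node01
   complex_values    h_vmem=64G
With h_vmem declared as a consumable in `qconf -mc`, jobs should then request
memory explicitly with "-l h_vmem=...", as relying on the complex's default
value was exactly what failed here.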
Hi,
Are any consumables in place, like memory or other resource requests? Any
output of `qalter -w v …` or `-w p`?
-- Reuti
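For reference, a sketch of these two dry-run checks against a hypothetical
job id 4711:
$ qalter -w p 4711   # validate against the current cluster state ("poke")
$ qalter -w v 4711   # verify whether the job could run in an empty cluster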
> On 11.06.2020 at 20:32, Chris Dagdigian wrote:
>
> Hi folks,
>
> Got a bewildering situation I've never seen before with simple SMP/th
Also in SSH itself it is possible, with the "Match" option in "sshd_config",
to allow only certain users from certain nodes.
Nevertheless: maybe adding "-v" to the `ssh` command will output additional
info, and the messages of `sshd` might be in some log file as well.
-- Reuti
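An illustrative "sshd_config" fragment for such a restriction – the address
range and group name are assumptions:
Match Address 10.0.0.0/24
    AllowGroups clusterusers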
Hi,
It might be that the application ignores the OMP_NUM_THREADS setting (or
assumes a maximum value if it is unset) and uses all cores in the machine.
How many cores are installed?
-- Reuti
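If the application honors the standard OpenMP variable, a sketch like this in
the job script ties the thread count to the granted slots ($NSLOTS is set by
SGE; "my_app" is a placeholder):
#$ -pe smp 4
export OMP_NUM_THREADS=$NSLOTS
./my_app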
On 07.05.2020 at 01:04, Jerome IBt wrote:
> Dear all
>
> I'm facing a strange problem with
> On 02.05.2020 at 00:15, Mun Johl wrote:
>
> Hi Reuti,
>
> Thank you for your reply.
> Please see my comments below.
>
>> Hi,
>>
>> On 01.05.2020 at 20:44, Mun Johl wrote:
>>
>>> Hi,
>>>
>>> I am using SGE on RH
gin via SSH. Hence I'm not sure what you want to have set.
Maybe you want to configure SGE to always use `ssh -X`?
https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
-- Reuti
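A sketch of the relevant entries in `qconf -mconf`, following the howto above
– the sshd path is an assumption and may differ per distribution:
qlogin_command   /usr/bin/ssh -X
qlogin_daemon    /usr/sbin/sshd -i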
>
> Please advise.
>
> Thank you and regards,
>
> --
> Mun
Hi,
Is it always failing on one and the same node, or are several nodes affected?
One guess could be that the file system is full.
-- Reuti
> On 05.03.2020 at 18:46, Jerome wrote:
>
> Dear all
>
> I'm facing a strange error in SGE. One job is declared as in error
> On 31.01.2020 at 18:23, Jerome IBt wrote:
>
> On 31/01/2020 at 10:19, Reuti wrote:
>> Hi Jérôme,
>>
>> Personally I would prefer to keep the output of `qquota` short and use it
>> only for users' limits, i.e. defining the slot limit on an exechost
My experience is that RQS sometimes get screwed up, especially if used in
combination with load values (although $num_proc is of course fixed in
your case).
-- Reuti
> On 31.01.2020 at 17:00, Jerome wrote:
>
> Dear all
>
> I'm facing a new problem on my cluster wi
Hi,
I never used SGE OGS/GE 2011.11p1, and for other derivatives it seems to work
as intended. Is there any output in the messages file of the executing host
mentioning an attempt to kill the process due to exhausted wallclock time?
-- Reuti
> On 28.01.2020 at 03:50, Derrick
r than 48 hours.
Are you directing these commands to SSH?
-- Reuti
> I am wondering if this is a known issue?
>
> I am running open source version of SGE OGS/GE 2011.11p1
>
> Cheers,
> Derrick
stomping on the value of PATH.
Another option could be an "adjustment" of the PATH variable by a JSV.
-- Reuti
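A minimal shell JSV sketch for such an adjustment, assuming SGE's shipped
helper library; the sanitized PATH value is purely illustrative:
#!/bin/sh
. $SGE_ROOT/util/resources/jsv/jsv_include.sh
jsv_on_start()
{
   # ask the client to send the job's environment to the JSV
   jsv_send_env
}
jsv_on_verify()
{
   # overwrite whatever PATH was submitted with a sanitized one
   jsv_set_env PATH "/usr/local/bin:/usr/bin:/bin"
   jsv_correct "PATH was adjusted"
}
jsv_main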
>
>> and epilog scripts run with the submission environment but possibly in the
>> context of a different user (i.e. a user could point a root-running prolog
> On 22.01.2020 at 15:14, WALLIS Michael wrote:
>
>
> From: Reuti
>
>>> (for the record, if the number of used_slots is higher than the number
>>> of slots, no jobs using that PE will run. Don't know how that's even
>>> possible.)
>
slots, no jobs using that PE will run. Don't know how that's even possible.)
You mean the setting of "slots" in the definition of a particular PE?
-- Reuti
> Cheers,
>
> Mike
>
> --
> Mike Wallis x503305
> University of Edinburgh, Research Services,
There are patches around to attach the additional group id to the ssh daemon:
https://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html
rlogin is used for an interactive login by `qrsh`, rsh for `qrsh` with a
command.
-- Reuti
> On 09.12.2019 at 18:39, Korzennik, Sylv
eed something different
> in the GE configuration to enable this?
Are you using the "builtin" method for the startup or SSH, i.e. what are the
settings of rsh_daemon resp. rsh_client?
-- Reuti
> Cheers,
> Sylvain
> --
he communication of the daemons/clients for `qrsh` …
in SGE has no support for X11 forwarding. Hence `qrsh xterm` should give a
suitable result when SGE is set up to use "/usr/bin/ssh -X -Y".
-- Reuti
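Sketch of the corresponding `qconf -mconf` entries (the actual parameter is
named rsh_command; the sshd path is an assumption):
rsh_command   /usr/bin/ssh -X -Y
rsh_daemon    /usr/sbin/sshd -i
Afterwards `qrsh xterm` should open the xterm with working X11 forwarding.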
> Your job 2657108 ("INTERACTIVE") has been submitted
> wa
e29
ramdisk@node19
common@node27
common@node23
common@node23
common@node28
-- Reuti
>
> Regards,
>
> --
> Mun
>
>
> From: Mun Johl
> Sent: Friday, October 25, 2019 5:42 PM
> To: dpo...@gmail.com
> Cc: Skylar Thomp
rnal name changes, while the
internal ones stay the same?
-- Reuti
scripts (something
> specific to GDI over the $SGE_EXECD_PORT, ssh, scp, something else)?
It uses its own protocol. No SSH inside the cluster is necessary.
> What are the system-level requirements for successfully sending the
> submit scripts (for example: same UID for sge across the cluster, same
> UID<->username for the user submitting the job across the cluster, etc)?
Yes.
-- Reuti
> I'm thinking about this again - does the subordinate queue
> setting accept 'queueu@@hostgroup' syntax like everything else? Don't
> remember if I ever tried that.
Yes, one can limit it to be available on certain machines only:
subordinate_list NONE,[@intel2667v4=short]
respond correctly to SIGSTOP, but the GPU portion keeps
> running).
>
> Is there any way, with our current number of queues, to exempt jobs
> using a GPU resource complex (-l gpu) from being suspended by short jobs?
Not that I'm aware of. Almost 10 years ago I
input file: the number of slots will be corrected in the copy of the input
file to the number of granted slots. Let me know if you would like to get them.
-- Reuti
> I was told the this should be possible in slurm (which we don't have,
minating the "jclass" column, which doesn't
> contain any information, but I can only find ways to add columns, not take
> them away. Is there a way to make this column go away?
Besides `cut -b`: what type of output are you looking for? There is a `qstatus`
AWK
Hi Ilya,
On 31.07.2019 at 00:55, Ilya M wrote:
> Hi Reuti,
>
> So /home is not mounted via NFS as it's usually done?
> Correct.
>
>
> How exactly is your setup? I mean: you want to create some kind of pseudo
> home directory on the nodes (hence "-b y
making this work or at least getting
> meaningful debug output.
How exactly is your setup? I mean: you want to create some kind of pseudo home
directory on the nodes (hence "-b y" can't be used with user binaries) and the
staged job script
, I would attach it to the exechosts, although
attaching them to a queue might be shorter and a central location for all
definitions.
-- Reuti
/qmaster/./sharetree" for
> reading: No such file or directory
Did you transfer the old configuration, or does this pop up in a freshly
installed system?
Unfortunately the procedure might be changed by the ROCKS distribution
H daemon got.
The change of the "nofile" setting should also be visible in the shell when
you log in.
-- Reuti
> Currently, my only workaround is to rebuild the Compute Node (reinstall OS
> etc) so that it corrects this issue.
>
> >> Can you check the limits that are
settings but the rest are fine.
Several ulimits can be set in the queue configuration, and can thus differ
for each queue or exechost.
-- Reuti
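For illustration, the limit-related fields as they appear in a queue
configuration – the values are assumptions:
$ qconf -mq all.q
   h_fsize   10G
   h_stack   INFINITY
   h_vmem    INFINITY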
> I am wondering if this is SGE related? And idea is welcomed.
>
> Cheers,
> Derrick
SGE will then notice
that your job and the sum of all of its processes passed this limit and kill
the job. SGE can do this by using the additional group ID which is attached to
all processes of a particular job, while the kernel might only watch a single
process.
Using now an assigned c
t wait until all running jobs without
this constraint were drained.
-- Reuti
n "Finished Jobs".
But it won't retrieve jobs which were already drained from the listing. Also
after a restart of the `qmaster`, the list will initially be empty.
-- Reuti
AFAICS the kill sent by SGE happens after a task already returned with an
error. SGE would in this case use the kill signal to be sure to kill all child
processes. Hence the question would be: what was the initial command in the
job script, and what output/error did it generate?
-- Reuti
> On 25.04.2019 at 17:41, Mun Johl wrote:
>
> Hi Skyler, Reuti,
>
> Thank you for your reply.
> Please see my comments below.
>
> On Thu, Apr 25, 2019 at 08:03 AM PDT, Reuti wrote:
>> Hi,
>>
>>> On 25.04.2019 at 16:53, Mun Johl wrote:
>
> I've searched the man pages and web for definitions of the output of
> qacct, but I have not been able to find a complete reference (just bits
> and pieces here and there).
>
> Can anyone point me to a complete reference so that I can better
> understand the output of qacct?
On 09.04.2019 at 21:08, Mun Johl wrote:
> Hi Reuti,
>
> One clarification question below ...
>
> On Tue, Apr 09, 2019 at 09:05 AM PDT, Reuti wrote:
>>> On 09.04.2019 at 17:43, Mun Johl wrote:
>>>
>>> Hi Reuti,
>>>
>>>
> On 09.04.2019 at 17:43, Mun Johl wrote:
>
> Hi Reuti,
>
> Thank you for your reply!
> Please see my comments below.
>
> On Mon, Apr 08, 2019 at 10:27 PM PDT, Reuti wrote:
>> Hi,
>>
>>> On 09.04.2019 at 05:37, Mun Johl wrote:
>>>
way, would the contractor only need an account on serverA in
> order to utilize SGE? Or would he need an account on the grid master as
> well?
Are you not using a central user administration by NIS or LDAP?
AFAICS he needs an entry only on the execution host (and on the submission host
of course).
sq all.q
…
user_lists NONE,[@special_hosts=allowed_users]
and the white listed users have to be in the allowed_users ACL and the
@special_hosts are the machines where ordinary users are banned from.
-- Reuti
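A sketch with hypothetical user names; `qconf -au` adds users to the ACL:
$ qconf -au alice,bob allowed_users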
> On 12.03.2019 at 15:55, David Trimboli wrote:
>
>
> On 3/5/2019 12:34 PM, David Trimboli wrote:
>>
>> On 3/5/2019 12:18 PM, Reuti wrote:
>>>> On 05.03.2019 at 18:06, David Trimboli wrote:
>>>>
>>>> I
lways full nodes,
you won't have this problem on a local scratch directory for $TMPDIR though.
===
BTW: did I mention it: no need to be root anywhere.
-- Reuti
multi-spawn.sh
Description: Binary data
__SGE_PLANCHET__.tgz
Description: Binary data
cluster.tgz
Description: Binary data
doesn't seem to change the order in
> which they run.
You mean the value you set with "-p"?
To which value did you change this for certain jobs?
$ qstat -pri
might give a hint about the overall values which are assigned to a job in
column "npprior".
-- Reuti
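A sketch with a hypothetical job id – note that ordinary users may only lower
the priority (range -1023 … 0), raising it above 0 needs a manager:
$ qalter -p -100 4711
$ qstat -pri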
>
principle {all.q} wouldn't hurt as it means "for each entry in the list",
and the only entry is all.q. But to lower the impact I would leave this out.
-- Reuti
> }
>
> I get the feeling that will limit the number of slots that all users can
> collectively use simul
Hi,
> On 27.02.2019 at 22:07, Kandalaft, Iyad (AAFC/AAC) wrote:
>
> HI Reuti
>
> I'm implementing only a share-tree.
Then you can set:
policy_hierarchy S
The past usage is stored in the user object, hence auto_user_delete_time
should be zero.
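A sketch of where these knobs live – the ticket count is an illustrative
value:
$ qconf -msconf   # scheduler config: policy_hierarchy S, weight_tickets_share 10000
$ qconf -mconf    # cluster config: auto_user_delete_time 0
$ qconf -mstree   # define the share tree itself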
Hi,
there is a man page "man sge_priority". Which policy do you intend to use:
share-tree (honors past usage) or functional (current use), or both?
-- Reuti
> On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC) wrote:
>
> Hi all,
>
> I recently implement
ing info via qacct. I am wondering
> what is the common way to achieve this without giving access to the qmaster
> node?
You mean, $SGE_ROOT is not shared in your cluster?
-- Reuti
f <(zcat accounting.0.gz)
-- Reuti
> --
> JY
> --
> "All ideas and opinions expressed in this communication are
> those of the author alone and do not necessarily reflect the
> ideas and opinions of anyone else."
>
how to read it. Any helpful insight
> much appreciated
Did you try to stop and start the qmaster?
-- Reuti
> qping -i 5 -info hpc-s 6444 qmaster 1
> 01/26/2019 01:12:18:
> SIRM version: 0.1
> SIRM message id: 1
> start time: 01/26/2
hy one has no access (as the permission bits look fine). Essentially
the exporting NFS machine will deny the access according to certain of the
permission bits. Does the line from above contain a plus sign on the exporting
machine, like:
-rwxr-xr-x+ 1 root root 1941408 Feb 28 2016 /opt/sge/
> On 24.01.2019 at 20:29, David Trimboli wrote:
>
> On 1/24/2019 2:05 PM, Reuti wrote:
>> Do the permissions for the directories include the x flag and not only r?
>>
>> drwxr-xr-x 2 root root 4.0K Jan 13 2010 man1
>> drwxr-xr-x 2 root root 4.0K Jan 13 20
> On 24.01.2019 at 19:28, David Trimboli wrote:
>
> On 1/24/2019 1:14 PM, Reuti wrote:
>>> On 24.01.2019 at 19:10, David Trimboli wrote:
>>>
>>> On 1/24/2019 12:44 PM, Reuti wrote:
>>>> Hi,
>>>>
>>>>> Am 24.0
> On 24.01.2019 at 19:10, David Trimboli wrote:
>
> On 1/24/2019 12:44 PM, Reuti wrote:
>> Hi,
>>
>>> On 24.01.2019 at 18:28, David Trimboli wrote:
>>>
>>> This is just a silly question. Using Son of Grid Engine 8.1.9, I installed
>>
n pages? How do I get it to do
> so?
What is the output of the command:
manpath
on both machines?
-- Reuti
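Usually sourcing the shipped settings file adds the SGE man pages to MANPATH
– the cell name "default" is the common assumption:
$ . $SGE_ROOT/default/common/settings.sh
$ manpath | tr ':' '\n' | grep -i sge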
shared memory listed by `ipcs`
and it starts to swap?
-- Reuti
>
> Regards,
>
> Derek
> -Original Message-
> From: Reuti
> Sent: January 18, 2019 11:26 AM
> To: Derek Stephenson
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Dilemma w
> On 18.01.2019 at 18:06, David Trimboli wrote:
>
> On 1/18/2019 11:49 AM, Reuti wrote:
>>> On 18.01.2019 at 17:41, David Trimboli wrote:
>>>
>>> On 1/18/2019 11:22 AM, Reuti wrote:
>>>> Hi,
>>>>
>>>>> Am 18.
> On 18.01.2019 at 17:41, David Trimboli wrote:
>
> On 1/18/2019 11:22 AM, Reuti wrote:
>> Hi,
>>
>>> On 18.01.2019 at 17:09, David Trimboli wrote:
>>>
>>> Hi, all. I've got a twenty-four-node cluster running versions of CentOS 5
> On 18.01.2019 at 16:26, Derek Stephenson wrote:
>
> Hi Reuti,
>
> I don't believe anyone has adjusted the scheduler from defaults but I see:
> schedule_interval 00:00:04
> flush_submit_sec 1
> flush_finish_sec
ld assume that most likely the `arch` script inside SGE isn't prepared for
your actual kernel, i.e. a case for 4.* kernels is missing. What does:
$ $SGE_ROOT/util/arch
return?
-- Reuti
on socket fd 7
For interactive jobs: is any firewall in place, blocking the communication
between the submission host and the exechost – maybe switched on at a later
point in time? SGE will use a random port for the communication. After the
reboot it worked instantly again?
-- Reuti
> Now I'v
> On 11.01.2019 at 00:30, Derrick Lin wrote:
>
> Hi Reuti
>
> Thanks for the input. But how does this help on troubleshooting the prolog
> script?
You asked for the meaning of the "-i" option, and I tried to outline its
behavior.
-- Reuti
> I will also tr
> On 09.01.2019 at 23:39, Derrick Lin wrote:
>
> Hi Reuti and Iyad,
>
> Here is my prolog script, it just does one thing, setting quota on the XFS
> volume for each job:
>
> The prolog_exec_xx_xx.log file was generated, so I assumed the first exec
> command go
Hi,
On 09.01.2019 at 23:35, Derrick Lin wrote:
> Hi Reuti,
>
> I have to say I am still not familiar with the "-i" in qsub after reading the
> man page, what does it do?
It will be fed as stdin to the jobscript. Hence:
$ qsub -i myfile foo.sh
is like:
$ foo.sh < myfile
ed.
Is there any statement in the prolog which could wait for stdin – and in a
batch job there is just no stdin, hence it continues? This could be tested
with "-i" on a batch job.
-- Reuti
> qsub job are working fine.
>
> Any idea will be appreciated
>
> Cheers,
> Derrick
eractive.q; maybe you can remove the PE smp there, unless you want to use it
interactively too.
-- Reuti
> $ qconf -sp smp
> pe_namesmp
> slots 999
> user_lists NONE
> xuser_listsNONE
> start_proc_argsNONE
> stop_proc_
ents
depends on the method by which you started the session: rsh, ssh or builtin. IIRC the
last `if [ -n "$MYJOBID" ];` section had only the purpose to display a message,
which was set with "-ac" during submission and might not be necessary here.
-- Reuti
MYPARENT=`ps -p $$ -o ppid --
On 07.12.2018 at 21:33, Derrick Lin wrote:
> Reuti,
>
> My further tests confirm that $TMP is set inside PROLOG, $TMPDIR is not.
>
> Both $TMPDIR and $TMP are set in job's environment.
>
> So technically my problem is solved by switching to $TMP.
>
> But I
receive TMPDIR which should be created by
> the scheduler.
Is $TMP set?
-- Reuti
>
> Other variables such as JOB_ID, PE_HOSTFILE are available though.
>
> We have been using the same script on the CentOS6 cluster with OGS/GE
> 2011.11p1 without an issue
I found my entry about this:
https://arc.liv.ac.uk/trac/SGE/ticket/570
-- Reuti
> On 06.12.2018 at 19:03, Reuti wrote:
>
> Hi,
>
>> On 06.12.2018 at 18:36, Dan Whitehouse wrote:
>>
>> Hi,
>> I've been running some MPI jobs and I expected that w
olog). The jobs were supposed to run in a dedicated cleaner.q
only, with no limits regarding slots (hence they started as soon as they were
eligible to start), but got a job hold on the actual job which submitted them,
to wait until it finished.
>
> --
> Dan Whitehouse
>
slots limited:
$ qconf -se global
especially the line "complex_values".
And next: any RQS?
$ qconf -srqs
-- Reuti
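For reference, a hypothetical RQS as `qconf -srqs` would print it – name and
limit are illustrative:
{
   name         slot_limit
   description  NONE
   enabled      TRUE
   limit        users {*} to slots=32
}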
> On Thu, Dec 6, 2018 at 12:55, Reuti () wrote:
>
> > On 06.12.2018 at 15:19, Dimar Jaime González Soto wrote:
> >
> >
This looks fine. So we have other settings to investigate:
$ qconf -sconf
#global:
execd_spool_dir /var/spool/sge
...
max_aj_tasks 75000
Is max_aj_tasks limited in your setup?
-- Reuti
>
> On Thu, Dec 6, 2018 at 11:13, Reuti () wrote:
:04:02
>1 17-60:1
Aha, so they are running already on remote nodes – fine. As the setting in the
queue configuration is per host, this should work and provide more processes
per node instead of four.
Is there a setting for the exechosts:
qconf -se ubuntu-n
for a running job. There should be a line for each
executing task, while the waiting ones are abbreviated in one line.
-- Reuti
>
> I would try running qalter -w p against the job id to see what it says.
>
> William
>
>
>
>>
>>> Am 05.12.2018 um 19
ans for the
communication? The PE would deliver a hostlist to the application, which can
then be used to start processes on other nodes too. Some MPI libraries even
discover
BTW: as you wrote "16 logical threads": for HPC it's often advisable to
disable Hyperthreading.
r with 744 permissions.
Were the sge_execd daemons on these new machines started by root or by
sgeadmin?
$ ps -e f -o user,ruser,command | grep sge
sgeadmin root /usr/sge/bin/lx24-em64t/sge_execd
root root \_ /bin/sh /usr/sge/cluster/tmpspace.sh
-- Reuti
>
>
> The user directo
ob_is_first_task FALSE
> urgency_slots min
> accounting_summary FALSE
>
> Then we have run our application "Maker" like this,
> qsub -cwd -N -b y -V -pe mpi /opt/mpich-install/bin/mpiexec
> maker
Which version of MPICH are you using? Maybe it's not tightly integrated
Hi,
> On 23.10.2018 at 20:31, Dj Merrill wrote:
>
> Hi Reuti,
> Thank you for your response. I didn't describe our environment very
> well, and I apologize. We only have one queue. We've had a few
> instances of people forgetting they ran a job that d
" must be used).
There is an introduction to use the checkpoint interface here:
https://arc.liv.ac.uk/SGE/howto/checkpointing.html
-- Reuti
lar job.
Please let me know if something is unclear. I hope it will work in the
general case too.
-- Reuti
mail-wrapper script follows
#!/bin/sh
#
# Assemble an email and attach an output file.
# Note: SGE will call this routine for each and every recipient spec
sent email, so that the users can check the result of the computation even
without logging into the cluster.
e) this mail-wrapper script needs again to be defined with: qconf -mconf
mailer /usr/sge/cluster/mailer.sh
-- Reuti
.
-- Reuti
Sent from my iPhone
> On 15.09.2018 at 19:15, Simon Matthews wrote:
>
> Is there any way to bring up an execd node, without explicitly
> configuring it at the qmaster? Perhaps it could come up and be added
> to a default queue?
>
> If it is possible to do th
ombine processes (like for MPI) and
threads (like for OpenMP).
In your case, it looks to me that you assume that the necessary cores are
available, independent of the actual usage of each node?
-- Reuti
PS: I assume with CPUS you refer to CORES.
> I have written up an article at:
>
r a setup file?
>
> Running on Ubuntu. Installed by `sudo apt-get install gridengine-master
> gridengine-client` and accepted all defaults.
I have no clue about the Ubuntu issue. But usually you have to run a setup
beforehand twice – once for the master, once for the client. Do
our case), the
jobscript is first transferred by SGE's protocol to the node, where the execd
writes the jobscript in the shared space, which is on the headnode again.
If you peek into the given file, you will hence find the original jobscript of
the user. Does the jobscript try to modify i
obs and they ran and completed without a problem.
Could it be a race condition with the shared file system?
-- Reuti
> I am wondering what may have caused this situation in general?
>
> Cheers,
> Derrick
Hi,
You can try to have a look at the extended output of `qstat`:
$ qstat -ext
$ qstat -pri
In addition, the way the priority is honored and essentially computed is
outlined here:
$ man sge_priority
Maybe this will shed some light on it and point to the cause of it.
-- Reuti
PS: You may
> On 01.08.2018 at 03:06, Derrick Lin wrote:
>
> HI Reuti,
>
> The prolog script is set to run by root indeed. The xfs quota requires root
> privilege.
>
> I also tried the 2nd approach but it seems that the addgrpid file has not
> been created when the prolog
> On 30.07.2018 at 02:31, Derrick Lin wrote:
>
> Hi Reuti,
>
> The approach sounds great.
>
> But the prolog script seems to be run by root, so this is what I got:
>
> XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)
This is quite unusual. Do yo
> On 28.07.2018 at 03:00, Derrick Lin wrote:
>
> Thanks Reuti,
>
> I know little about group ID created by SGE, and also pretty much confused
> with the Linux group ID.
Yes, SGE assigns a conventional group ID to each job to track the CPU and
memory consumption. This
e.
This can be found in the `id` command's output, or in the execd's spool
directory (execd_spool_dir) under
${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
-- Reuti
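A sketch for reading it inside a prolog or job script – the spool path must
match your execd_spool_dir, and the task id is 1 for non-array jobs (both are
assumptions here):
ADDGRPID=$(cat /var/spool/sge/$(hostname)/active_jobs/${JOB_ID}.1/addgrpid)
echo "additional group id: $ADDGRPID"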
> That's why I am trying to implement the xfs_projid to be independent from SGE.
>
>
>
>
did for generating project ID:
>
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
>
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)
The xfs_projid is then the number of lines in the file? Why not use $JOB_ID
directly? Is there a limit on the max. project ID an
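Using $JOB_ID directly as the XFS project id could look like this sketch –
the mount point /scratch and the 100g limit are assumptions:
xfs_quota -x -c "project -s -p $TMPDIR $JOB_ID" /scratch
xfs_quota -x -c "limit -p bhard=100g $JOB_ID" /scratch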
,56,98,134 failed to finish.
>
> What can I do to only run the failed jobs now? Can I use the -t option in
> any way or do I have to submit them one by one?
You have to submit them one by one, possibly in a `for` loop, but you can use
-t to specify the index to be used, at least as a single number.
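A sketch for the indices mentioned above; "jobscript.sh" is a placeholder:
for i in 56 98 134; do
   qsub -t $i jobscript.sh
done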
> On 11.06.2018 at 18:43, Ilya M <4ilya.m+g...@gmail.com> wrote:
>
> Hello,
>
> Thank you for the suggestion, Reuti. Not sure if my users' pipelines can deal
> with multiple job ids, perhaps they will be willing to modify their code.
Also other commands in SGE
mber to get all the runs listed
though.
-- Reuti
> This is my test script:
>
> #!/bin/bash
>
> #$ -S /bin/bash
> #$ -l s_rt=0:0:5,h_rt=0:0:10
> #$ -j y
>
> set -x
> set -e
> set -o pipefail
> set -u
>
> trap "exit 99" SIGUSR1
>
" or "xprojects", any project may
run there. Have a look at `man queue_conf` section "projects".
-- Reuti
containing all GPU nodes:
qconf -mattr queue calendar wartung common@@myGPUgroup
-- Reuti
> Ilya.
>
>
> On Wed, Jun 6, 2018 at 2:41 AM, Mark Dixon wrote:
> On Tue, 5 Jun 2018, Ilya M wrote:
> ...
> Is there a way to submit AR when there are projects attached to queues? I a
ript itself I set to:
UNCONFIGURED=no
ACTION_ON=2
ACTIONSIZE=1024
KEEPOLD=3
Note: to read old accounting files in `qacct` on-the-fly you can use:
$ qacct -o reuti -f <(zcat /usr/sge/default/common/accounting.1.gz)
-- Reuti
>
> Thanks for