't present in former times. I have no clue
when along the way the regression was introduced though.
-- Reuti
ee it in the
> cluster config.
There is none. The default for a new queue is /bin/csh, but how often do you
create new cluster queues? Most of the time I set "shell_start_mode
unix_behavior" anyway by hand when creating a new queue.
-- Reuti
> -Original Message-
&g
ct I can submit to dsbm04,
> which precedes dsbm05.
>
> I recently upgraded from sge6.1 to sge6.2u6, though I can’t be sure that’s
> the only thing that’s changed. How do I even begin to debug this?
Did you upgrade all nodes?
-- Reuti
nt@ibm044: /home/johnt/temp #
> --
What is sourced and set in a batch job depends on the shell_start_mode in the
queue configuration.
It might also be the case, that something is set by /etc/profile or
/etc/profile.local which is not
chost sends them).
It's necessary to tell the MPI applications which port range to use and
configure them accordingly (and the firewall). IIRC there is also no way to
force SGE's `qrsh` to use only certain ports.
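For Open MPI, for example, the TCP port range can be pinned in openmpi-mca-params.conf; a sketch (the values and the path are just placeholders, and the same range has to be opened in the firewall):
# $MPI_ROOT/etc/openmpi-mca-params.conf (path depends on the installation)
btl_tcp_port_min_v4 = 20000      # first port the TCP BTL may use
btl_tcp_port_range_v4 = 100      # number of ports, i.e. 20000-20099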
Maybe if your cluster has essentially such a configuration, you can convi
> Am 10.04.2017 um 11:34 schrieb sudha.penme...@wipro.com:
>
> Hi Reuti,
>
> Yes, We need the core dump.
>
> The configuration for the queue is
>
> h_core                INFINITY
Maybe there is not enough disk space available? Are they writing the core in
the ho
Do you need the core dump?
$ qconf -sq all.q
…
h_core                0
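If they are not needed, the limit can be set for an existing queue without the editor, e.g. (all.q is just an example):
$ qconf -mattr queue h_core 0 all.q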
-- Reuti
> Am 10.04.2017 um 10:27 schrieb sudha.penme...@wipro.com:
>
> Hi,
>
> Some of the jobs halt in the grid when the aborted jobs do not end, as the
> generation of the core dump gets stuck which
Am 09.04.2017 um 15:47 schrieb Yong Wu:
> Reuti,
> Thanks for your reply again!
>
> >I can assure you, that for me and others it's working.
> But it's not working for me.
>
> >Aha, I only set the $OMP_ROOT/etc/openmpi-mca-params.conf to have an entry
&g
Hi,
Am 09.04.2017 um 11:14 schrieb Yong Wu:
> Dear Reuti,
> Thank you very much!
> The jobname.nodes file is not necessary for parallel ORCA. And my
> "mpivars.sh" is also not a problem.
> ORCA3.0.3 program is compiled with
BTW: The Open MPI bug you checked already:
https://www.mail-archive.com/users@lists.open-mpi.org/msg30824.html
- -- Reuti
Am 08.04.2017 um 20:42 schrieb Reuti:
> Hi,
>
> Am 07.04.2017 um 16:04 schrieb Yong Wu:
>
>> Tha
&>test.log &"
Please don't use "&" in the job script to put the job in the background. The
job script might end, and SGE will notice this and kill all orphaned processes.
Also with Torque this shouldn't be necessary.
- -- Reuti
ey don't just pull the information out of a
`mpiexec`).
> cp ${SGE_O_WORKDIR}/test.inp $tdir
>
> cd $tdir
Side note:
In ORCA there seem to be several types of jobs:
- some types of ORCA jobs can compute happily in $TMPDIR using the scratch
directory on the nodes (even i
Hi,
> Am 05.04.2017 um 09:23 schrieb Lionel SPINELLI :
>
> Hello Reuti,
>
> thanks for your response. The problem occurred when I had to move the shared
> folder where output files are written from a NAS to another one. On the old NAS,
> everything was fine. I changed for a
ded in the source. Some users allow others to write to
the output files too, or have a distribution where each user is his own group
too.
> I have tried to set this starter method in my queue with the script:
Is this effect new in the cluster, or is it a new installation and fails
instant
ghly
specialized solution. An installation of the plain files
starter.sh/terminate.sh won't give a working solution, but I can send them to
you via PM in case you are interested.
-- Reuti
ting file, it's updated only once
> a job is finished and hence, we can't capture chronologically.
Correct, but the submission time is recorded in the accounting file too.
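For a finished job it can be read out with `qacct`, e.g. (the job id is a placeholder):
$ qacct -j <jobid> | grep qsub_time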
- -- Reuti
> And upgrading SGE is not an option, at least for now, as we are looking for
> a quick re
BTW: you are still using GS/GE 2011.11?
This might work better: https://arc.liv.ac.uk/trac/SGE
- -- Reuti
Am 28.03.2017 um 19:10 schrieb jeevan.patn...@wipro.com:
> Hi,
>
>
> I am searching for a way to get the values of requested r
't make it to the archive. If someone is interested, I can
post them here again.
- -- Reuti
> I found this option MONITOR=1 to be set to params in sge conf and with this,
> the required details are being logged in a separate file called schedule as
> follows:
>
> 3544466:100:STAR
Am 24.03.2017 um 22:11 schrieb Joshua Baker-LePain:
> On Fri, 24 Mar 2017 at 1:03pm, Reuti wrote
>
>>> Is this expected behavior? Or is something wonky with the cgroups here?
>>> Thanks for any insights.
>
> A
behavior? Or is something wonky with the cgroups here?
> Thanks for any insights.
You can try to use `strace` to call the two applications in question; maybe it
gives some hints about their behavior.
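For example (the output file name is arbitrary):
$ strace -f -o /tmp/app.trace <application> <args>
The -f follows child processes too, which matters if the application forks helpers.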
- -- Reuti
ease it from 1024.
IIRC this was on the list before, and the only option was to change it in the
source.
Do you face it on an exechost, i.e. that the path to the spool directory got too
long? What's called inside the job script could be a problem of the shell, but
not of SGE.
-- Reuti
> Am 23.03.2017 um 09:11 schrieb John_Tai :
>
> Can I still download 6.2? Haven't been able to find it.
>
> John
You can make use of the open source update:
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/
from
https://arc.liv.ac.uk/trac/SGE
-- Reuti
>
It was a feature introduced with SGE 6.2u2:
https://arc.liv.ac.uk/trac/SGE/ticket/197
-- Reuti
>
> Thanks
> John
>
>
>
> -Original Message-
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Wednesday, December 21, 2016 7:05
> To: Christopher B
> Am 16.03.2017 um 00:50 schrieb Mun Johl :
>
> Hi Reuti,
>
> Thanks for your reply.
>
> I downloaded the sources--GE2011.11p1.tar.gz--from:
> https://sourceforge.net/projects/gridscheduler/files
These are quite old, please have a look here:
https://arc.liv.ac.uk/
ere did you download the OGS or possibly SoGE?
- -- Reuti
Hi,
Am 11.03.2017 um 00:50 schrieb Loren Koenig:
> Hi Reuti, all,
>
>>>> […] I began to suspect my PE named "thread", which permits
>>>> using the PE with threaded programs. Here is its definition: […]
>>
handles the i-th line's request.
Depends on the particular application.
- -- Reuti
>
> Then I get some timeout while trying to qstat something...
>
> [root@ ~]# qstat -u user
> error: failed receiving gdi request response for mid=1 (got syncron
> message receive time
> Am 09.03.2017 um 17:41 schrieb Roberto Nunnari :
>
> On 09.03.2017 15:14, Reuti wrote:
>> Hi,
>>
>>> Am 09.03.2017 um 14:24 schrieb Roberto Nunnari :
>>>
>>> Hi Reuti.
>>> Hi William.
>>>
>>> here&
Hi,
> Am 09.03.2017 um 14:24 schrieb Roberto Nunnari :
>
> Hi Reuti.
> Hi William.
>
> here's my settings you required:
> paramsMONITOR=1
> max_reservation 32
> default_duration 0:10:0
>
>
expected runtime in the job submissions (-l h_rt=…)?
- is a sensible default set in `qconf -msconf` for the runtime
(default_duration 8760:00:00)?
- is a sensible default set in `qconf -msconf` for the number of reservations
(max_reservation 20)?
-- Reuti
signature
he jobs with "-r y" and reschedule them by `qmod
-rj <jobid>`. While it's waiting again, you can use `qdel` on the (again)
waiting job. But the jobs will continue on the node although they vanished from
the job list. There were discussions on the list before that it will need s
> Am 28.02.2017 um 21:49 schrieb Mishkin Derakhshan :
>
> Thank you. I appreciate your patience while I wrap my head around this stuff.
>
> On Sun, Feb 26, 2017 at 12:25 PM, Reuti wrote:
>>
>> Am 26.02.2017 um 22:42 schrieb Mishkin Derakhshan:
>>
>>&g
t; another_user) | mail -s "$2" sge_admin
and define it in:
$ qconf -sconf
…
mailer /usr/sge/cluster/mailer.sh
I even add some information of the job's context (`qsub -ac …`) to the outgoing
mail to the user, e.g. the list of used nodes of a parallel job.
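A minimal sketch of such a wrapper (the extra admin address is made up; SGE hands over roughly `-s <subject> <recipients>` on the command line, with the mail body on stdin):
#!/bin/sh
# forward the mail to the original recipient and keep the admin in the loop
exec mail -s "$2" "$3" sge-admin@example.com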
ent to the mailer?
The admin mails are just sent; there is no option to define when they are sent
or what they contain.
Do you want to get more information or additional entries?
- -- Reuti
Hi,
> Am 23.02.2017 um 14:50 schrieb Szoke Igor :
>
> Hi, thanks Reuti.
>
> I just want to log job stdout / stderr to some nice job related directories,
> which I cannot create beforehand as I have no NFS on the target machine.
How do you get the results of the jobs then? When
which will create the directory during job submission by reading the
"-e"/"-o" options. Sure, if the user deletes the directory before the job
starts you are out of luck with this approach.
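A rough shape such a submission-side helper could take, assuming a qsub wrapper rather than a JSV and assuming -o/-e are passed as directory paths:
#!/bin/bash
# create the -o/-e targets before handing everything to the real qsub
args=("$@")
for (( i = 0; i < ${#args[@]}; i++ )); do
    case "${args[i]}" in
        -o|-e) mkdir -p "${args[i+1]}" ;;
    esac
done
exec /usr/sge/bin/lx-amd64/qsub "$@"   # adjust the path so the wrapper doesn't call itself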
-- Reuti
> Thanks!
>
> -Igor
" therein should reflect this. You can set
additional weights to zero too, in case you don't see the desired effect.
-- Reuti
>
> --Sangmin
>
> On Wed, Feb 22, 2017 at 7:19 PM, Reuti wrote:
> Hi,
>
> > Am 22.02.2017 um 09:41 schrieb Sangmin Park :
> >
ob running
and 10 waiting, and another user has 100 jobs running and due to the policy
also 10 waiting ones?
Usually the fair share policy targets only running jobs.
-- Reuti
> I tried to use the resource quota policy. But there is only a way to limit the
> number of cores, not the number of
nting file might be
fine, as long as the "accounting_summary FALSE" is set in the PE and there were
`qrsh -inherit …` calls included. But the dates being so far apart and having
different names points to the first cause given.
- -- Reuti
Am 13.02.2017 um 21:17 schrieb Douglas
fferent instance. Note that (depending on your set up) you need different
ports for this one (qmaster and execd) and also set them in the cheap
submission job to the correct values. In case the queue and qmaster for the
"cheap" tasks is on a dedicated machine, this could be
put back into circulation?
You can uncompress it on-the-fly:
$ qacct -f <(zcat /usr/sge/default/common/accounting.0.gz)
-- Reuti
whatever you prefer).
You can check whether any PEs are already defined with: `qconf -spl`
-- Reuti
> Am 05.01.2017 um 10:04 schrieb Manfred Selz :
>
> Hi Reuti,
>
> thank you for your quick reply.
> Actually, when specifying the host as you suggest - via "-q *@<host>",
> the jobs runs fine. Much obliged for this hint!
>
> And no, there is no reservations/b
esting a queue and a host at the same time, i.e.
"-q" & "-l h=" at the same time. The solution may work also in your case:
request the host by a queue request:
-q "*@node123"
> After all, the final message in the "qstat -j <jobid>" report is always:
> cannot run
eboot of an exechost.
But I couldn't see any great improvement compared to a complete NFS share of
/usr/sge.
-- Reuti
ight be necessary to adjust
$SGE_ROOT/util/arch, so that newer Linux kernels are also covered.
-- Reuti
> - Copy $SGE_ROOT/default/common/sgeexecd to /etc/init.d
>
> Depending on the startup of services you need either:
>
> # /etc/init.d/sgeexecd start
> # chkconfig
stemctl daemon-reload
# systemctl start sgeexecd.service
# systemctl enable sgeexecd.service
BTW: Is tmpdir in the queue definition just /tmp or do you need an additional
/scratch or the like on the new machine too?
-- Reuti
> > shares=1
> > > childnodes=1
> > > id=1
> > > name=default
> > > type=0
> > > shares=1000
>
> Correct. All users which are collected under this "default" leaf will
> automatically show up there (after you saved the tree and open
location shared? Can you check this in the
job script at execution time:
for LOCATION in ${PATH//:/ }; do if [ ! -d "${LOCATION}" -o ! -r "${LOCATION}" ]; then echo "$LOCATION: missing"; fi; done
-- Reuti
> -Original Message-
> From: Reuti [mailto:re
t this you may also look into
the option "-terse". The NEXT_JOB_ID (or any other name you prefer) can be
checked in job A by:
qstat -j $JOB_ID | sed -n -e "/^context/s/^context: *//p" | tr "," "\n"
just before i
tted
> # cat virtuoso.o207
> -sh: virtuoso: command not found
You can investigate this by submitting:
$ qsub -V -b y env
What's in the output file "env.o…"?
-- Reuti
>
> What am I missing?
>
> Thanks
> John
can talk to
all the users directly (to take care of their memory requests), as it will save
computing time compared to killing a job because of a small overdraft. In large
clusters in a computing center h_vmem might be the better choice, to avoid that
other users' jobs will be affec
e slave exechosts might still show too high a
value for the available memory, I fear.
-- Reuti
> Best,
> Chris
>
> On 12/20/16, 5:11 AM, "users-boun...@gridengine.org on behalf of Reuti"
> wrote:
>
>
>> Am 20.12.2016 um 02:45 schrieb John_Tai :
>>
>
d again by -1 to leave space for
the final 0x00.
One could argue: with -1 in the call you are on the safe side. If this is the
real limit, then also all the Howto examples are wrong as they use a plain
DRMAA_ERROR_STRING_BUFFER as size value.
-- Reuti
> from Dan Templeton's page:
will add this queue to the list of
already set queues. So the one specified in sge_request is still there.
Can you check with `qstat -j <jobid>` whether you experience the same?
-- Reuti
> It didn't matter whether we used drmaa with java or c, it would always go to
> the 1-hour queue. I haven'
h must be requested as you suggest, this could even be set to forced
in case no other jobs should slip in.
Or use an RQS, so that certain users have zero slots in other queues.
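A sketch of such an RQS (the ACL and queue names are made up; it can be loaded with `qconf -Arqs <file>` or created via `qconf -arqs`):
{
   name         keep_special_users_in_their_queue
   description  "members of @special get no slots outside special.q"
   enabled      TRUE
   limit        users @special queues !special.q to slots=0
}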
What problem did you face in combination with drmaa?
-- Reuti
> Scott Lucas
> HPC Applications Support
> 2
ee=10G
10G times 7 = 70 GB
The node has this amount of memory installed and it is defined this way in
`qconf -me ibm037`?
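If mem_free is handled as a consumable at your site, the per-host capacity would be set roughly like this (the value of course has to match the really installed memory):
$ qconf -mattr exechost complex_values mem_free=70G ibm037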
-- Reuti
> mail_list: johnt@ibm005
> notify: FALSE
> job_name: xclock
> jobshare:
Hi,
Am 16.12.2016 um 17:58 schrieb Michael Stauffer:
> On Fri, Dec 16, 2016 at 11:50 AM, Reuti wrote:
>
> > Am 16.12.2016 um 16:15 schrieb Michael Stauffer :
> >
> > SoGE 8.1.8
> >
> > Hi,
> >
> > I'm trying to setup a fair share policy.
Am 17.12.2016 um 11:34 schrieb Reuti:
>
> Am 17.12.2016 um 02:01 schrieb John_Tai:
>
>> It is working!! Thank you to all that replied to me and helped me figure
>> this out.
>>
>> I meant to set the default to 2G so that was my mistake. I changed it t
ough).
If you had set it up there, it would have been the "overall limit of
memory which can be used in the complete cluster at the same time".
-- Reuti
> # qconf -se global
> hostname global
> load_scaling NONE
> complex_values            NONE
file "/tmp/qsub/out.task1.o": No such file or
> directory
During execution it looks on the exechost for these paths.
Is the /home shared? Could you create a directory ~/myoutput there? My users
prefer to get the output of their jobs in t
> Am 16.12.2016 um 16:15 schrieb Michael Stauffer :
>
> SoGE 8.1.8
>
> Hi,
>
> I'm trying to setup a fair share policy.
Only functional fair share without honoring the past? The 'stckt' values come
from the share tree, which honors past usage.
-- Reuti
> I&
ult of 1G or so.
Is there any virtual_free complex defined on a global level: qconf -se global
-- Reuti
virtual_free to allow jobs to request RAM, so the goal is for each job to
> request both RAM and the number of CPU cores. Hopefully this helps figuring out a
> solution. Thanks.
What does the definition of the complex look like in `qconf -sc`?
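For reference, a typical consumable memory entry in `qconf -sc` looks roughly like this (the 1G default is just an example):
#name          shortcut  type    relop  requestable  consumable  default  urgency
virtual_free   vf        MEMORY  <=     YES          YES         1G       0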
-- Reuti
> Here’s an example of one host that doe
.50
> load_adjustment_decay_time    0:7:30
> load_formula np_load_avg
> schedd_job_info false
The above can be set to true, to get a more detailed output for `qstat -j`
variants.
-- Reuti
> flush_submit_sec 0
> fl
Ls (`man access_list`) and attach it, e.g. on a global level, per
PE, queue or exechost, to limit access to certain resources to certain users.
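For example (the list, user and queue names are made up):
$ qconf -au alice,bob chem_users                    # create the ACL resp. add users to it
$ qconf -mattr queue user_lists chem_users all.q    # only its members may enter all.q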
-- Reuti
>
> Thanks,
> Jose
ad:
job_load_adjustments NONE
load_adjustment_decay_time    0:0:0
In your current case of course, where 8 slots are defined and you test with 2,
this shouldn't be a problem though.
Did you set up and/or request any memory per machine?
OTOH: if you submit 2 single CPU jobs
nd would reread
either only on a startup or after the timeout, as you mentioned. So both
observations could be right.
-- Reuti
> In the past I have seen symptoms
> consistent with host_aliases only being read at startup.
> Unfortunately I do not understand SGE's architecture and
&
---
> sim.q@ibm021 BIP 0/0/1 0.02 lx-amd64
Is there any limit of slots in the exechost defined, or in an RQS?
-- Reuti
>
>
> - PENDING JOBS - PEN
to all.q:
>
> qconf -aattr queue pe_list cores all.q
How many "slots" were defined in there queue definition for all.q?
-- Reuti
> - Now I submit a job:
>
> # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
> Your job 89 ("x
iki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration#Ready_for_testing
?
-- Reuti
> So that license of Ansys Application can be monitored using qlicserver
>
>
>
>
>
> -Original Message-
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Thursday,
ntegration
But as far as I can tell: why do you want to edit the output of `qlicserver`?
Usually this works behind the scenes and has to run as a daemon continuously to
adjust the available complexes (i.e. free licenses) in SGE.
-- Reuti
ne/spool/qmaster/messages. So I assume the
> outdated host_aliases confused the grid master.
Usually the host_aliases file is read live, i.e. you can change any entry
therein and it will be honored instantly without a restart. Maybe the order is
important, i.e. in your case to remove t
t, or "delete" the load report from its inbox?
Do you have any custom load sensors defined, either on a global or local level
per exechost? The machine in question was completely removed and shut down?
-- Reuti
> "ip-XXX-XXX-XXX-XXX.eu-west-1.compute.internal" in qstat.
>
> Is there a way I can force it to use the short hostnames (computeXXX) for
> display ?
The usual trick is to list the short nickname in the /etc/hosts file as the
first nam
Having this setup, SGE will track the number of used cores per machine. The
available ones you define in the queue definition. In case you have more than
one queue per exechost, you need to set up in addition an overall limit of cores
which can be used at the same time to avoid oversubscription.
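One way to express such an overall limit is an RQS along these lines (a sketch; $num_proc resolves to the number of cores of each host):
{
   name         max_slots_per_host
   description  "total slots across all queues must not exceed the cores of a node"
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}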
--
llocation_rule $pe_slots, control_slaves true
qsub -pe orte 16 myjob    # allocation_rule $round_robin, control_slaves true
where smp resp. orte is the chosen parallel environment for OpenMP resp. Open
MPI. Its settings are explained in `man sge_pe`, the "-pe" parameter to in
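For reference, a tight-integration PE for Open MPI typically looks roughly like this (the slots value is arbitrary):
$ qconf -sp orte
pe_name            orte
slots              9999
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
…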
Am 29.11.2016 um 20:08 schrieb Coleman, Marcus [JRDUS Non-J&J]:
> Reuti Thanks for the information!!!
>
> Any idea on what is causing the reboot?
There are several possibilities:
- oom-killer (less likely when there are no jobs on the node)
- uncorrectable ECC-error
- heat-pro
> Am 29.11.2016 um 00:17 schrieb Coleman, Marcus [JRDUS Non-J&J]
> :
>
> Reuti
>
> So it rebooted again without any jobs running...and I don't understand "
> sgead...@rndusljpp2.na.jnj.com removed "mcolem19" from user list" but as you
>
Am 28.11.2016 um 20:36 schrieb Coleman, Marcus [JRDUS Non-J&J]:
> Thanks Reuti!
>
> I was hoping it was something thereAny ideas on where to go from here?
What do:
$ ./gethostbyname -all padme
$ ./gethostbyaddr -all 192.168.1.159
show on the node and headno
Am 27.11.2016 um 03:23 schrieb Coleman, Marcus [JRDUS Non-J&J]:
> Hi Reuti
>
> I am not sure what I am looking for...but here is the contents of /tmp on the
> rebooting node
> Any outrights you can see?
>
> [root@padme tmp]# ls -l
> total 20
> prw-rw-r-- 1 mcol
STER log
> 11/25/2016 07:41:27|listen|rndusljpp2|E|commlib error: endpoint is not unique
> error (endpoint "padme/execd/1" is already connected)
> 11/25/2016 07:41:27|listen|rndusljpp2|E|commlib error: got select error
> (Connection reset by peer)
> 11/25/2016 07:41:29|worker|rndusljpp2|I|execd on padme registered
Are there any files in /tmp on the node pointing to a problem starting execd?
-- Reuti
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
t wouldn't be fixed in the
file, you couldn't keep track of which job from the user ran under which GID.
The accounting file may be at the "$SGE_ROOT/default/common/accounting"
location.
-- Reuti
>
> I expected that the usage information of A a
om suspend and resume procedures which
can be defined and even correct any overbooking of licenses, there is no
"look-ahead" feature in SGE. Means, that SGE can't see that the available
licenses with be increased by X if job Y is going to be suspended. So the job
which would l
on socket fd 4
>
> Your interactive job 1756874 has been successfully scheduled.
> timeout (5 s) expired while waiting on socket fd 4
Did you enable any firewall in the cluster to block certain ports on the nodes?
-- Reuti
> This goes for some time, the jobs can even be seen
> […]
>
> Thanks Reuti. This is basically what I've been doing, except that I write out
> the RQSs to a file that I load using qconf. The issue is that temporarily
> changing individual quotas will be much more complicated now.
> I've had an idea though: I c
; usual, then this wouldn't matter, I'm thinking.
I don't know how you change your RQS, but it can also be done on the command
line, although it's not explained in detail in the man page:
$ qconf -srqs
{
   name         foo
   description  Demo
   enabled      FALSE
   limit        n
Am 02.11.2016 um 21:47 schrieb Joshua Baker-LePain:
> On Wed, 2 Nov 2016 at 11:13am, Reuti wrote
>
>>> Am 02.11.2016 um 18:36 schrieb Joshua Baker-LePain :
>>>
>>> On our cluster, we have three queues per host, each with as many slots as
>>> th
oGE 8.1.9.
As the load is just the number of eligible processes in the run queue*, it
should for sure get at least up to the number of available cores. Did you
increase the number of slots for these machines too (also PEs)? What is
`uptime` showing? What happens with the reported load, when you
" is a RESTRING):
qconf -mattr exechost complex_values versions=_7_7.5_ node_a
qconf -mattr exechost complex_values versions=_6.5_ node_c
qconf -mattr exechost complex_values versions=_7.5_8_ node_d
Then a user could run:
qsub -l "versions=*_7.5_*" ...
Th
7
> slots
Now the job should start, and something else is blocking it, not the number of
free slots.
> I've searched through all of the SGE configuration. I also did a reinstallation of
> the 2 nodes. But the same message appears, that only 7 slots are free!
Did you request any r
mber of slots per exechost (in case a host
belongs to both queues), to avoid a node getting oversubscribed (either by
slots being set per exechost to the number of installed cores or by an RQS; I
set it per exechost to keep the RQS resp. `qquota` output short).
-- Reuti
> > and I hadn'
p, i.e. the mask in case of named entries.
==
What behavior do you want to achieve? Not which bits are set, but what the
users should be able to do, or not to do?
-- Reuti
>
> So it seems to be ignoring or otherwise overriding the ACL defaults. Does
> anyone have an idea why this
-o`
- insert an additional line (with all usage reading zero) in case the UID in
the list is not in `qacct -o`
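A possible shape of such a loop (assuming `qacct -o <user>` prints nothing on stdout for users without any accounting entries; adjust the test to your version):
for U in $(getent passwd | cut -d: -f1); do
    USAGE=$(qacct -o "$U" 2>/dev/null | tail -1)
    if [ -n "$USAGE" ]; then
        echo "$USAGE"
    else
        printf '%-16s 0 0 0\n' "$U"    # not in the accounting file yet: emit an all-zero line
    fi
done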
-- Reuti
> Thanks,
>
> --
> Ubuntu!
> I am, because you are here.
> ===
Am 19.10.2016 um 21:30 schrieb Michael Stauffer:
> Thanks Reuti, Skylar,
>
> Turns out it was a false alarm, sorry. The user hadn't told me they'd
> submitted to a different queue
Maybe it would be good to tell the user not to submit into a queue at all but
request
resources via qstat -F. The job ran
> immediately. This worked for a few more jobs until resources were truly
> insufficient.
>
> Does anyone have an idea what might be going on or how to continue debugging?
> Thanks.
I noticed such a behav
I've created. I've also
> reduced slot count in the queue to 140, yet the load still runs away to over
> 256 and locks the machine (requiring a hard reboot). Lastly, I've tried
> numerous iterations of my qsub script, but here's my current version of it:
In actual Linu
> Am 05.10.2016 um 10:10 schrieb Mark Dixon :
>
> On Tue, 4 Oct 2016, Reuti wrote:
> ...
>> Yeah, I had the idea of different temporary directories some time ago, as
>> some applications like Molcas need a persistent one across several `mpiruns`
>> on each nod
> Am 04.10.2016 um 17:39 schrieb Mark Dixon :
>
> On Tue, 4 Oct 2016, Reuti wrote:
> ...
>> Do you mean your implementation or the general behavior? The $TMPDIR will be
>> created when a `qrsh -inherit ...` spawns a process on a node and is removed
>> once i