Re: [gridengine users] Bug - max_aj_instances

2017-04-20 Thread Reuti
…'t present in former times. I have no clue when along the way the regression was introduced, though. -- Reuti

Re: [gridengine users] qsub EDA tool

2017-04-19 Thread Reuti
…ee it in the > cluster config. There is none. The default for a new queue is /bin/csh, but how often do you create new cluster queues? Most of the time I set "shell_start_mode unix_behavior" by hand anyway when creating a new queue. -- Reuti > -Original Message-

Re: [gridengine users] Queue dropped because it is full, except it is not

2017-04-19 Thread Reuti
…ct I can submit to dsbm04, > which precedes dsbm05. > > I recently upgraded from sge6.1 to sge6.2u6, though I can’t be sure that’s > the only thing that’s changed. How do I even begin to debug this? Did you upgrade all nodes? -- Reuti

Re: [gridengine users] qsub EDA tool

2017-04-18 Thread Reuti
…nt@ibm044: /home/johnt/temp # > -- What is sourced and set in a batch job depends on the shell_start_mode in the queue configuration. It might also be the case that something is set by /etc/profile or /etc/profile.local which is not…

Re: [gridengine users] Are iptable changes required on a CentOS7 installation of Son of Grid?

2017-04-12 Thread Reuti
…chost sends them). It's necessary to tell the MPI applications which port range to use and to configure them (and the firewall) accordingly. IIRC there is also no way to force SGE's `qrsh` to use only certain ports. Maybe if your cluster has essentially such a configuration, you can convi…

Re: [gridengine users] Jobs get hanged

2017-04-10 Thread Reuti
> On 10.04.2017 at 11:34, sudha.penme...@wipro.com wrote: > > Hi Reuti, > > Yes, We need the core dump. > > The configuration for the queue is > > h_core INFINITY Maybe there is not enough disk space available? Are they writing the core in the ho…

Re: [gridengine users] Jobs get hanged

2017-04-10 Thread Reuti
Do you need the core dump? $ qconf -sq all.q … h_core 0 -- Reuti > On 10.04.2017 at 10:27, sudha.penme...@wipro.com wrote: > > Hi, > > Some of the Jobs halt in grid when the aborted jobs does not end as the > generation of the core dump gets stuck which…

Re: [gridengine users] Fwd: error in parallel run openmpi for gridengine

2017-04-09 Thread Reuti
On 09.04.2017 at 15:47, Yong Wu wrote: > Reuti, > Thanks for your reply again! > > >I can assure you, that for me and others it's working. > But it's not working for me. > > >Aha, I only set the $OMP_ROOT/etc/openmpi-mca-params.conf to have an entry…

Re: [gridengine users] Fwd: error in parallel run openmpi for gridengine

2017-04-09 Thread Reuti
Hi, On 09.04.2017 at 11:14, Yong Wu wrote: > Dear Reuti, > Thank you very much! > The jobname.nodes file is not necessary for parallel ORCA. And my > "mpivars.sh" is also not a problem. > ORCA3.0.3 program is compiled with…

Re: [gridengine users] Fwd: error in parallel run openmpi for gridengine

2017-04-08 Thread Reuti
BTW: The Open MPI bug you already checked: https://www.mail-archive.com/users@lists.open-mpi.org/msg30824.html -- Reuti On 08.04.2017 at 20:42, Reuti wrote: > Hi, > > On 07.04.2017 at 16:04, Yong Wu wrote: > >> Tha…

Re: [gridengine users] Fwd: error in parallel run openmpi for gridengine

2017-04-08 Thread Reuti
…> &>test.log &" Please don't use "&" in the job script to put the job in the background. The job script might end, SGE discovers this and kills all orphaned processes. This shouldn't be necessary with Torque either. -- Reuti

Re: [gridengine users] error in parallel run openmpi for gridengine

2017-04-07 Thread Reuti
…ey don't just pull the information out of a `mpiexec`). > cp ${SGE_O_WORKDIR}/test.inp $tdir > > cd $tdir Side note: In ORCA there seem to be several types of jobs: - some types of ORCA jobs can compute happily in $TMPDIR using the scratch directory on the nodes (even i…

Re: [gridengine users] output file permission

2017-04-05 Thread Reuti
Hi, > On 05.04.2017 at 09:23, Lionel SPINELLI wrote: > > Hello Reuti, > > thanks for your response. The problem occured when I had to move the shared > folder where output files are wirtten from a NAS to an other. On the old NAS, > everything was fine. I changed for a…

Re: [gridengine users] output file permission

2017-04-04 Thread Reuti
…ded in the source. Some users allow others to write to the output files too, or use a distribution where each user has their own group. > I have tried to set this started method in my queue with the script: Is this effect new in the cluster, or is it a new installation that fails instant…

Re: [gridengine users] Linking two SoGE Clusters?

2017-03-31 Thread Reuti
…ghly specialized solution. Installing just the plain files starter.sh/terminate.sh won't give a working solution, but I can send them to you by PM in case you are interested. -- Reuti

Re: [gridengine users] ogging for details of resources requested per job in SGE version: 2011.12

2017-03-28 Thread Reuti
…ting file, it's updated only once > a job is finished and hence, we can't capture chronologically. Correct, but the submission time is recorded in the accounting file too. -- Reuti > And SGE upgradation is not an option at least for now, as we are looking for > a quick re…
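
Because every accounting record carries the submission time, finished jobs can be put back into chronological order after the fact. A minimal sketch, assuming the colon-separated layout described in `man accounting` (owner in field 4, job_number in field 6, submission_time in field 9 as epoch seconds); check the man page of your version for the exact field order and time unit. `list_by_submission` is a hypothetical helper name:

```shell
# Hypothetical helper: list finished jobs ordered by submission time.
# Assumed accounting(5) layout: field 4 = owner, field 6 = job_number,
# field 9 = submission_time (epoch seconds).
list_by_submission() {
    local file="${1:-$SGE_ROOT/default/common/accounting}"
    # Skip comment lines, print "submission_time job_number owner",
    # then sort numerically on the timestamp.
    awk -F: '!/^#/ && NF >= 9 { print $9, $6, $4 }' "$file" | sort -n -k1,1
}
```

Usage: `list_by_submission /usr/sge/default/common/accounting | tail`. Note that this still only covers jobs that have already finished.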

Re: [gridengine users] ogging for details of resources requested per job in SGE version: 2011.12

2017-03-28 Thread Reuti
BTW: you are still using GS/GE 2011.11? This might work better: https://arc.liv.ac.uk/trac/SGE -- Reuti On 28.03.2017 at 19:10, jeevan.patn...@wipro.com wrote: > Hi, > > > I am searching for a way to get the values of requested r…

Re: [gridengine users] ogging for details of resources requested per job in SGE version: 2011.12

2017-03-28 Thread Reuti
…'t make it to the archive. If someone is interested, I can post them here again. -- Reuti > I found this option MONITOR=1 to be set to params in sge conf and with this, > the required details are being logged in a separate file called schedule as > follows: > > 3544466:100:STAR…

Re: [gridengine users] Some commands not generating any ouput when USE_CGROUPS set

2017-03-24 Thread Reuti
On 24.03.2017 at 22:11, Joshua Baker-LePain wrote: > On Fri, 24 Mar 2017 at 1:03pm, Reuti wrote > >>> Is this expected behavior? Or is something wonky with the cgroups here? >>> Thanks for any insights. > > A…

Re: [gridengine users] Some commands not generating any ouput when USE_CGROUPS set

2017-03-24 Thread Reuti
…behavior? Or is something wonky with the cgroups here? > Thanks for any insights. You can try running the two applications in question under `strace`; maybe it gives some hints about their behavior. -- Reuti

Re: [gridengine users] Unable to run job

2017-03-24 Thread Reuti
…ease it from 1024. IIRC this was on the list before, and the only option was to change it in the source. Do you face it on an exechost, i.e. did the path to the spool directory get too long? What's called inside the job script could be a problem of the shell, but not of SGE. -- Reuti

Re: [gridengine users] John's cores pe (Was: users Digest...)

2017-03-23 Thread Reuti
> On 23.03.2017 at 09:11, John_Tai wrote: > > Can I still download 6.2? Haven't been able to find it. > > John You can use the open source update: http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/ from https://arc.liv.ac.uk/trac/SGE -- Reuti >

Re: [gridengine users] John's cores pe (Was: users Digest...)

2017-03-22 Thread Reuti
… It was a feature introduced with SGE 6.2u2: https://arc.liv.ac.uk/trac/SGE/ticket/197 -- Reuti > > Thanks > John > > > > -Original Message- > From: Reuti [mailto:re...@staff.uni-marburg.de] > Sent: Wednesday, December 21, 2016 7:05 > To: Christopher B…

Re: [gridengine users] Recommended installation instructions for CentOS 7?

2017-03-16 Thread Reuti
> On 16.03.2017 at 00:50, Mun Johl wrote: > > Hi Reuti, > > Thanks for your reply. > > I downloaded the sources--GE2011.11p1.tar.gz--from: > https://sourceforge.net/projects/gridscheduler/files These are quite old, please have a look here: https://arc.liv.ac.uk/

Re: [gridengine users] Recommended installation instructions for CentOS 7?

2017-03-15 Thread Reuti
…ere did you download the OGS or possibly SoGE? -- Reuti

Re: [gridengine users] Batch job on interactive queue

2017-03-11 Thread Reuti
Hi, On 11.03.2017 at 00:50, Loren Koenig wrote: > Hi Reuti, all, > >>>> […] I geban to suspect about my PE named "thread", that permits >>>> use PE with thread programs. Here is it definition: […] >>

Re: [gridengine users] Make qmaster buffer larger

2017-03-09 Thread Reuti
…handles the i-th line's request. Depends on the particular application. -- Reuti > > Then I get some timeout while trying to qstat something... > > [root@ ~]# qstat -u user > error: failed receiving gdi request response for mid=1 (got syncron > message receive time…

Re: [gridengine users] qsub and reservation

2017-03-09 Thread Reuti
> On 09.03.2017 at 17:41, Roberto Nunnari wrote: > > On 09.03.2017 15:14, Reuti wrote: >> Hi, >> >>> On 09.03.2017 at 14:24, Roberto Nunnari wrote: >>> >>> Hi Reuti. >>> Hi William. >>> >>> here…

Re: [gridengine users] qsub and reservation

2017-03-09 Thread Reuti
Hi, > On 09.03.2017 at 14:24, Roberto Nunnari wrote: > > Hi Reuti. > Hi William. > > here's my settings you required: > params MONITOR=1 > max_reservation 32 > default_duration 0:10:0 > >

Re: [gridengine users] qsub and reservation

2017-03-08 Thread Reuti
…expected runtime in the job submissions (-l h_rt=…)? - is a sensible default set in `qconf -msconf` for the runtime (default_duration 8760:00:00)? - is a sensible default set in `qconf -msconf` for the number of reservations (max_reservation 20)? -- Reuti
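
The two scheduler settings from this checklist can be inspected with a small filter. A sketch that reads the output of `qconf -ssconf` on stdin and warns when max_reservation is 0 (then no reservations are made) or default_duration is INFINITY (then jobs without an h_rt request are treated as never ending, as far as I understand the reservation logic). `check_reservation_conf` and its warning texts are hypothetical:

```shell
# Hypothetical helper: summarize reservation settings from `qconf -ssconf` (stdin).
check_reservation_conf() {
    awk '
        $1 == "max_reservation"  { res = $2 }
        $1 == "default_duration" { dur = $2 }
        END {
            print "max_reservation=" res " default_duration=" dur
            if (res + 0 == 0)
                print "WARNING: max_reservation is 0, so no reservations are made"
            if (dur == "INFINITY")
                print "WARNING: default_duration is INFINITY; jobs without h_rt count as never ending"
        }'
}
```

Usage: `qconf -ssconf | check_reservation_conf`. The jobs themselves still have to request a reservation with `qsub -R y` and should state their runtime with `-l h_rt=…`.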

Re: [gridengine users] -notify and killing jobs

2017-03-06 Thread Reuti
…he jobs with "-r y" and reschedule them by `qmod -rj `. While it's waiting again, you can use `qdel` on the (again) waiting job. But the jobs will continue to run on the node even though they vanished from the job list. There were discussions on the list before, that it will need s…

Re: [gridengine users] how to configure the email messages sent

2017-03-02 Thread Reuti
> On 28.02.2017 at 21:49, Mishkin Derakhshan wrote: > > Thank you. I appreciate your patience while I wrap my head around this stuff. > > On Sun, Feb 26, 2017 at 12:25 PM, Reuti wrote: >> >> On 26.02.2017 at 22:42, Mishkin Derakhshan wrote: >> >>…

Re: [gridengine users] how to configure the email messages sent

2017-02-26 Thread Reuti
…t; another_user) | mail -s "$2" sge_admin and define it in: $ qconf -sconf … mailer /usr/sge/cluster/mailer.sh I even add some information from the job's context (`qsub -ac …`) to the outgoing mail to the user, e.g. the list of nodes used by a parallel job.

Re: [gridengine users] how to configure the email messages sent

2017-02-25 Thread Reuti
…ent to the mailer? The admin mails are just sent; there is no option to define when they are sent or what they contain. Do you want to get more information or additional entries? -- Reuti

Re: [gridengine users] qsub -e and -o to non-existing directory

2017-02-23 Thread Reuti
Hi, > On 23.02.2017 at 14:50, Szoke Igor wrote: > > Hi, thanks Reuti. > > I just want to log job stdout / stderr to some nice job related directories, > which I cannot create in prior as I have noNFS on the target machine. How do you get the results of the jobs then? When…

Re: [gridengine users] qsub -e and -o to non-existing directory

2017-02-23 Thread Reuti
…which will create the directory during job submission by reading the "-e"/"-o" options. Sure, if the user deletes the directory before the job starts, you are out of luck with this approach. -- Reuti > Thanks! > > -Igor
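
The directory-creation idea can be sketched as a helper that runs on the submit host right before `qsub`. This is not the mechanism from the (truncated) post, only an illustration under assumptions: `make_output_dirs` and `qsub_mkdir` are hypothetical names, every -o/-e argument is treated as a plain file path without a `host:` prefix or SGE pseudo variables such as $JOB_ID in its directory part, and the result only helps if the exechost sees the same filesystem:

```shell
# Hypothetical helper: create the parent directory of every -o/-e path
# given on a qsub command line (arguments are passed through unchanged).
make_output_dirs() {
    while [ $# -gt 0 ]; do
        case "$1" in
            -o|-e)
                [ $# -ge 2 ] || return 1      # option without a path
                mkdir -p -- "$(dirname -- "$2")" || return 1
                shift 2
                ;;
            *)
                shift
                ;;
        esac
    done
}

# Hypothetical wrapper: create the directories first, then submit.
qsub_mkdir() {
    make_output_dirs "$@" && qsub "$@"
}
```

The parsing is deliberately naive: an "-o" that is really an argument of the job script would also be picked up.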

Re: [gridengine users] limtation the number of submission job in queue waiting list

2017-02-23 Thread Reuti
…" therein should reflect this. You can set additional weights to zero too, in case you don't see the desired effect. -- Reuti > > --Sangmin > > On Wed, Feb 22, 2017 at 7:19 PM, Reuti wrote: > Hi, > > > On 22.02.2017 at 09:41, Sangmin Park wrote: > >

Re: [gridengine users] limtation the number of submission job in queue waiting list

2017-02-22 Thread Reuti
…ob running and 10 waiting, and another user has 100 jobs running and due to the policy also 10 waiting ones? Usually the fair share policy targets only running jobs. -- Reuti > I tried to use resource quota policy. But, there is only way to limit the > number of cores, not the number of…

Re: [gridengine users] GE 6.2u5 Duplicate Job IDs

2017-02-13 Thread Reuti
…nting file might be fine, as long as "accounting_summary FALSE" is set in the PE and there were `qrsh -inherit …` calls included. But the dates being so far apart and having different names point to the first cause given. -- Reuti On 13.02.2017 at 21:17, Douglas…

Re: [gridengine users] making certain jobs or queues not count for tickets..

2017-01-18 Thread Reuti
…fferent instance. Note that (depending on your setup) you need different ports for this one (qmaster and execd) and also have to set them to the correct values when submitting the "cheap" jobs. In case the queue and qmaster for the "cheap" tasks are on a dedicated machine, this could be…

Re: [gridengine users] rotating accounting files

2017-01-07 Thread Reuti
…put back into circulation? You can uncompress it on-the-fly: $ qacct -f <(zcat /usr/sge/default/common/accounting.0.gz) -- Reuti
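
The same on-the-fly trick extends to a whole series of rotated files. A sketch, assuming rotated files are named accounting.N.gz where a higher N is older (adjust this to your rotation scheme); `all_accounting` is a hypothetical helper, and `gzip -dc` is used as a portable equivalent of `zcat`:

```shell
# Hypothetical helper: print rotated accounting files (oldest first)
# followed by the live file. Assumes names accounting.N.gz, higher N = older.
all_accounting() {
    local dir="${1:-$SGE_ROOT/default/common}"
    local f n
    for n in $(for f in "$dir"/accounting.*.gz; do
                   [ -e "$f" ] || continue         # no rotated files at all
                   f=${f##*/accounting.}           # strip path and prefix: "N.gz"
                   echo "${f%.gz}"                 # keep only "N"
               done | sort -rn); do
        gzip -dc "$dir/accounting.$n.gz"
    done
    if [ -f "$dir/accounting" ]; then
        cat "$dir/accounting"
    fi
}
```

Usage (bash process substitution, as in the post): `qacct -o -f <(all_accounting /usr/sge/default/common)`.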

Re: [gridengine users] How to specify the core number for the multiple thread program in one queue?

2017-01-05 Thread Reuti
…whatever you prefer). You can check whether any PEs are already defined with: `qconf -spl` -- Reuti

Re: [gridengine users] Issue with hostname specification and parallel environment - jobs do not start

2017-01-05 Thread Reuti
> On 05.01.2017 at 10:04, Manfred Selz wrote: > > Hi Reuti, > > thank you for your quick reply. > Actually, when specifying the host as you suggest - via "-q *@", > the jobs runs fine. Much obliged for this hint! > > And no, there is no reservations/b…

Re: [gridengine users] Issue with hostname specification and parallel environment - jobs do not start

2017-01-05 Thread Reuti
…esting a queue and a host at the same time, i.e. "-q" & "-l h=" at the same time. The solution may work in your case too: request the host by a queue request: -q "*@node123" > After all, the final message in the “qstat -j ” report is always: > cannot run…

Re: [gridengine users] Centos 7 and Gridmaster install

2017-01-04 Thread Reuti
…eboot of an exechost. But I couldn't see any great improvement compared to a complete NFS share of /usr/sge. -- Reuti

Re: [gridengine users] Having trouble installing SGE on a new execution host

2017-01-02 Thread Reuti
…ight be necessary to adjust $SGE_ROOT/util/arch, so that newer Linux kernels are covered too. -- Reuti > - Copy $SGE_ROOT/default/common/sgeexecd to /etc/init.d > > Depending on the startup of services you need either: > > # /etc/init.d/sgeexecd start > # chkconfig…

Re: [gridengine users] Having trouble installing SGE on a new execution host

2017-01-01 Thread Reuti
…stemctl daemon-reload # systemctl start sgeexecd.service # systemctl enable sgeexecd.service BTW: Is tmpdir in the queue definition just /tmp, or do you need an additional /scratch or similar on the new machine too? -- Reuti

Re: [gridengine users] fairshare ticketing setup

2016-12-22 Thread Reuti
> > shares=1 > > > childnodes=1 > > > id=1 > > > name=default > > > type=0 > > > shares=1000 > > Correct. All users which are collected under this "default" leaf will > automatically show up there (after you saved the tree and open

Re: [gridengine users] qsub EDA tool

2016-12-22 Thread Reuti
location shared? Can you check this in the job script at execution time: for LOCATION in ${PATH//:/ }; do if [ ! -d "${LOCATION}" -o ! -r "${LOCATION}" ]; then echo "$LOCATION: missing"; fi ; done -- Reuti > -Original Message- > From: Reuti [mailto:re
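
The same check as a reusable function. This variant is a sketch, not the exact line from the post: it splits on ':' via IFS instead of `${PATH//:/ }`, so entries containing spaces stay intact, and it skips empty entries. `check_path_dirs` is a hypothetical name; called without an argument it inspects $PATH, so run it inside the job script to see what the exechost sees:

```shell
# Hypothetical helper: report PATH entries that are missing or unreadable
# on the current host. Defaults to $PATH when called without an argument.
check_path_dirs() {
    local IFS=:
    local entry
    for entry in ${1-$PATH}; do
        [ -n "$entry" ] || continue    # empty entry means "current directory"
        if [ ! -d "$entry" ] || [ ! -r "$entry" ]; then
            echo "$entry: missing"
        fi
    done
}
```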

Re: [gridengine users] How to get the exit code of the running command in SGE?

2016-12-22 Thread Reuti
t this you may also look into the option "-terse". The NEXT_JOB_ID (or any other name you prefer) can be checked in job A by: qstat -j $JOB_ID | sed -n -e "/^context/s/^context: *//p" | tr "," "\n" just before i
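
The `qstat -j` pipeline above can be wrapped into two helpers: one lists all context entries, the other extracts a single value. A sketch, assuming the context was set with `qsub -ac NAME=value` and that values contain no commas (the comma is the separator the pipeline splits on); `job_context` and `job_context_value` are hypothetical names:

```shell
# Hypothetical helper: print one "name=value" pair per line from the
# context line of `qstat -j` output read on stdin.
job_context() {
    sed -n -e '/^context/s/^context: *//p' | tr ',' '\n'
}

# Hypothetical helper: print the value of one context variable.
job_context_value() {
    job_context | sed -n -e "s/^$1=//p"
}
```

Usage inside job A: `qstat -j "$JOB_ID" | job_context_value NEXT_JOB_ID`.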

Re: [gridengine users] qsub EDA tool

2016-12-21 Thread Reuti
…tted > # cat virtuoso.o207 > -sh: virtuoso: command not found You can investigate this by submitting: $ qsub -V -b y env What's in the output file "env.o…"? -- Reuti > > What am I missing? > > Thanks > John

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-21 Thread Reuti
…can talk to all the users directly (to take care of their memory requests), as it will save computing time before killing a job because of a small overdraft. In large clusters in a computing center h_vmem might be the better choice, to avoid that other users' jobs will be affec…

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-20 Thread Reuti
…e slave exechosts might still show too high a value for the available memory, I fear. -- Reuti > Best, > Chris > > On 12/20/16, 5:11 AM, "users-boun...@gridengine.org on behalf of Reuti" > wrote: > > >> On 20.12.2016 at 02:45, John_Tai wrote: >> >

Re: [gridengine users] default queues

2016-12-20 Thread Reuti
…d again by -1 to leave space for the final 0x00. One could argue: with -1 in the call you are on the safe side. If this is the real limit, then all the Howto examples are wrong too, as they use a plain DRMAA_ERROR_STRING_BUFFER as size value. -- Reuti > from Dan Templeton's page:

Re: [gridengine users] default queues

2016-12-20 Thread Reuti
will add this queue to the list of already set queues. So the one specified in sge_request is still there. Can you check with `qstat -j ` whether you experience the same? -- Reuti > It didn't matter whether we used drmaa with java or c, it would always go to > the 1-hour queue. I haven'

Re: [gridengine users] default queues

2016-12-20 Thread Reuti
h must be requested as you suggest, this could even be set to forced in case no other jobs should slip in. Or use an RQS, so that certain users have zero slots in other queues. What problem did you face in combination with drmaa? -- Reuti > Scott Lucas > HPC Applications Support > 2

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-20 Thread Reuti
…ee=10G 10G times 7 = 70 GB Does the node have this amount of memory installed, and is it defined this way in `qconf -me ibm037`? -- Reuti > mail_list: johnt@ibm005 > notify: FALSE > job_name: xclock > jobshare:

Re: [gridengine users] fairshare ticketing setup

2016-12-17 Thread Reuti
Hi, On 16.12.2016 at 17:58, Michael Stauffer wrote: > On Fri, Dec 16, 2016 at 11:50 AM, Reuti wrote: > > > On 16.12.2016 at 16:15, Michael Stauffer wrote: > > > > SoGE 8.1.8 > > > > Hi, > > > > I'm trying to setup a fair share policy.

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-17 Thread Reuti
On 17.12.2016 at 11:34, Reuti wrote: > > On 17.12.2016 at 02:01, John_Tai wrote: > >> It is working!! Thank you to all that replied to me and helped me figure >> this out. >> >> I meant to set the default to 2G so that was my mistake. I changed it t…

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-17 Thread Reuti
…ough). If you had set it up there, it would have been the "overall limit of memory which can be used in the complete cluster at the same time". -- Reuti > # qconf -se global > hostname global > load_scaling NONE > complex_values NONE

Re: [gridengine users] How to write output to /tmp of the machine in which the job was submitted?

2016-12-17 Thread Reuti
…file "/tmp/qsub/out.task1.o": No such file or > directory During execution it looks for these paths on the exechost. Is /home shared? Could you create a directory ~/myoutput there? My users prefer to get the output of their jobs in t…

Re: [gridengine users] fairshare ticketing setup

2016-12-16 Thread Reuti
> On 16.12.2016 at 16:15, Michael Stauffer wrote: > > SoGE 8.1.8 > > Hi, > > I'm trying to setup a fair share policy. Only functional fair share without honoring the past? 'stckt' are the tickets from the share tree, which honors past usage. -- Reuti > I…

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-16 Thread Reuti
…ult of 1G or so. Is there any virtual_free complex defined on a global level? Check with: qconf -se global -- Reuti

Re: [gridengine users] John's cores pe (Was: users Digest...)

2016-12-16 Thread Reuti
…virtual_free to allow jobs to request RAM, so the goal is the each job to > request both RAM and number of cpu cores. Hopefully this helps figuring out a > solution. Thanks. What does the definition of the complex look like in `qconf -sc`? -- Reuti > Here’s an example of one host that doe…

Re: [gridengine users] users Digest, Vol 72, Issue 13

2016-12-13 Thread Reuti
….50 > load_adjustment_decay_time 0:7:30 > load_formula np_load_avg > schedd_job_info false The above can be set to true to get more detailed output from the `qstat -j` variants. -- Reuti > flush_submit_sec 0 > fl…

Re: [gridengine users] move running jobs to another queue

2016-12-12 Thread Reuti
…ACLs (`man access_list`) and attach them, e.g. on a global level, per PE, queue or exechost, to limit access to certain resources to certain users. -- Reuti > > Thanks, > Jose

Re: [gridengine users] users Digest, Vol 72, Issue 13

2016-12-12 Thread Reuti
…ad: job_load_adjustments NONE load_adjustment_decay_time 0:0:0 In your current case, of course, where 8 slots are defined and you test with 2, this shouldn't be a problem though. Did you set up and/or request any memory per machine? OTOH: if you submit 2 single CPU jobs…

Re: [gridengine users] How to make grid master "accept" gone hosts?

2016-12-10 Thread Reuti
nd would reread either only on a startup or after the time out as you mentioned. So both observations could be right. -- Reuti > In the past I have seen symptoms > consistent with host_aliases only being read at startup. > Unfortunately I do not understand SGE's architecture and &

Re: [gridengine users] CPU complex

2016-12-09 Thread Reuti
…--- > sim.q@ibm021 BIP 0/0/1 0.02 lx-amd64 Is there any slot limit defined in the exechost or in an RQS? -- Reuti > > > - PENDING JOBS - PEN…

Re: [gridengine users] CPU complex

2016-12-08 Thread Reuti
…to all.q: > > qconf -aattr queue pe_list cores all.q How many "slots" were defined in the queue definition for all.q? -- Reuti > - Now I submit a job: > > # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock > Your job 89 ("x…

Re: [gridengine users] Regarding flex-grid : How to integrate flexlm for Ansys Application with gridengine

2016-12-08 Thread Reuti
iki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration#Ready_for_testing ? -- Reuti > So that license of Ansys Application can be monitored using qlicserver > > > > > > -Original Message- > From: Reuti [mailto:re...@staff.uni-marburg.de] > Sent: Thursday,

Re: [gridengine users] Regarding flex-grid : How to integrate flexlm for Ansys Application with gridengine

2016-12-08 Thread Reuti
…ntegration But as far as I can tell: why do you want to edit the output of `qlicserver`? Usually this works behind the scenes and has to run continuously as a daemon to adjust the available complexes (i.e. free licenses) in SGE. -- Reuti

Re: [gridengine users] How to make grid master "accept" gone hosts?

2016-12-08 Thread Reuti
ne/spool/qmaster/messages. So I assume the > outdated host_aliases confused the grid master. Usually the host_aliases file is read live, i.e. you can change any entry therein and it will be honored instantly without a restart. Maybe the order is important, i.e. in your case to remove t

Re: [gridengine users] How to make grid master "accept" gone hosts?

2016-12-07 Thread Reuti
…t, or "delete" the load report from its inbox? Do you have any custom load sensors defined, either on a global or local level per exechost? Was the machine in question completely removed and shut down? -- Reuti

Re: [gridengine users] Is there a way to make SGE use the short instead of the fully qualified host names ?

2016-12-06 Thread Reuti
> "ip-XXX-XXX-XXX-XXX.eu-west-1.compute.internal" in qstat. > > Is there a way I can force it to use the short hostnames (computeXXX) for > display ? The usual trick is to list the short nickname in the /etc/hosts file as the first nam

Re: [gridengine users] CPU complex

2016-12-05 Thread Reuti
With this setup, SGE will track the number of used cores per machine. You define the available ones in the queue definition. In case you have more than one queue per exechost, we additionally need to set up an overall limit of cores which can be used at the same time, to avoid oversubscription. --

Re: [gridengine users] CPU complex

2016-12-05 Thread Reuti
…allocation_rule $pe_slots, control_slaves true qsub -pe orte 16 myjob # allocation_rule $round_robin, control_slaves true where smp and orte are the chosen parallel environments for OpenMP and Open MPI, respectively. Their settings are explained in `man sge_pe`, the "-pe" parameter to in…
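
For reference, a sketch of how the two PEs could be created from files with `qconf -Ap`. The field names follow `man sge_pe`, but apart from allocation_rule and control_slaves (taken from the post) all values are assumed defaults, and your version may expect further fields, so compare with `qconf -sp` on an existing PE first. `make_pe` is a hypothetical helper and the slot count 999 an arbitrary placeholder:

```shell
# Hypothetical helper: print a PE definition suitable for `qconf -Ap <file>`.
# Arguments: name, slots, allocation_rule, control_slaves.
# Values not given as arguments are assumed defaults (see `man sge_pe`).
make_pe() {
    local name=$1 slots=$2 rule=$3 control=$4
    cat <<EOF
pe_name            $name
slots              $slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $rule
control_slaves     $control
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Possible usage (the single quotes keep $pe_slots literal):
#   make_pe smp  999 '$pe_slots'    TRUE > smp.pe  && qconf -Ap smp.pe
#   make_pe orte 999 '$round_robin' TRUE > orte.pe && qconf -Ap orte.pe
#   qconf -aattr queue pe_list smp all.q
```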

Re: [gridengine users] commlib

2016-11-29 Thread Reuti
On 29.11.2016 at 20:08, Coleman, Marcus [JRDUS Non-J&J] wrote: > Reuti Thanks for the information!!! > > Any idea on what is causing the reboot? There are several possibilities: - oom-killer (less likely when there are no jobs on the node) - uncorrectable ECC-error - heat-pro…

Re: [gridengine users] commlib

2016-11-29 Thread Reuti
> On 29.11.2016 at 00:17, Coleman, Marcus [JRDUS Non-J&J] wrote: > > Reuti > > So it rebooted again without any jobs running...and I don't understand " > sgead...@rndusljpp2.na.jnj.com removed "mcolem19" from user list" but as you >

Re: [gridengine users] commlib

2016-11-28 Thread Reuti
On 28.11.2016 at 20:36, Coleman, Marcus [JRDUS Non-J&J] wrote: > Thanks Reuti! > > I was hoping it was something thereAny ideas on where to go from here? What do: $ ./gethostbyname -all padme $ ./gethostbyaddr -all 192.168.1.159 show on the node and headno…

Re: [gridengine users] commlib

2016-11-27 Thread Reuti
On 27.11.2016 at 03:23, Coleman, Marcus [JRDUS Non-J&J] wrote: > Hi Reuti > > I am not sure what I am looking for...but here is the contents of /tmp on the > rebooting node > Any outrights you can see? > > [root@padme tmp]# ls -l > total 20 > prw-rw-r-- 1 mcol…

Re: [gridengine users] commlib

2016-11-26 Thread Reuti
…STER log > 11/25/2016 07:41:27|listen|rndusljpp2|E|commlib error: endpoint is not unique > error (endpoint "padme/execd/1" is already connected) > 11/25/2016 07:41:27|listen|rndusljpp2|E|commlib error: got select error > (Connection reset by peer) > 11/25/2016 07:41:29|worker|rndusljpp2|I|execd on padme registered Are there any files in /tmp on the node pointing to a problem starting execd? -- Reuti

Re: [gridengine users] where does qacct -g option refer to

2016-11-25 Thread Reuti
…t wouldn't be fixed in the file, you couldn't keep track of which job from the user ran under which GID. The accounting file may be at the "$SGE_ROOT/default/common/accounting" location. -- Reuti > > I expected that the usage information of A a…

Re: [gridengine users] Abaqus job suspension & Olesen FlexLM integration

2016-11-22 Thread Reuti
…om suspend and resume procedures which can be defined and even correct any overbooking of licenses, there is no "look-ahead" feature in SGE. This means that SGE can't see that the available licenses will be increased by X if job Y is going to be suspended. So the job which would l…

Re: [gridengine users] issue with qrsh "waiting on socket fd 4" in SGE 6.2u5

2016-11-15 Thread Reuti
on socket fd 4 > > Your interactive job 1756874 has been successfully scheduled. > timeout (5 s) expired while waiting on socket fd 4 Did you enable any firewall in the cluster to block certain ports on the nodes? -- Reuti > This goes for some time, the jobs can even be seen

Re: [gridengine users] a way to selectively run queued jobs?

2016-11-03 Thread Reuti
> […] > > Thanks Reuti. This is basically what I've been doing, except that I right out > the rqs's to a file that I load using qconf. The issue is that temporarily > changing individual quotas will be much more complicated now. > I've had an idea though: I c

Re: [gridengine users] a way to selectively run queued jobs?

2016-11-02 Thread Reuti
…usual, then this wouldn't matter, I'm thinking. I don't know how you change your RQS, but it can also be done on the command line, although it's not explained in detail in the man page: $ qconf -srqs { name foo description Demo enabled FALSE limitn…

Re: [gridengine users] load_thresholds, load_scaling, and hyperthreading

2016-11-02 Thread Reuti
On 02.11.2016 at 21:47, Joshua Baker-LePain wrote: > On Wed, 2 Nov 2016 at 11:13am, Reuti wrote > >>> On 02.11.2016 at 18:36, Joshua Baker-LePain wrote: >>> >>> On our cluster, we have three queues per host, each with as many slots as >>> th…

Re: [gridengine users] load_thresholds, load_scaling, and hyperthreading

2016-11-02 Thread Reuti
oGE 8.1.9. As the load is just the number of eligible processes in the run queue*, it should for sure get at least up to the number of available cores. Did you increase the number of slots for these machines too (also PEs)? What is `uptime` showing? What happens with the reported load, when you
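
To compare the reported load with the core count directly on a node, a Linux-only sketch: it reads the 1-minute load average from /proc/loadavg and the processor count from `nproc` (which counts the processors available to the current process), then prints their ratio, which is what SGE's np_load_avg expresses (load divided by the number of processors). `load_per_core` is a hypothetical name:

```shell
# Hypothetical helper (Linux only): 1-minute load average vs. processor count.
load_per_core() {
    local load1 rest cores
    read -r load1 rest < /proc/loadavg
    cores=$(nproc)
    awk -v l="$load1" -v c="$cores" \
        'BEGIN { printf "load=%s cores=%d load/core=%.2f\n", l, c, l / c }'
}
```

On a node that is fully busy with single-threaded jobs, load/core should approach 1.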

Re: [gridengine users] possible to match resource request against list of values in a complex?

2016-10-28 Thread Reuti
…" is a RESTRING): qconf -mattr exechost complex_values versions=_7_7.5_ node_a qconf -mattr exechost complex_values versions=_6.5_ node_c qconf -mattr exechost complex_values versions=_7.5_8_ node_d Then a user could run: qsub -l "versions=*_7.5_*" ... Th…
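
How the wildcard request selects hosts can be illustrated outside of SGE. The sketch below uses bash pattern matching as a stand-in for SGE's own wildcard matching of RESTRING values, with the three host values from the example; `hosts_matching` is a hypothetical helper and does not query SGE:

```shell
# Hypothetical helper: print the hosts whose value matches a wildcard pattern.
# Arguments: pattern, then host=value pairs. Bash globbing approximates
# SGE's matching of RESTRING requests; no SGE command is called.
hosts_matching() {
    local pattern=$1
    shift
    local pair host value
    for pair in "$@"; do
        host=${pair%%=*}
        value=${pair#*=}
        # Unquoted $pattern on the right-hand side of == acts as a glob.
        if [[ $value == $pattern ]]; then
            echo "$host"
        fi
    done
}
```

`hosts_matching '*_7.5_*' node_a=_7_7.5_ node_c=_6.5_ node_d=_7.5_8_` selects node_a and node_d. The surrounding underscores act as delimiters: `*_7_*` matches node_a but not node_d, although "_7.5_8_" starts with a 7.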

Re: [gridengine users] Strange issue with one node

2016-10-24 Thread Reuti
7 > slots Now the job should start, and something else is blocking it, not the number of free slots. > I've search on all of the configuration of SGE. I do too the reinstalation of > the 2 nodes. But the same message appears, that uniquely 7 slots free ! Did you request any r

Re: [gridengine users] jobs not running even though resource quotas not met

2016-10-21 Thread Reuti
…mber of slots per exechost (in case a host belongs to both queues), to avoid that a node gets oversubscribed (either by slots being set per exechost to the number of installed cores or by an RQS; I set it per exechost to keep the RQS and the `qquota` output short). -- Reuti > > and I hadn'

Re: [gridengine users] stdio permissions are ignoring ACL defaults

2016-10-21 Thread Reuti
p, i.e. the mask in case of named entries. == What behavior do you want to achieve? Not which bits are set, but what the users should be able to do, or not to do? -- Reuti > > So it seems to be ignoring or otherwise overriding the ACL defaults. Does > anyone have an idea why this

Re: [gridengine users] qacct not display account if not use resource

2016-10-21 Thread Reuti
-o` - insert an additional line (with all usage reading zero) in case the UID in the list is not in `qacct -o` -- Reuti > Thanks, > > -- > Ubuntu! > I am, because you are here. > ===
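
The step above can be sketched with awk. Assumptions: the `qacct -o` output was saved to a file, has two header lines (column titles and a line of '='), lists the owner in column 1 followed by seven usage columns, and the user list holds one login name per line. `fill_missing_users` is a hypothetical name:

```shell
# Hypothetical helper: echo saved `qacct -o` output and append an all-zero
# row for every user in the list who has no accounting record.
fill_missing_users() {
    local qacct_out=$1 user_list=$2
    awk '
        NR == FNR { print; if (FNR > 2) seen[$1] = 1; next }
        NF && !($1 in seen) { print $1, 0, 0, 0, 0, 0, 0, 0 }
    ' "$qacct_out" "$user_list"
}
```

Usage: `qacct -o > usage.txt`, then `fill_missing_users usage.txt users.txt`.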

Re: [gridengine users] jobs not running even though resource quotas not met

2016-10-19 Thread Reuti
On 19.10.2016 at 21:30, Michael Stauffer wrote: > Thanks Reuti, Skylar, > > Turns out it was a false alarm, sorry. The user hadn't told me they'd > submitted to a different queue Maybe it would be good to tell the user not to submit into a queue at all but request…

Re: [gridengine users] jobs not running even though resource quotas not met

2016-10-19 Thread Reuti
resources via qstat -F. The job ran > immediately. This worked for a few more jobs until resources were truly > insufficient. > > Does anyone have an idea what might be going on or how to continue debugging? > Thanks. I noticed such a behav

Re: [gridengine users] sge uv2000 openmp crash

2016-10-17 Thread Reuti
I've created. I've also > reduced slot count in the queue to 140, yet the load still runs away to over > 256 and locks the machine (requiring a hard reboot). Lastly, I've tried > numerous iterations of my qsub script, but here's my current version of it: In actual Linu

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-05 Thread Reuti
> On 05.10.2016 at 10:10, Mark Dixon wrote: > > On Tue, 4 Oct 2016, Reuti wrote: > ... >> Yeah, I had the idea of different temporary directories some time ago, as >> some applications like Molcas need a persistent one across several `mpiruns` >> on each nod…

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-04 Thread Reuti
> On 04.10.2016 at 17:39, Mark Dixon wrote: > > On Tue, 4 Oct 2016, Reuti wrote: > ... >> Do you mean your implementation or the general behavior? The $TMPDIR will be >> created when a `qrsh -inherit ...` spawns a process on a node and is removed >> once i…