Re: [gridengine users] Functional shares autonomously reset to 0 for recently added user

2020-08-25 Thread William Hay
On Mon, Aug 24, 2020 at 08:50:51PM +, Mun Johl wrote: >Hi all, > > > >We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux v6.8 >. > > > >This anomaly isn’t a showstopper by any means, but it has happened enough >that I decided to reach out and

Re: [gridengine users] grid engine check of pending jobs before resuming

2020-08-13 Thread William Hay
On Thu, Aug 13, 2020 at 07:29:32PM +, Derek Stephenson wrote: >HI, > > > >We’re running SoGE 8.1.9 and we’re running into an issue with preemptive >queueing I’m curious if others have had to ever address. We have a >regression queue that is pre-empted by a daily use queue

Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread William Hay
On Mon, May 11, 2020 at 09:39:26PM +, Mun Johl wrote: > Hi William, > > Thank you for your reply. > See my comments below. > > > -Original Message- > > On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > > > Hi William, et al., > > > I am not explicitly setting the DISPLAY--as

Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread William Hay
On Mon, May 11, 2020 at 09:30:14PM +, Mun Johl wrote: > Hi William, et al., > [Mun] Thanks for the tip; I'm still trying to get back to where I can launch > qrsh again. Even after I put the requisite /etc/pam.d/sshd line at the head > of the file I'm still getting the "Your "qrsh" request co

Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread William Hay
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > Hi William, et al., > I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh > -X'. Nor have I done anything to open any additional ports. Again, since > 'ssh -X' is working for us. As a reminder, there is no way

Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread William Hay
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > Hi William, et al., > > Thank you kindly for your response and insight. > Please see my comments below. > > > On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote: > > > [Mun] In order to use ssh -X for our jobs that require an X11 w

Re: [gridengine users] How to export an X11 back to the client?

2020-05-07 Thread William Hay
On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote: > [Mun] In order to use ssh -X for our jobs that require an X11 window to be > pushed to a user's VNC display, I am planning on the following changes. But > please let me know if I have missed something (or everything). > > 1. Update the

Re: [gridengine users] users Digest, Vol 113, Issue 2

2020-05-04 Thread William Hay
On Mon, May 04, 2020 at 09:06:46AM -0400, Korzennik, Sylvain wrote: >We have no problem having jobs w/ X11 enabled, BUT users must use qlogin, >not qsub or qrsh (the way we have configured it). >We have switched from SGE to UGE, but I'm sure the 'issue' is the same, >you need to h

Re: [gridengine users] How to export an X11 back to the client?

2020-05-04 Thread William Hay
On Fri, May 01, 2020 at 06:44:08PM +, Mun Johl wrote: >Hi, > > > >I am using SGE on RHEL6. I am trying to launch a qsub job (a TCL script) >via grid that will result in a GUI application being opened on the >caller’s display (which is a VNC session). Using qsub for this

Re: [gridengine users] qsub -V doesn't set $PATH

2020-04-03 Thread William Hay
On Fri, Apr 03, 2020 at 02:54:19AM +, Shiel, Adam wrote: > I finally had a chance to experiment with this some. > > I think one basic problem was that I had bash as a login shell. Removing bash > from the login shell and specifying "qsub -S /bin/bash " passed my local > PATH to the remot

Re: [gridengine users] Dave Love repository issue

2018-10-17 Thread William Hay
On Tue, Oct 16, 2018 at 06:53:11PM -0500, Jerome wrote: > Dear William > > I'm watching this trac system, and it seems to be reserved for > developers only. It seems that to report a bug, one needs to follow > some specifications, which I don't really know... Where can I read about > this? E

Re: [gridengine users] Dave Love repository issue

2018-10-15 Thread William Hay
On Fri, Oct 12, 2018 at 02:13:32PM -0400, Daniel Povey wrote: >There is an issue tracker here >https://arc.liv.ac.uk/trac >but it's not clear whether Dave Love still has access to it (he moved to The issue tracker has its own login system. I still have access to it and I've never wor

Re: [gridengine users] cpu usage calculation

2018-08-31 Thread William Hay
On Fri, Aug 31, 2018 at 10:27:39AM +, Marshall2, John (SSC/SPC) wrote: >Hi, >When gridengine calculates cpu usage (based on wallclock) it uses: >cpu usage = wallclock * nslots >This does not account for the number of cpus that may be used for >each slot, which is problematic
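
As a rough illustration of the formula quoted above: the wallclock-based figure is a plain product, so a job that uses several CPUs per slot is under-counted. This is only a sketch. The numbers are made up, and `cpus_per_slot` is a hypothetical variable for the example, not a Grid Engine parameter.

```shell
# Illustrative numbers only; none of them come from the thread.
wallclock=3600      # seconds the job ran
nslots=4            # slots granted to the job
cpus_per_slot=2     # hypothetical: CPUs the job really used per slot

# What the quoted formula reports: wallclock * nslots.
reported=$((wallclock * nslots))

# What the figure would be if CPUs per slot were taken into account.
adjusted=$((wallclock * nslots * cpus_per_slot))

echo "reported=$reported adjusted=$adjusted"
```

With these values the two figures differ by exactly the factor `cpus_per_slot`, which is the gap the original poster describes.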

Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-06 Thread William Hay
On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote: >HI Reuti, >The prolog script is set to run by root indeed. The xfs quota requires >root privilege. >I also tried the 2nd approach but it seems that the addgrpid file has not >been created when the prolog script execut

Re: [gridengine users] Make All Usersets Their Own Department

2018-07-12 Thread William Hay
On Wed, Jul 11, 2018 at 09:21:10AM -0400, Douglas Duckworth wrote: >Hi >We are running GE 6.2u5 and moving to Slurm. Though before we do some >changes need to be made within GE. >For example we have 66 user sets within our share tree. However none of >them were configured as a

Re: [gridengine users] SGE accounting file getting too big...

2018-05-23 Thread William Hay
On Fri, May 18, 2018 at 05:42:42PM +0200, Reuti wrote: > Note: to read old accounting files in `qacct` on-the-fly you can use: > > $ qacct -o reuti -f <(zcat /usr/sge/default/common/accounting.1.gz) > If you specify the -f flag twice the later one takes precedence. You can therefore easily create

Re: [gridengine users] Clean up old jobs/spooldb?

2018-05-04 Thread William Hay
On Wed, May 02, 2018 at 07:24:39PM -0700, Simon Matthews wrote: > That solution requires working "db_dump" and "db_restore" executables, > which don't appear to be available for the SoGE version. db_dump etc are part of Berkeley DB. You need to match the version against which SoGE is built I thin

[gridengine users] 04/26/2018 11:58:01| main|node-s03a-003|E|shepherd of job 5083806.1 exited with exit status = 28

2018-04-26 Thread William Hay
04/26/2018 11:58:01| main|node-s03a-003|E|shepherd of job 5083806.1 exited with exit status = 28 We had a shepherd exit with the above error code after about 12 hours. As a result it appears not to have run its epilog. This appears to be ENOSPC however we can't see sign of filesystems runnin

Re: [gridengine users] Corrupt user config?

2018-04-17 Thread William Hay
On Tue, Apr 17, 2018 at 08:32:15AM +0100, William Hay wrote: > I've just done an experiment on our dev cluster. Submit job, stop grid > engine, delete the file representing me, restart grid engine. The job > survives and I can then recreate the user with qconf -auser which I >

Re: [gridengine users] Corrupt user config?

2018-04-17 Thread William Hay
On Mon, Apr 16, 2018 at 04:52:33PM +0100, Mark Dixon wrote: > > share-tree only as a tie breaker. But deleting jobs would be bad. Is > > the probably lose any jobs queued something you know from experience? It > > seems odd that we can have jobs queued and running with the running > > qmaster kno

Re: [gridengine users] Corrupt user config?

2018-04-16 Thread William Hay
know from experience? It seems odd that we can have jobs queued and running with the running qmaster knowing nothing of the user but deleting the file would kill them on restart. > > Mark > > On Mon, 16 Apr 2018, William Hay wrote: > > > We had a user report that

[gridengine users] Corrupt user config?

2018-04-16 Thread William Hay
We had a user report that one of their array jobs wasn't scheduling. A bit of poking around showed that qconf -suser knew nothing of the user despite them having a queued job. However there was a file in the spool that should have defined the user. Several other users appear to be affected as well

Re: [gridengine users] qstat strange statistic

2018-04-13 Thread William Hay
On Fri, Apr 13, 2018 at 01:54:14PM +0200, leconte jérôme wrote: > Hello, > I'm using SGE 8.1.9 under debian Stretch > > I have a strange problem. > > when I use qstat , sometime the stats displayed are wrong. Then, I > believe that gridengine doesn't work properly. > > I explain wha

Re: [gridengine users] [SGE-discuss] case-insensitive user names?

2018-04-13 Thread William Hay
On Thu, Apr 12, 2018 at 04:40:03PM -0400, berg...@merctech.com wrote: > We're using SoGE 8.1.6 in an environment where users may login to the > cluster from a Linux workstation (typically using a lower-case login > name) or a Windows desktop, where their login name (as supplied by the > enterprise

Re: [gridengine users] Jobs sitting in queue despite suitable slots and resources available

2018-04-13 Thread William Hay
On Thu, Apr 12, 2018 at 10:15:34AM -0700, Joshua Baker-LePain wrote: > We're running SoGE 8.1.9 on a smallish (but growing) cluster. We've > recently added GPU nodes to the cluster. On each GPU node, a consumable > complex named 'gpu' is defined with the number of GPUs in the node. The > complex

Re: [gridengine users] Job finishes correctly but master is not notified

2018-04-05 Thread William Hay
On Thu, Apr 05, 2018 at 03:38:18PM +0200, Paul Paul wrote: > William, > > Thanks for your reply. > > In the 'messages' file of the exec host, there is nothing (the last message > was 2 weeks ago). Might be worth increasing the loglevel to get more info about what is going on there. William

Re: [gridengine users] Job finishes correctly but master is not notified

2018-04-05 Thread William Hay
On Thu, Apr 05, 2018 at 09:46:23AM +0200, Paul Paul wrote: > Hello, > > We're using SGE 8.1.9 and randomly, we have jobs that finish with success > (our jobs logs confirm this) but the master is not notified. > On the compute, all the folders related to such a job are still here, > correctly fil

Re: [gridengine users] Problems with quotas

2018-04-05 Thread William Hay
On Wed, Mar 28, 2018 at 01:52:59PM +0200, Sms Backup wrote: >Thanks for reply ! >You are right, this is a systemd unit file. So for filtering I just >use ExecStart=/bin/sh -c /opt/sge/bin/sge_qmaster |&grep -v '^RUE_' ? >Sorry, but I cannot understand this part. I think you probably

Re: [gridengine users] Problems with quotas

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 12:07:39PM +0100, Sms Backup wrote: >Thanks for your replies, >So in total it would be something like this: ExecStart=/bin/sh -c >/opt/sge/bin/sge_qmaster |&grep -v '^RUE_' >&/dev/null ? No. The grep is intended to replace the redirection to /dev/null so as
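
A small sketch of the filtering idea discussed here, using made-up log lines rather than real sge_qmaster output. The `grep -v` drops only the lines beginning with `RUE_` and lets everything else through, which is why it takes the place of the `/dev/null` redirection rather than being combined with it. The portable `2>&1 |` form is used instead of `|&`, since `|&` is a bash extension that a plain `/bin/sh` may reject. The function name `fake_qmaster` and its messages are invented for the example.

```shell
# Stand-in for the daemon: prints two noisy lines and one useful line.
# The message texts are invented for this example.
fake_qmaster() {
    echo "RUE_example_one: noise"
    echo "real warning worth keeping" >&2
    echo "RUE_example_two: noise"
}

# Merge stderr into stdout, then drop only the lines starting with RUE_.
filtered=$(fake_qmaster 2>&1 | grep -v '^RUE_')

echo "$filtered"
```

Appending `>/dev/null` after the `grep` would throw away the surviving line as well, so nothing would reach the log at all.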

Re: [gridengine users] Problems with quotas

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 09:36:29AM +, Mark Dixon wrote: > Hi Jakub, > > That's right: if you need to cut down the logging, one option is to add the > redirection in the start script. > > You're looking for the line starting "sge_qmaster", and you might want to > try adding a ">/dev/null" afte

Re: [gridengine users] Is it possible to nohup a command within a script dispatched via qsub?

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 12:27:48AM +0100, Reuti wrote: > Hi, > > On 22.03.2018 at 20:51, Mun Johl wrote: > > > Hi, > > > > I'm using SGE v8.1.9 on RHEL6.8 . In my script that I submit via qsub > > (let's call it scriptA), I have a gxmessage (gxmessage is similar to > > xmessage, postnote, e

Re: [gridengine users] mpirun without ssh

2018-03-23 Thread William Hay
On Thu, Mar 22, 2018 at 04:29:27PM +0100, leconte jérôme wrote: > Thank you, > > But I'm not sure to know what I look for. > > If I correctly understand > > I must see qrsh or qlogin when I type > > ompi_info > > and if not I must recompile grid_engine with that option > > Best Rega

Re: [gridengine users] qsub in specific nodes

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 11:55:14AM -0300, Dimar Jaime González Soto wrote: >Hi, I need to know how can I execute grid engine in specific hosts. I >tried the following execution line: >qsub -v NR_PROCESSES=60 -l >h='ubuntu-node2|ubuntu-node11|ubuntu-node12|ubuntu-node13' -b y -j y -t

Re: [gridengine users] Problem with the way environment variables are exported in sge

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 03:23:01PM +, srinivas.chakrava...@wipro.com wrote: > Hi, > > The version in our environment is 2011.11. Looking at https://arc.liv.ac.uk/repos/darcs/sge-release/NEWS it looks like it was fixed in SoGE 8.1.4 which was released in 2013 a couple of years after the release

Re: [gridengine users] Problems with quotas

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 07:59:41AM +0100, Sms Backup wrote: >William, >Thanks for reply. Unfortunately I have few non-interactive queues, so I >cannot limit slots this way. >99% of messages printed to system log look like this below, so I believe >that are the messages which are

Re: [gridengine users] Problems with quotas

2018-03-20 Thread William Hay
On Tue, Mar 20, 2018 at 11:08:02AM +0100, Sms Backup wrote: >Dear all, >We have in our configuration multiple servers assigned to multiple queues. >To limit the number of slots per system, I tried to create a quota: >{ > name slots > description Limit slots usage per nod

Re: [gridengine users] Problem with the way environment variables are exported in sge

2018-03-20 Thread William Hay
On Mon, Mar 19, 2018 at 12:11:04PM +, srinivas.chakrava...@wipro.com wrote: >Hi, > > > >We have some functions in our environment which are not being parsed >properly by sge, which is causing errors on the stdout while launching >interactive jobs > > > >But wh

Re: [gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-19 Thread William Hay
On Thu, Mar 15, 2018 at 10:19:29PM +, Mun Johl wrote: >Hi, > > > >I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6 >system. The yum command exits with the following error: > > > >Error: Package: gridengine-8.1.9-1.el6.x86_64 >(/gridengi

Re: [gridengine users] shepherd timeout when using qmake and qrsh

2018-03-01 Thread William Hay
On Tue, Feb 27, 2018 at 10:46:57AM +0100, Reuti wrote: > Hi Nils: > > > On 27.02.2018 at 10:11, Nils Giordano wrote: > > however you were right: `ssh` is definitively used to access nodes > > (probably on purpose since we have access to several GUI apps). Your > > answer made me check my ~/.ssh/

Re: [gridengine users] Converting from supplemental groups to cgroups for management

2018-02-16 Thread William Hay
On Thu, Feb 15, 2018 at 11:28:58AM -0600, Calvin Dodge wrote: > While the help we received from this and other gridengine lists helped > us resolve the issue of jobs being mysteriously killed, we've been > asked to look into converting the customer's SGE cluster, using > cgroups for job management.

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-09 Thread William Hay
On Thu, Feb 08, 2018 at 03:42:03PM -0800, Joshua Baker-LePain wrote: > 153758 0.51149 tomography USER1 qw02/08/2018 14:03:05 > 192 > 153759 0.0 qss_svk_ge USER2 qw02/08/2018 14:15:06 > 1 1 > 153760 0.00

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-08 Thread William Hay
On Wed, Feb 07, 2018 at 02:15:05PM -0800, Joshua Baker-LePain wrote: > On Wed, 7 Feb 2018 at 12:46am, William Hay wrote > > > IIRC resource quotas and reservations don't always play nicely together. > > The same error can come about for multiple different reasons so havin

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-07 Thread William Hay
On Tue, Feb 06, 2018 at 12:13:24PM -0800, Joshua Baker-LePain wrote: > I'm back again -- is it obvious that my new cluster just went into > production? Again, we're running SoGE 8.1.9 on a cluster with nodes of > several different sizes. We're running into an odd issue where SGE stops > schedulin

Re: [gridengine users] Minimum number of slots

2018-02-01 Thread William Hay
On Thu, Feb 01, 2018 at 11:44:25AM +0100, Ansgar Esztermann-Kirchner wrote: > Now, I think I can improve upon this choice by creating separate > queues for different machines "sizes", i.e. an 8-core queue, a > 20-core queue and so on. However, I do not see a (tractable) way to > enforce proper job-

Re: [gridengine users] gid_range values

2018-01-24 Thread William Hay
On Tue, Jan 23, 2018 at 06:22:28PM -0600, Calvin Dodge wrote: > The docs we've found say that gid_range must be greater than the > number of jobs expected to run currently on one host. > > Our recent experience suggests that it has to be greater than the > total number of jobs in the queue. If it

Re: [gridengine users] Exporting environment variables using -V doesn't work from RHEL7 to RHEL6

2018-01-19 Thread William Hay
On Fri, Jan 05, 2018 at 08:02:18AM +, srinivas.chakrava...@wipro.com wrote: >Hi, > > > >We have recently upgraded one submit host from RHEL6.7 to RHEL7.2. > >Most of our grid execution servers are RHEL6.7, with a few RHEL6.3 >servers. When we run any jobs by using "-V" o

[gridengine users] Happy new year GridEngine Users

2018-01-18 Thread William Hay
Testing if the users@gridengine.org mailing list works in 2018.

Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2018-01-18 Thread William Hay
On Fri, Dec 22, 2017 at 05:55:26PM -0500, berg...@merctech.com wrote: > True, but even with that info, there doesn't seem to be any universal > way to tell an arbitrary GPU job which GPU to use -- they all default > to device 0. With Nvidia GPUs we use a prolog script that manipulates lock files t

Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2018-01-18 Thread William Hay
On Mon, Jan 08, 2018 at 06:23:20PM -0500, berg...@merctech.com wrote: > Yeah, I've looked at that, but it brings up the 'accounting problem' > of changing the variable each time a GPU-enabled job begins or ends. Set it in starter_method (and sshd's force command if you want to support qlogin etc).

Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2018-01-18 Thread William Hay
On Fri, Jan 05, 2018 at 10:51:42AM -0500, berg...@merctech.com wrote: > In the message dated: Tue, 02 Jan 2018 09:11:51 +, > The pithy ruminations from William Hay on > qsub unchanged> were: > => On Fri, Dec 22, 2017 at 05:55:26PM -0500, berg...@merctech.com wrote: > =&g

Re: [gridengine users] I'm getting an "Unable to initialize env" error; but our simultaneous ECs should be small

2017-11-23 Thread William Hay
On Wed, Nov 22, 2017 at 09:53:17AM -0800, Mun Johl wrote: >Hi, >Periodically I am seeing the following error: > > Unable to initialize environment because of error: cannot register event > client. Only 100 event clients are allowed in the system > >The error first showed up

Re: [gridengine users] Integration of GPUs into GE2011.11p1

2017-10-31 Thread William Hay
On Mon, Oct 30, 2017 at 09:56:37PM +0530, ANS wrote: >Hi, >Thank you for the detailed info. >But can let me know how can i submit a job using 4 GPUs, 8 cores from >2nodes consisting of 2 GPUs, 4 cores from each node. >Thanks, That's not something the free versions of grid engin

Re: [gridengine users] Integration of GPUs into GE2011.11p1

2017-10-30 Thread William Hay
On Wed, Oct 25, 2017 at 04:59:05PM +0200, Reuti wrote: > Hi, > > > On 25.10.2017 at 16:06, ANS wrote: > > > > Hi all, > > > > I am trying to integrate GPUs into my existing cluster with 2 GPUs per > > node. I have gone through few sites and done the following > > > > qconf -mc > > gpu

Re: [gridengine users] running SOGE/execd on Cygwin

2017-10-20 Thread William Hay
On Thu, Oct 19, 2017 at 04:49:40PM -0700, Simon Matthews wrote: > Does anyone have any pointers on running execd on cygwin? > > inst_sge -x fails, because the 'uidgid' command doesn't seem to have been > built. > > I can set the environment variables myself and then start the program, > but the

Re: [gridengine users] Cygwin?

2017-10-16 Thread William Hay
' > libcull.a(pack.o):pack.c:(.text+0x94d): undefined reference to `xdrmem_create' > libcull.a(pack.o):pack.c:(.text+0x959): undefined reference to `xdr_double' > collect2: error: ld returned 1 exit status > make: *** [../libs/sgeobj/Makefile:364: test_sge_object] Erro

Re: [gridengine users] Cygwin?

2017-10-13 Thread William Hay
On Thu, Oct 12, 2017 at 04:12:48PM -0700, System Administrator wrote: > I think it should be part of the ./configure step. If you exported it as an > env variable, then re-run the ./configure part. Or put it at the beginning > of the command, for example: > > CPPFLAGS=-I/usr/include/tirpc ./conf
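
The two ways of passing the flag mentioned above behave differently, and a stand-in command can show this without running a real `./configure`. In this sketch the child `sh -c` is only a placeholder for the configure script; it reports the `CPPFLAGS` value it was started with.

```shell
# Start from a clean state so the comparison is meaningful.
unset CPPFLAGS

# Prefix form: the variable exists only for this one command.
prefixed=$(CPPFLAGS=-I/usr/include/tirpc sh -c 'echo "${CPPFLAGS:-unset}"')

# Afterwards the calling shell still has no CPPFLAGS.
after=${CPPFLAGS:-unset}

# Export form: every later command started from this shell inherits it.
export CPPFLAGS=-I/usr/include/tirpc
exported=$(sh -c 'echo "${CPPFLAGS:-unset}"')

echo "prefixed=$prefixed after=$after exported=$exported"
```

Either form gets the include path to the command that needs it; the exported form also affects any later `make` run from the same shell, which may or may not be wanted.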

Re: [gridengine users] Cygwin?

2017-10-10 Thread William Hay
On Mon, Oct 09, 2017 at 07:46:05PM -0700, Simon Matthews wrote: > Is it possible to build SOGE for Cygwin? > > SOGE says it is based on OGS which claimed that it supported Cygwin. > > I only need execd on Cygwin. Qmaster and the GUI tools need only run > under CentOS 6 and 7. > > Simon I don't t

Re: [gridengine users] Max jobs per user

2017-10-10 Thread William Hay
On Sat, Sep 30, 2017 at 02:21:12AM +, John_Tai wrote: >Currently if I set a max job per user in the cluster, a new job will be >rejected if it exceeds the max. > > > >> qrsh > >job rejected: Only 100 jobs are allowed per user (current job count: 264) > > > >Is

Re: [gridengine users] load scaling

2017-09-04 Thread William Hay
On Fri, Sep 01, 2017 at 06:28:43PM +0900, Ueki Hikonuki wrote: > Hi, > > I tried to understand load scaling. But it is still unclear for me. > > Let's assume two hosts. > > hostA very fast machine > hostB regular speed machine > > Even though np_load_avg of hostA is much higher than hostB, > a

Re: [gridengine users] Fonts issue with RHEL 7.3

2017-07-26 Thread William Hay
On Tue, Jul 25, 2017 at 01:23:43AM +, Matt Hohmeister wrote: >When trying to run qmon on RHEL 7.3, I get this. Can someone share which >packages would take care of this? Hopefully one of these pages should sort it out. The last one should definitely do it but is fixing things user by

Re: [gridengine users] complex error

2017-07-26 Thread William Hay
On Tue, Jul 25, 2017 at 12:57:47AM +, John_Tai wrote: >I have configured virtual_free as a requestable resource: > > > >virtual_free mem MEMORY <= YES JOB 0 0 > > > >And it's been working great for months. > > > >

Re: [gridengine users] DISPLAY problem in RHEL6.8

2017-07-18 Thread William Hay
On Tue, Jul 18, 2017 at 09:23:05AM +, John_Tai wrote: >I'm having a DISPLAY issue in RHEL6.8 that I don't have in RHEL5. I am >using SGE6.2u6 > > > >I use VNC to connect to a linux server. By default the DISPLAY is set to >:4.0 and I can start GUI jobs locally: > >

Re: [gridengine users] New installation

2017-07-18 Thread William Hay
On Mon, Jul 17, 2017 at 08:00:31PM +, Matt Hohmeister wrote: > Thank you; this is a big help. :-) > > Along those lines, what do you all suggest for the shared directory? From > these instructions, it *appears* that the best choice is to share out NFS > from the master's /opt/sge/default, ha

Re: [gridengine users] Repeated error message in logs from RQS rules

2017-07-17 Thread William Hay
On Fri, Jul 14, 2017 at 08:36:06AM +, Simon Andrews wrote: >Can anyone shed any light on an error I'm getting repeated thousands of >times in my grid engine messages log. This happens when I have a job >which is submitted and which is stopped from running by an RQS rule I have >

Re: [gridengine users] New installation

2017-07-17 Thread William Hay
On Fri, Jul 14, 2017 at 08:58:59PM +, Matt Hohmeister wrote: >Hello- > > > >First off, please accept my apologies for this post, as I have _never_ >used gridengine before. I have two servers, both running RHEL 7.3, and >both linked to a shared xfs-formatted iSCSI volume a

Re: [gridengine users] Ulimit for max open files

2017-06-28 Thread William Hay
slots job. > > Thanks! > Luis Assuming linux what does sysctl fs.file-max report? William > > On 6/27/17, 4:22 AM, "William Hay" wrote: > > On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote: > >Hi, > > > > > > &g

Re: [gridengine users] Ulimit for max open files

2017-06-27 Thread William Hay
On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote: >Hi, > > > >To increase the max open file, we have set execd_params in qconf -mconf >and also on the OS level: > >execd_params >H_DESCRIPTORS=262144,H_LOCKS=262144,H_MAXPROC=262144 > > > >

Re: [gridengine users] taming qlogin

2017-06-26 Thread William Hay
On Fri, Jun 23, 2017 at 08:24:23AM -0700, Ilya wrote: > Hello, > > I am running 6.2u5 with ssh transport for qlogin (not tight integration) and > users are abusing this service: run jobs for days, abandon their sessions > that stay opened forever, etc. So I want to implement mandatory time limits

Re: [gridengine users] 6.2 Update 5 Patch 3 not available?

2017-06-13 Thread William Hay
On Mon, Jun 12, 2017 at 05:46:46PM -0400, Jeff Blaine wrote: > The Open Grid Scheduler homepage at > http://gridscheduler.sourceforge.net/ says: > > The current bugfix & LTS (Long Term Support) release is version > 6.2 update 5 patch 3 (SGE 6.2u5p3), which is based on Sun Grid > Engine

Re: [gridengine users] Throttling job starts (thundering herd)

2017-03-24 Thread William Hay
On Thu, Feb 16, 2017 at 01:43:47PM -0500, Stuart Barkley wrote: > Is there a way to throttle job starts on Grid Engine (we are using Son > of Grid Engine)? Use a load sensor plus job_load_adjustment. Tweak jobs to request a low load (via sge_request or jsv) or set an alarm on all queues when the l

Re: [gridengine users] John's cores pe (Was: users Digest...)

2017-03-23 Thread William Hay
On Thu, Mar 23, 2017 at 08:11:02AM +, John_Tai wrote: > Can I still download 6.2? Haven't been able to find it. > > John If you're going to upgrade you might as well go all the way to SoGE 8.1.9. William > > -Original Message- > From: Reuti [mailto:re...@staff.uni-marburg.de] > Se

Re: [gridengine users] Make qmaster buffer larger

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 05:20:37PM +0100, Jerome Poitout wrote: > Hello, > > OGS/GE 2011.11p1 > > I have an issue while submitting numerous jobs in a short time (over 300 > - not so much for me...) with -sync y option. It seems that qmaster > cannot handle all the requests and i get huge load on

Re: [gridengine users] qsub and reservation

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 07:29:25PM +0100, Roberto Nunnari wrote: > I don't mean move from node to node.. by moving I mean that something > happens in the scheduler.. that the scheduler reserves a slot for the > pending job requesting reservation.. in the schedule file, I see only lines > with the w

Re: [gridengine users] qsub and reservation

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 02:24:38PM +0100, Roberto Nunnari wrote: > Hi Reuti. > Hi William. > > here's my settings you required: > params MONITOR=1 > max_reservation 32 > default_duration 0:10:0 > > I cannot understand how What I see in

Re: [gridengine users] qsub and reservation

2017-03-09 Thread William Hay
On Wed, Mar 08, 2017 at 06:33:23PM +0100, Roberto Nunnari wrote: > Hello. > > I am using Oracle Grid Engine 6.2u7 and have some trouble understanding > reservation (qsub -R y ..). > > I'm trying to use this because of big jobs starving because of queues always > full of smaller jobs.. > > Appare

Re: [gridengine users] limtation the number of submission job in queue waiting list

2017-02-23 Thread William Hay
On Thu, Feb 23, 2017 at 09:30:20AM +0900, Sangmin Park wrote: >Yes, it is. >I can handle the number of running jobs using resource quota policy. >However, the number of queue waiting jobs can't. >Basic rule is FIFO, so if one user submits hundreds of jobs, another user >has to wa

Re: [gridengine users] making certain jobs or queues not count for tickets..

2017-02-15 Thread William Hay
On Wed, Feb 15, 2017 at 12:34:08AM +0200, Ben Daniel Pere wrote: >The suggestion sounds good but I'm not sure I understand step 1 - if it's >going to be assigned by project and not user - can I still have "powerful" >users in my normal default project that have more tickets there? >

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Tue, Feb 14, 2017 at 12:40:38PM +, Mark Dixon wrote: > To do this, we've had the concept of a node_type for some years now - a > requestable complex, taking the value of something looking like > "24core-128G". It's turned out to be pretty useful. We also have node types; ours have single lett

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Tue, Feb 14, 2017 at 12:22:59PM +, Mark Dixon wrote: > On Tue, 14 Feb 2017, William Hay wrote: > ... > >We tweak the permissions on the device nodes from a privileged prolog but > >otherwise I suspect we're doing something similar. > > Hi William, > >

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote: > Hi, > > I've been playing with allocating GPUs using gridengine and am wondering if > I'm trying to make it too complicated. > > We have some 24 core, 128G RAM machines, each with two K80 GPU cards in > them. I have a little client/ser

Re: [gridengine users] GE 6.2u5 Duplicate Job IDs

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:17:29PM -0500, Douglas Duckworth wrote: >Hello >About a month ago we recently started seeing duplicate job in SGE. >For example: >sysadmin@panda2[~]$ qacct -j 878815 >== >qnamestan

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote: > Hi, > > I've been playing with allocating GPUs using gridengine and am wondering if > I'm trying to make it too complicated. > > We have some 24 core, 128G RAM machines, each with two K80 GPU cards in > them. I have a little client/ser

Re: [gridengine users] SGE 6.1 binaries

2017-02-13 Thread William Hay
On Mon, Feb 13, 2017 at 10:20:51AM +0100, Julien Nicoulaud wrote: >Hi all, >I'm looking for Sun GridEngine 6.1 binaries (or sources) for some backward >compatibility testing, and I can't find it anywhere on the web: > * ge-6.1u5-common.tar.gz > * ge-6.1u5-bin-lx24-amd64.tar.gz >

Re: [gridengine users] making certain jobs or queues not count for tickets..

2017-02-08 Thread William Hay
On Wed, Jan 18, 2017 at 05:17:15PM +0200, Ben Daniel Pere wrote: >Hi guys, >Is there a way to make a certain queue (or maybe even certain jobs upon >submission) not count for the tickets a user is running? >We recently started having a very "cheap" tasks that basically wait for >

Re: [gridengine users] possible to match resource request against list of values in a complex?

2016-11-01 Thread William Hay
On Mon, Oct 31, 2016 at 03:08:53PM -0400, berg...@merctech.com wrote: > In the message dated: Mon, 31 Oct 2016 09:56:58 -, > The pithy ruminations from William Hay on > valu > es in a complex?> were: > => On Sat, Oct 29, 2016 at 12:16:50AM +0200, Reuti wrote: > =&

Re: [gridengine users] possible to match resource request against list of values in a complex?

2016-10-31 Thread William Hay
On Sat, Oct 29, 2016 at 12:16:50AM +0200, Reuti wrote: > Hi, > > On 28.10.2016 at 22:59, berg...@merctech.com wrote: > > Then a user could run: > > > > qsub -l foobar=7.5 > > What about the opposite way (when "versions" is a RESTRING): > > qconf -mattr exechost complex_values ver

Re: [gridengine users] error qlogin_starter sent: 137 during qrsh

2016-10-13 Thread William Hay
On Thu, Oct 13, 2016 at 11:39:15AM +, Duje Drazin wrote: > Hi William, > > Sorry, I didn't catch your answer, how to "Check the nodes involved for a > firewall/packet filter" > Assuming this is a linux box then on a worker node of the cluster try running iptables -L to see if it has an ipta

Re: [gridengine users] error qlogin_starter sent: 137 during qrsh

2016-10-13 Thread William Hay
On Thu, Oct 13, 2016 at 07:21:33AM +, Duje Drazin wrote: >Hi all, > > > >I have configured following: > > > >qlogin_command telnet > >qlogin_daemon /usr/sbin/in.telnetd > >rlogin_command /usr/bin/ssh -X > >rlogi

Re: [gridengine users] SoGE 8.1.8 - Very slow schedule time qw ==> r taking 30-60 sec.

2016-10-11 Thread William Hay
On Mon, Oct 10, 2016 at 02:39:21PM +, Yuri Burmachenko wrote: >We are using SoGE 8.1.8 and since recently approximately 2 months ago our >job schedule time raised up to 30-60 sec. > > >Any tips and advices where to look for the root cause and/or how can we >improve the sit

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-10 Thread William Hay
On Thu, Oct 06, 2016 at 12:47:49PM +0100, Mark Dixon wrote: > On Wed, 5 Oct 2016, William Hay wrote: > ... > >Our prolog and epilog (parallel) ssh into the slave nodes and do the > >equivalent of run-parts on directories full of scripts some of which check > >if they are

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-05 Thread William Hay
On Wed, Oct 05, 2016 at 12:31:52PM +0100, Mark Dixon wrote: > On Wed, 5 Oct 2016, William Hay wrote: > ... > >It was originally head node only so per job until a user requested local > >TMPDIR on each node so historical reasons. > ... > > Hi William, > > What

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-05 Thread William Hay
On Tue, Oct 04, 2016 at 04:51:42PM +0100, Mark Dixon wrote: > On Tue, 4 Oct 2016, William Hay wrote: > ... > >I have a per-job consumable and the TMPDIR filesystem is created on every > >node of the job. We have a (jsv enforced) policy that all multi-node jobs > >have excl

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-04 Thread William Hay
On Tue, Oct 04, 2016 at 09:32:43AM +0100, Mark Dixon wrote: > It'd be interesting for people to share what they've done with parallel > jobs. Rightly or wrongly, I currently have a per-job consumable and the > $TMPDIR is only on the node with the MASTER task. I have a per-job consumable and the T

Re: [gridengine users] access definition in grid

2016-09-26 Thread William Hay
On Tue, Sep 20, 2016 at 08:07:02AM +, sudha.penme...@wipro.com wrote: >Hi, > > > >Regarding access rules in grid, users primary UNIX group should be the one >which is defined in ACL to be able to access. > > > >Would it be possible to configure it such that user ju

Re: [gridengine users] Forcing Grid Engine jobs to error state with exit status other than 0, 99 or 100.

2016-09-15 Thread William Hay
On Wed, Sep 14, 2016 at 08:52:12PM +, Lee, Wayne wrote: > Hi William, > > I've performed some tests by submitting a basic shell script which dumps the > environment (i.e. env) and performs either an "exit 0", "exit 99", "exit > 100", "exit 137" other exit status codes. If I set my script

Re: [gridengine users] Forcing Grid Engine jobs to error state with exit status other than 0, 99 or 100.

2016-09-14 Thread William Hay
On Tue, Sep 13, 2016 at 06:52:53PM +, Lee, Wayne wrote: >In the epilog script that I've setup for our jobs, I've attempted to >capture the value of the "exit_status" of a job or job task and if it >isn't 0, 99 or 100, exit the epilog script with an "exit 100". However >this do

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-13 Thread William Hay
On Tue, Sep 13, 2016 at 03:15:19PM +1000, Derrick Lin wrote: >Thanks guys, >I am implementing the solution as outlined by William, except we are using >XFS here, so we are trying to do it by using XFS's project/directory >quota. Will do more testing and see how it goes.. >Cheers

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 01:29:52PM +0200, Reuti wrote: > > > Am 09.09.2016 um 12:52 schrieb William Hay: > > Grid engine doesn't provide a mechanism to pass the resource requests to > > the prolog > > AFAIK so a mechanism to obtain the value is needed. Qstat

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 09:26:53AM +1000, Derrick Lin wrote: >Hi William, >Actually I don't quite get the need of: >2. Our JSV adds an environment variable to the job recording the amount >of disk requested (you could try parsing it out of the job spool but >this is easier). >

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 10:37:13AM +0100, Mark Dixon wrote: > On Thu, 8 Sep 2016, William Hay wrote: > ... > >Remember tmpfs is not a ramdisk but the linux VFS layer without an attempt > >to provide real file system guarantees. It shouldn't be cached any more > >agres
