Re: [gridengine users] Functional shares autonomously reset to 0 for recently added user

2020-08-25 Thread William Hay
On Mon, Aug 24, 2020 at 08:50:51PM +, Mun Johl wrote: >Hi all, > > > >We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux v6.8 >. > > > >This anomaly isn’t a showstopper by any means, but it has happened enough >that I decided to reach out

Re: [gridengine users] grid engine check of pending jobs before resuming

2020-08-14 Thread William Hay
On Thu, Aug 13, 2020 at 07:29:32PM +, Derek Stephenson wrote: >HI, > > > >We’re running SoGE 8.1.9 and we’re running into an issue with preemptive >queueing I’m curious if others have had to ever address. We have a >regression queue that is pre-empted by a daily use

Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread William Hay
On Mon, May 11, 2020 at 09:39:26PM +, Mun Johl wrote: > Hi William, > > Thank you for your reply. > See my comments below. > > > -Original Message- > > On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > > > Hi William, et al., > > > I am not explicitly setting the

Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread William Hay
On Mon, May 11, 2020 at 09:30:14PM +, Mun Johl wrote: > Hi William, et al., > [Mun] Thanks for the tip; I'm still trying to get back to where I can launch > qsrh again. Even after I put the requisite /etc/pam.d/sshd line at the head > of the file I'm still getting the "Your "qrsh" request

Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread William Hay
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > Hi William, et al., > I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh > -X'. Nor have I done anything to open any additional ports. Again, since > 'ssh -X' is working for us. As a reminder, there is no way

Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread William Hay
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote: > Hi William, et al., > > Thank you kindly for your response and insight. > Please see my comments below. > > > On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote: > > > [Mun] In order to use ssh -X for our jobs that require an X11

Re: [gridengine users] How to export an X11 back to the client?

2020-05-07 Thread William Hay
On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote: > [Mun] In order to use ssh -X for our jobs that require an X11 window to be > pushed to a user's VNC display, I am planning on the following changes. But > please let me know if I have missed something (or everything). > > 1. Update

Re: [gridengine users] users Digest, Vol 113, Issue 2

2020-05-04 Thread William Hay
On Mon, May 04, 2020 at 09:06:46AM -0400, Korzennik, Sylvain wrote: >We have no problem having jobs w/ X11 enabled, BUT users must use qlogin, >not qsub or qrsh (the way we have configured it). >We have switched from SGE to UGE, but I'm sure the 'issue' is the same, >you need to

Re: [gridengine users] How to export an X11 back to the client?

2020-05-04 Thread William Hay
On Fri, May 01, 2020 at 06:44:08PM +, Mun Johl wrote: >Hi, > > > >I am using SGE on RHEL6. I am trying to launch a qsub job (a TCL script) >via grid that will result in a GUI application being opened on the >caller’s display (which is a VNC session). Using qsub for this

Re: [gridengine users] qsub -V doesn't set $PATH

2020-04-03 Thread William Hay
On Fri, Apr 03, 2020 at 02:54:19AM +, Shiel, Adam wrote: > I finally had a chance to experiment with this some. > > I think one basic problem was that I had bash as a login shell. Removing bash > from the login shell and specifying "qsub -S /bin/bash " passed my local > PATH to the

Re: [gridengine users] Dave Love repository issue

2018-10-17 Thread William Hay
On Tue, Oct 16, 2018 at 06:53:11PM -0500, Jerome wrote: > Dear William > > I'm watching this trac system, and it seem's to be reserved for > developper only.. That's seems that to report a bug, one need to follow > some specifications, which i don't really know... WHere can i read about > this?

Re: [gridengine users] Dave Love repository issue

2018-10-15 Thread William Hay
On Fri, Oct 12, 2018 at 02:13:32PM -0400, Daniel Povey wrote: >There is an issue tracker here >https://arc.liv.ac.uk/trac >but it's not clear whether Dave Love still has access to it (he moved to The issue tracker has its own login system. I still have access to it and I've never

Re: [gridengine users] cpu usage calculation

2018-08-31 Thread William Hay
On Fri, Aug 31, 2018 at 10:27:39AM +, Marshall2, John (SSC/SPC) wrote: >Hi, >When gridengine calculates cpu usage (based on wallclock) it uses: >cpu usage = wallclock * nslots >This does not account for the number of cpus that may be used for >each slot, which is

Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-06 Thread William Hay
On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote: >HI Reuti, >The prolog script is set to run by root indeed. The xfs quota requires >root privilege. >I also tried the 2nd approach but it seems that the addgrpid file has not >been created when the prolog script

Re: [gridengine users] Make All Usersets Their Own Department

2018-07-12 Thread William Hay
On Wed, Jul 11, 2018 at 09:21:10AM -0400, Douglas Duckworth wrote: >Hi >We are running GE 6.2u5 and moving to Slurm. Though before we do some >changes need to be made within GE. >For example we have 66 user sets within our share tree. However none of >them were configured as

Re: [gridengine users] SGE accounting file getting too big...

2018-05-23 Thread William Hay
On Fri, May 18, 2018 at 05:42:42PM +0200, Reuti wrote: > Note: to read old accouting files in `qacct` on-the-fly you can use: > > $ qacct -o reuti -f <(zcat /usr/sge/default/common/accounting.1.gz) > If you specify the -f flag twice the later one takes precedence. You can therefore easily create

Re: [gridengine users] Clean up old jobs/spooldb?

2018-05-04 Thread William Hay
On Wed, May 02, 2018 at 07:24:39PM -0700, Simon Matthews wrote: > That solution requires working "db_dump" and "db_restore" executables, > which don't appear to be available for the SoGE version. db_dump etc are part of Berkeley DB. You need to match the version against which SoGE is built I

[gridengine users] 04/26/2018 11:58:01| main|node-s03a-003|E|shepherd of job 5083806.1 exited with exit status = 28

2018-04-26 Thread William Hay
04/26/2018 11:58:01| main|node-s03a-003|E|shepherd of job 5083806.1 exited with exit status = 28 We had a shepherd exit with the above error code after about 12 hours. As a result it appears not to have run its epilog. This appears to be ENOSPC however we can't see sign of filesystems

Re: [gridengine users] Corrupt user config?

2018-04-17 Thread William Hay
On Mon, Apr 16, 2018 at 04:52:33PM +0100, Mark Dixon wrote: > > share-tree only as a tie breaker. But deleting jobs would be bad. Is > > the probably lose any jobs queued something you know from experience? It > > seems odd that we can have jobs queued and running with the running > > qmaster

Re: [gridengine users] Corrupt user config?

2018-04-16 Thread William Hay
bs queued and running with the running qmaster knowing nothing of the user but deleting the file would kill them on restart. > > Mark > > On Mon, 16 Apr 2018, William Hay wrote: > > > We had a user report that one of their array jobs wasn't scheduling A > > b

[gridengine users] Corrupt user config?

2018-04-16 Thread William Hay
We had a user report that one of their array jobs wasn't scheduling A bit of poking around showed that qconf -suser knew nothing of the user despite them having a queued job. However there was a file in the spool that should have defined the user. Several other users appear to be affected as

Re: [gridengine users] qstat strange statistic

2018-04-13 Thread William Hay
On Fri, Apr 13, 2018 at 01:54:14PM +0200, leconte jérôme wrote: > Hello, > I'm using SGE 8.1.9 under debian Stretch > > I have a strange problem. > > when I use qstat , sometime the stats displayed are wrong. Then, I > believe that gridengine doesn't work properly. > > I explain

Re: [gridengine users] [SGE-discuss] case-insensitive user names?

2018-04-13 Thread William Hay
On Thu, Apr 12, 2018 at 04:40:03PM -0400, berg...@merctech.com wrote: > We're using SoGE 8.1.6 in an environment where users may login to the > cluster from a Linux workstation (typically using a lower-case login > name) or a Windows desktop, where their login name (as supplied by the > enterprise

Re: [gridengine users] Job finishes correctly but master is not notified

2018-04-05 Thread William Hay
On Thu, Apr 05, 2018 at 03:38:18PM +0200, Paul Paul wrote: > William, > > Thanks for your reply. > > In the 'messages' file of the exec host, there is nothing (the last message > was 2 weeks ago). Might be worth increasing the loglevel to get more info about what is going on there. William

Re: [gridengine users] Job finishes correctly but master is not notified

2018-04-05 Thread William Hay
On Thu, Apr 05, 2018 at 09:46:23AM +0200, Paul Paul wrote: > Hello, > > We're using SGE 8.1.9 and randomly, we have jobs that finish with success > (our jobs logs confirm this) but the master is not notified. > On the compute, all the folders related to such a job are still here, > correctly

Re: [gridengine users] Problems with quotas

2018-04-05 Thread William Hay
On Wed, Mar 28, 2018 at 01:52:59PM +0200, Sms Backup wrote: >Thanks for reply ! >You are rigght, this is systemd unit file. So for filtering I just >use ExecStart=/bin/sh -c /opt/sge/bin/sge_qmaster | -v '^RUE_' ? >Sorry, but I cannot understand this part. I think you probably

Re: [gridengine users] Problems with quotas

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 12:07:39PM +0100, Sms Backup wrote: >Thanks for your replies, >So in total it would be something like this: ExecStart=/bin/sh -c >/opt/sge/bin/sge_qmaster | -v '^RUE_' >&/dev/null ? No. The grep is intended to replace the redirection to /dev/null so as to

Re: [gridengine users] Problems with quotas

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 09:36:29AM +, Mark Dixon wrote: > Hi Jakub, > > That's right: if you need to cut down the logging, one option is to add the > redirection in the start script. > > You're looking for the line starting "sge_qmaster", and you might want to > try adding a ">/dev/null"

Re: [gridengine users] Is it possible to nohup a command within a script dispatched via qsub?

2018-03-23 Thread William Hay
On Fri, Mar 23, 2018 at 12:27:48AM +0100, Reuti wrote: > Hi, > > On 22.03.2018 at 20:51, Mun Johl wrote: > > > Hi, > > > > I'm using SGE v8.1.9 on RHEL6.8 . In my script that I submit via qsub > > (let's call it scriptA), I have a gxmessage (gxmessage is similar to > > xmessage, postnote,

Re: [gridengine users] mpirun without ssh

2018-03-23 Thread William Hay
On Thu, Mar 22, 2018 at 04:29:27PM +0100, leconte jérôme wrote: > Thank you, > > But I'm not sure to know what I look for. > > If I correctly understand > > I must see qrsh or qlogin when I type > > ompi_info > > and if not I must recompile grid_engine with that option > > Best

Re: [gridengine users] qsub in specific nodes

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 11:55:14AM -0300, Dimar Jaime González Soto wrote: >Hi, I need to know how can I execute grid engine in specific hosts. I >tried the follow execution line: >qsub -v NR_PROCESSES=60 -l >h='ubuntu-node2|ubuntu-node11|ubuntu-node12|ubuntu-node13' -b y -j y

Re: [gridengine users] Problem with the way environment variables are exported in sge

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 03:23:01PM +, srinivas.chakrava...@wipro.com wrote: > Hi, > > The version in our environment is 2011.11. Looking at https://arc.liv.ac.uk/repos/darcs/sge-release/NEWS it looks like it was fixed in SoGE 8.1.4 which was released in 2013 a couple of years after the

Re: [gridengine users] Problems with quotas

2018-03-21 Thread William Hay
On Wed, Mar 21, 2018 at 07:59:41AM +0100, Sms Backup wrote: >William, >Thanks for reply. Unfortunately I have few non-interactive queues, so I >cannot limit slots this way. >99% of messages printed to system log look like this below, so I believe >that are the messages which

Re: [gridengine users] Problem with the way environment variables are exported in sge

2018-03-20 Thread William Hay
On Mon, Mar 19, 2018 at 12:11:04PM +, srinivas.chakrava...@wipro.com wrote: >Hi, > > > >We have some functions in our environment which are not being parsed >properly by sge, which is causing errors on the stdout while launching >interactive jobs > > > >But

Re: [gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-19 Thread William Hay
On Thu, Mar 15, 2018 at 10:19:29PM +, Mun Johl wrote: >Hi, > > > >I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6 >system. The yum command exits with the following error: > > > >Error: Package: gridengine-8.1.9-1.el6.x86_64 >

Re: [gridengine users] shepherd timeout when using qmake and qrsh

2018-03-01 Thread William Hay
On Tue, Feb 27, 2018 at 10:46:57AM +0100, Reuti wrote: > Hi Nils: > > > On 27.02.2018 at 10:11, Nils Giordano wrote: > > however you were right: `ssh` is definitively used to access nodes > > (probably on purpose since we have access to several GUI apps). Your > > answer

Re: [gridengine users] Converting from supplemental groups to cgroups for management

2018-02-16 Thread William Hay
On Thu, Feb 15, 2018 at 11:28:58AM -0600, Calvin Dodge wrote: > While the help we received from this and other gridengine lists helped > us resolve the issue of jobs being mysteriously killed, we've been > asked to look into converting the customer's SGE cluster, using > cgroups for job

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-09 Thread William Hay
On Thu, Feb 08, 2018 at 03:42:03PM -0800, Joshua Baker-LePain wrote: > 153758 0.51149 tomography USER1 qw02/08/2018 14:03:05 > 192 > 153759 0.0 qss_svk_ge USER2 qw02/08/2018 14:15:06 > 1 1 > 153760

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-08 Thread William Hay
On Wed, Feb 07, 2018 at 02:15:05PM -0800, Joshua Baker-LePain wrote: > On Wed, 7 Feb 2018 at 12:46am, William Hay wrote > > > IIRC resource quotas and reservations don't always play nicely together. > > The same error can come about for multiple different reasons so having

Re: [gridengine users] Scheduler getting stuck, "Skipping remaining N orders"

2018-02-07 Thread William Hay
On Tue, Feb 06, 2018 at 12:13:24PM -0800, Joshua Baker-LePain wrote: > I'm back again -- is it obvious that my new cluster just went into > production? Again, we're running SoGE 8.1.9 on a cluster with nodes of > several different sizes. We're running into an odd issue where SGE stops >

Re: [gridengine users] Minimum number of slots

2018-02-01 Thread William Hay
On Thu, Feb 01, 2018 at 11:44:25AM +0100, Ansgar Esztermann-Kirchner wrote: > Now, I think I can improve upon this choice by creating separate > queues for different machines "sizes", i.e. an 8-core queue, a > 20-core queue and so on. However, I do not see a (tractable) way to > enforce proper

Re: [gridengine users] gid_range values

2018-01-24 Thread William Hay
On Tue, Jan 23, 2018 at 06:22:28PM -0600, Calvin Dodge wrote: > The docs we've found say that gid_range must be greater than the > number of jobs expected to run currently on one host. > > Our recent experience suggests that it has to be greater than the > total number of jobs in the queue. If

Re: [gridengine users] Exporting environment variables using -V doesn't work from RHEL7 to RHEL6

2018-01-19 Thread William Hay
On Fri, Jan 05, 2018 at 08:02:18AM +, srinivas.chakrava...@wipro.com wrote: >Hi, > > > >We have recently upgraded one submit host from RHEL6.7 to RHEL7.2. > >Most of our grid execution servers are RHEL6.7, with a few RHEL6.3 >servers. When we run any jobs by using "-V"

[gridengine users] Happy new year GridEngine Users

2018-01-18 Thread William Hay
Testing if the users@gridengine.org mailing list works in 2018.

Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2018-01-18 Thread William Hay
On Fri, Dec 22, 2017 at 05:55:26PM -0500, berg...@merctech.com wrote: > True, but even with that info, there doesn't seem to be any universal > way to tell an arbitrary GPU job which GPU to use -- they all default > to device 0. With Nvidia GPUs we use a prolog script that manipulates lock files

Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

2018-01-18 Thread William Hay
On Mon, Jan 08, 2018 at 06:23:20PM -0500, berg...@merctech.com wrote: > Yeah, I've looked at that, but it brings up the 'accounting problem' > of changing the variable each time a GPU-enabled job begins or ends. Set it in starter_method (and sshd's force command if you want to support qlogin etc).
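A minimal sketch of the lock-file idea from the two GPU posts above, assuming one lock file per GPU in a node-local directory whose contents name the owning job. The directory, the gpu<N>.lock naming and the use of the job ID as owner tag are hypothetical, not the poster's actual scripts, and it assumes the variable being set is CUDA_VISIBLE_DEVICES. A privileged prolog would call claim_gpus, the starter_method would export visible_devices(...) as CUDA_VISIBLE_DEVICES, and the epilog would call release_gpus.

```python
import os
from typing import List, Tuple

# Hypothetical layout: <lock_dir>/gpu0.lock, gpu1.lock, ... each containing
# the ID of the job that currently owns that device.


def _lock_path(lock_dir: str, gpu: int) -> str:
    return os.path.join(lock_dir, f"gpu{gpu}.lock")


def _owned_locks(lock_dir: str, job_id: str) -> List[Tuple[int, str]]:
    """Return (gpu index, lock path) pairs owned by job_id, sorted by index."""
    owned = []
    for name in os.listdir(lock_dir):
        index = name[3:-len(".lock")]
        if not (name.startswith("gpu") and name.endswith(".lock") and index.isdigit()):
            continue
        path = os.path.join(lock_dir, name)
        try:
            with open(path) as f:
                owner = f.read().strip()
        except FileNotFoundError:
            continue  # released by another job's epilog in the meantime
        if owner == job_id:
            owned.append((int(index), path))
    return sorted(owned)


def release_gpus(lock_dir: str, job_id: str) -> None:
    """Remove every lock owned by job_id (epilog side)."""
    for _, path in _owned_locks(lock_dir, job_id):
        os.remove(path)


def claim_gpus(lock_dir: str, n_gpus: int, count: int, job_id: str) -> List[int]:
    """Claim `count` free GPUs for job_id (prolog side); raise if too few are free."""
    claimed: List[int] = []
    for gpu in range(n_gpus):
        if len(claimed) == count:
            break
        try:
            # O_EXCL makes creation atomic: it fails if another job holds this GPU.
            fd = os.open(_lock_path(lock_dir, gpu), os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as f:
            f.write(job_id + "\n")
        claimed.append(gpu)
    if len(claimed) < count:
        release_gpus(lock_dir, job_id)  # undo a partial claim
        raise RuntimeError(f"only {len(claimed)} of {count} GPUs free")
    return claimed


def visible_devices(lock_dir: str, job_id: str) -> str:
    """Comma-separated GPU indices owned by job_id, e.g. for CUDA_VISIBLE_DEVICES."""
    return ",".join(str(gpu) for gpu, _ in _owned_locks(lock_dir, job_id))
```

Because the lock files outlive the prolog, the starter_method (or sshd's forced command for qlogin) can recompute the device list from them instead of relying on an exported variable, which avoids the "accounting problem" of tracking a changing value.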

Re: [gridengine users] I'm getting an "Unable to initialize env" error; but our simultaneous ECs should be small

2017-11-23 Thread William Hay
On Wed, Nov 22, 2017 at 09:53:17AM -0800, Mun Johl wrote: >Hi, >Periodically I am seeing the following error: > > Unable to initialize environment because of error: cannot register event > client. Only 100 event clients are allowed in the system > >The error first showed up

Re: [gridengine users] Integration of GPUs into GE2011.11p1

2017-10-31 Thread William Hay
On Mon, Oct 30, 2017 at 09:56:37PM +0530, ANS wrote: >Hi, >Thank you for the detailed info. >But can let me know how can i submit a job using 4 GPUs, 8 cores from >2nodes consisting of 2 GPUs, 4 cores from each node. >Thanks, That's not something the free versions of grid

Re: [gridengine users] Integration of GPUs into GE2011.11p1

2017-10-30 Thread William Hay
On Wed, Oct 25, 2017 at 04:59:05PM +0200, Reuti wrote: > Hi, > > > On 25.10.2017 at 16:06, ANS wrote: > > > > Hi all, > > > > I am trying to integrate GPUs into my existing cluster with 2 GPUs per > > node. I have gone through few sites and done the following > > > >

Re: [gridengine users] running SOGE/execd on Cygwin

2017-10-20 Thread William Hay
On Thu, Oct 19, 2017 at 04:49:40PM -0700, Simon Matthews wrote: > Does anyone have any pointers on running execd on cygwin? > > inst_sge -x fails, because the 'uidgid' command doesn't seem to have been > built. > > I can set the environment variables myself and then start the program, > but the

Re: [gridengine users] Cygwin?

2017-10-16 Thread William Hay
; libcull.a(pack.o):pack.c:(.text+0x94d): undefined reference to `xdrmem_create' > libcull.a(pack.o):pack.c:(.text+0x959): undefined reference to `xdr_double' > collect2: error: ld returned 1 exit status > make: *** [../libs/sgeobj/Makefile:364: test_sge_object] Error 1 > not done > &g

Re: [gridengine users] Cygwin?

2017-10-13 Thread William Hay
On Thu, Oct 12, 2017 at 04:12:48PM -0700, System Administrator wrote: > I think it should be part of the ./configure step. If you exported it as an > env variable, then re-run the ./configure part. Or put it at the beginning > of the command, for example: > > CPPFLAGS=-I/usr/include/tirpc

Re: [gridengine users] Cygwin?

2017-10-10 Thread William Hay
On Mon, Oct 09, 2017 at 07:46:05PM -0700, Simon Matthews wrote: > Is it possible to build SOGE for Cygwin? > > SOGE says it is based on OGS which claimed that it supported Cygwin. > > I only need execd on Cygwin. Qmaster and the GUI tools need only run > under CentOS 6 and 7. > > Simon I don't

Re: [gridengine users] Max jobs per user

2017-10-10 Thread William Hay
On Sat, Sep 30, 2017 at 02:21:12AM +, John_Tai wrote: >Currently if I set a max job per user in the cluster, a new job will be >rejected if it exceeds the max. > > > >> qrsh > >job rejected: Only 100 jobs are allowed per user (current job count: 264) > > > >

Re: [gridengine users] load scaling

2017-09-04 Thread William Hay
On Fri, Sep 01, 2017 at 06:28:43PM +0900, Ueki Hikonuki wrote: > Hi, > > I tried to understand load scaling. But it is still unclear for me. > > Let's assume two hosts. > > hostA very fast machine > hostB regular speed machine > > Even though np_load_avg of hostA is much higher than hostB, > a
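To make the hostA/hostB question concrete: per host_conf(5), an execution host's load_scaling attribute multiplies the load values that host reports, so a fast host can be made to look less loaded than its raw np_load_avg. The sketch below assumes the default load_formula of np_load_avg and models only that one comparison (it ignores queue sequence numbers, load thresholds and load adjustments); the host names and numbers are made up.

```python
from typing import Dict, Tuple


def parse_load_scaling(spec: str) -> Dict[str, float]:
    """Parse a load_scaling value such as "np_load_avg=0.5" ("NONE" means no scaling)."""
    spec = spec.strip()
    if spec in ("", "NONE"):
        return {}
    factors: Dict[str, float] = {}
    for item in spec.split(","):
        name, value = item.split("=", 1)
        factors[name.strip()] = float(value)
    return factors


def scaled(raw: Dict[str, float], spec: str) -> Dict[str, float]:
    """Multiply each reported load value by its scaling factor (1.0 if unlisted)."""
    factors = parse_load_scaling(spec)
    return {name: value * factors.get(name, 1.0) for name, value in raw.items()}


def least_loaded(hosts: Dict[str, Tuple[Dict[str, float], str]],
                 metric: str = "np_load_avg") -> str:
    """Pick the host whose *scaled* metric is lowest."""
    return min(hosts, key=lambda h: scaled(*hosts[h])[metric])
```

With hostA reporting np_load_avg 0.8 and load_scaling np_load_avg=0.25 it scores 0.2 and wins over an unscaled hostB at 0.4; with no scaling on either host, hostB wins.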

Re: [gridengine users] Fonts issue with RHEL 7.3

2017-07-26 Thread William Hay
On Tue, Jul 25, 2017 at 01:23:43AM +, Matt Hohmeister wrote: >When trying to run qmon on RHEL 7.3, I get this. Can someone share which >packages would take care of this? Hopefully one of these pages should sort it out. The last one should definitely do it but is fixing things user by

Re: [gridengine users] complex error

2017-07-26 Thread William Hay
On Tue, Jul 25, 2017 at 12:57:47AM +, John_Tai wrote: >I have configured virtual_free as a requestable resource: > > > >virtual_free mem MEMORY <= YES JOB 0 0 > > > >And it's been working great for months. > > >

Re: [gridengine users] DISPLAY problem in RHEL6.8

2017-07-18 Thread William Hay
On Tue, Jul 18, 2017 at 09:23:05AM +, John_Tai wrote: >I'm having a DISPLAY issue in RHEL6.8 that I don't have in RHEL5. I am >using SGE6.2u6 > > > >I use VNC to connect to a linux server. By default the DISPLAY is set to >:4.0 and I can start GUI jobs locally: > >

Re: [gridengine users] New installation

2017-07-18 Thread William Hay
On Mon, Jul 17, 2017 at 08:00:31PM +, Matt Hohmeister wrote: > Thank you; this is a big help. :-) > > Along those lines, what do you all suggest for the shared directory? From > these instructions, it *appears* that the best choice is to share out NFS > from the master's /opt/sge/default,

Re: [gridengine users] Repeated error message in logs from RQS rules

2017-07-17 Thread William Hay
On Fri, Jul 14, 2017 at 08:36:06AM +, Simon Andrews wrote: >Can anyone shed any light on an error I'm getting repeated thousands of >times in my grid engine messages log. This happens when I have a job >which is submitted and which is stopped from running by an RQS rule I have >

Re: [gridengine users] New installation

2017-07-17 Thread William Hay
On Fri, Jul 14, 2017 at 08:58:59PM +, Matt Hohmeister wrote: >Hello- > > > >First off, please accept my apologies for this post, as I have _never_ >used gridengine before. I have two servers, both running RHEL 7.3, and >both linked to a shared xfs-formatted iSCSI volume

Re: [gridengine users] Ulimit for max open files

2017-06-28 Thread William Hay
slots job. > > Thanks! > Luis Assuming Linux, what does sysctl fs.file-max report? William > > On 6/27/17, 4:22 AM, "William Hay" <w@ucl.ac.uk> wrote: > > On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote: > >Hi, > > >
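The point of asking for fs.file-max is that it is the kernel's system-wide cap on open file handles, which still applies however high the per-process H_DESCRIPTORS in execd_params is set. A small check along those lines, assuming Linux (/proc/sys/fs/file-max holds the value sysctl prints) and execd_params entries separated by commas or spaces; the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_execd_params(params: str) -> Dict[str, str]:
    """Split an execd_params value such as "H_DESCRIPTORS=262144,H_LOCKS=262144"."""
    entries: Dict[str, str] = {}
    for item in params.replace(" ", ",").split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            entries[key.strip().upper()] = value.strip()
    return entries


def read_file_max(path: str = "/proc/sys/fs/file-max") -> int:
    """Same number `sysctl fs.file-max` prints on Linux."""
    with open(path) as f:
        return int(f.read().split()[0])


def descriptor_limit_exceeds(params: str, file_max: int) -> Optional[bool]:
    """True if H_DESCRIPTORS exceeds the system-wide file-max; None if unset or non-numeric."""
    value = parse_execd_params(params).get("H_DESCRIPTORS")
    if value is None or not value.isdigit():
        return None
    return int(value) > file_max
```

For a multi-slot job the comparison is conservative: several processes each approaching H_DESCRIPTORS can reach the system-wide limit well before any single one reaches its own.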

Re: [gridengine users] Ulimit for max open files

2017-06-27 Thread William Hay
On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote: >Hi, > > > >To increase the max open file, we have set execd_params in qconf -mconf >and also on the OS level: > >execd_params >H_DESCRIPTORS=262144,H_LOCKS=262144,H_MAXPROC=262144 > > >

Re: [gridengine users] taming qlogin

2017-06-26 Thread William Hay
On Fri, Jun 23, 2017 at 08:24:23AM -0700, Ilya wrote: > Hello, > > I am running 6.2u5 with ssh transport for qlogin (not tight integration) and > users are abusing this service: run jobs for days, abandon their sessions > that stay opened forever, etc. So I want to implement mandatory time limits

Re: [gridengine users] 6.2 Update 5 Patch 3 not available?

2017-06-13 Thread William Hay
On Mon, Jun 12, 2017 at 05:46:46PM -0400, Jeff Blaine wrote: > The Open Grid Scheduler homepage at > http://gridscheduler.sourceforge.net/ says: > > The current bugfix & LTS (Long Term Support) release is version > 6.2 update 5 patch 3 (SGE 6.2u5p3), which is based on Sun Grid >

Re: [gridengine users] Throttling job starts (thundering herd)

2017-03-24 Thread William Hay
On Thu, Feb 16, 2017 at 01:43:47PM -0500, Stuart Barkley wrote: > Is there a way to throttle job starts on Grid Engine (we are using Son > of Grid Engine)? Use a load sensor plus job_load_adjustments. Tweak jobs to request a low load (via sge_request or jsv) or set an alarm on all queues when the
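A minimal load sensor of the kind suggested here, following the documented load sensor protocol: execd writes a line to the sensor's stdin, the sensor answers with "begin", one host:complex:value line per value, and "end", and the line "quit" tells it to exit. The complex name start_load is hypothetical and would have to be defined with qconf -mc and the script listed in load_sensor. As I read the suggestion, reporting a constant 0.0 leaves the scheduler's job_load_adjustments entry for that complex (decaying over load_adjustment_decay_time) as the only thing that raises the value after each job start, which is what throttles further starts.

```python
import socket
import sys
from typing import Callable, TextIO


def serve_load_requests(stdin: TextIO, stdout: TextIO, host: str,
                        measure: Callable[[], float],
                        complex_name: str = "start_load") -> None:
    """Answer execd load requests until "quit" or end of input."""
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            return  # EOF or explicit shutdown request from execd
        stdout.write("begin\n")
        # host must match the name execd knows this host by (short vs FQDN).
        stdout.write(f"{host}:{complex_name}:{measure():.2f}\n")
        stdout.write("end\n")
        stdout.flush()  # execd waits for "end"; do not leave it buffered


def main() -> None:
    """Hypothetical entry point for the script named in load_sensor."""
    serve_load_requests(sys.stdin, sys.stdout, socket.gethostname(), lambda: 0.0)
```

The sensor script would call main(); keeping the request loop separate makes it easy to check the output format without an execd.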

Re: [gridengine users] John's cores pe (Was: users Digest...)

2017-03-23 Thread William Hay
On Thu, Mar 23, 2017 at 08:11:02AM +, John_Tai wrote: > Can I still download 6.2? Haven't been able to find it. > > John If you're going to upgrade you might as well go all the way to SoGE 8.1.9. William > > -Original Message- > From: Reuti [mailto:re...@staff.uni-marburg.de] >

Re: [gridengine users] Make qmaster buffer larger

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 05:20:37PM +0100, Jerome Poitout wrote: > Hello, > > OGS/GE 2011.11p1 > > I have an issue while submitting numerous jobs in a short time (over 300 > - not so much for me...) with -sync y option. It seems that qmaster > cannot handle all the requests and i get huge load on

Re: [gridengine users] qsub and reservation

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 07:29:25PM +0100, Roberto Nunnari wrote: > I don't mean move from node to node.. by moving I mean that something > happens in the scheduler.. that the scheduler reserves a slot for the > pending job requesting reservation.. in the schedule file, I see only lines > with the

Re: [gridengine users] qsub and reservation

2017-03-10 Thread William Hay
On Thu, Mar 09, 2017 at 02:24:38PM +0100, Roberto Nunnari wrote: > Hi Reuti. > Hi William. > > here's my settings you required: > params MONITOR=1 > max_reservation 32 > default_duration 0:10:0 > > I cannot understand how What I see

Re: [gridengine users] qsub and reservation

2017-03-09 Thread William Hay
On Wed, Mar 08, 2017 at 06:33:23PM +0100, Roberto Nunnari wrote: > Hello. > > I am using Oracle Grid Engine 6.2u7 and have some trouble understanding > reservation (qsub -R y ..). > > I'm trying to use this because of big jobs starving because of queues always > full of smaller jobs.. > >

Re: [gridengine users] limtation the number of submission job in queue waiting list

2017-02-23 Thread William Hay
On Thu, Feb 23, 2017 at 09:30:20AM +0900, Sangmin Park wrote: >Yes, it is. >I can handle the number of running jobs using resource quota policy. >However, the number of queue waiting jobs can't. >Basic rule is FIFO, so if one user submits hundre of jobs, another user >has to

Re: [gridengine users] making certain jobs or queues not count for tickets..

2017-02-15 Thread William Hay
On Wed, Feb 15, 2017 at 12:34:08AM +0200, Ben Daniel Pere wrote: >The suggestion sounds good but I'm not sure I understand step 1 - if it's >going to be assigned by project and not user - can I still have "powerful" >users in my normal default project that have more tickets there? >

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Tue, Feb 14, 2017 at 12:22:59PM +, Mark Dixon wrote: > On Tue, 14 Feb 2017, William Hay wrote: > ... > >We tweak the permissions on the device nodes from a privileged prolog but > >otherwise I suspect we're doing something similar. > > Hi William, > > Yea

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote: > Hi, > > I've been playing with allocating GPUs using gridengine and am wondering if > I'm trying to make it too complicated. > > We have some 24 core, 128G RAM machines, each with two K80 GPU cards in > them. I have a little

Re: [gridengine users] GE 6.2u5 Duplicate Job IDs

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:17:29PM -0500, Douglas Duckworth wrote: >Hello >About a month ago we recently started seeing duplicate job in SGE. >For example: >sysadmin@panda2[~]$ qacct -j 878815 >== >qname

Re: [gridengine users] Requesting GPUs on the qsub command line

2017-02-14 Thread William Hay
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote: > Hi, > > I've been playing with allocating GPUs using gridengine and am wondering if > I'm trying to make it too complicated. > > We have some 24 core, 128G RAM machines, each with two K80 GPU cards in > them. I have a little

Re: [gridengine users] SGE 6.1 binaries

2017-02-13 Thread William Hay
On Mon, Feb 13, 2017 at 10:20:51AM +0100, Julien Nicoulaud wrote: >Hi all, >I'm looking for Sun GridEngine 6.1 binaries (or sources) for some backward >compatibility testing, and I can't find it anywhere on the web: > * ge-6.1u5-common.tar.gz > * ge-6.1u5-bin-lx24-amd64.tar.gz

Re: [gridengine users] error qlogin_starter sent: 137 during qrsh

2016-10-13 Thread William Hay
On Thu, Oct 13, 2016 at 11:39:15AM +, Duje Drazin wrote: > Hi William, > > Sorry, I didn't catch your answer, how to "Check the nodes involved for a > firewall/packet filter" > Assuming this is a Linux box, then on a worker node of the cluster try running iptables -L to see if it has an

Re: [gridengine users] error qlogin_starter sent: 137 during qrsh

2016-10-13 Thread William Hay
On Thu, Oct 13, 2016 at 07:21:33AM +, Duje Drazin wrote: >Hi all, > > > >I have configured following: > > > >qlogin_command telnet > >qlogin_daemon /usr/sbin/in.telnetd > >rlogin_command /usr/bin/ssh -X > >

Re: [gridengine users] SoGE 8.1.8 - Very slow schedule time qw ==> r taking 30-60 sec.

2016-10-11 Thread William Hay
On Mon, Oct 10, 2016 at 02:39:21PM +, Yuri Burmachenko wrote: >We are using SoGE 8.1.8 and since recently approximately 2 months ago our >job schedule time raised up to 30-60 sec. > > >Any tips and advices where to look for the root cause and/or how can we >improve the

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-10 Thread William Hay
On Thu, Oct 06, 2016 at 12:47:49PM +0100, Mark Dixon wrote: > On Wed, 5 Oct 2016, William Hay wrote: > ... > >Our prolog and epilog (parallel) ssh into the slave nodes and do the > >equivalent of run-parts on directories full of scripts some of which check > >if they are
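The prolog/epilog described here does "the equivalent of run-parts" over directories of scripts on each node. A rough Python stand-in for that step, assuming Debian run-parts conventions: scripts run in C-locale lexical order, and only executable regular files whose names consist of letters, digits, "_" and "-" are run (so names with dots are skipped). Debian's run-parts keeps going after a failure unless given --exit-on-error, so both behaviours are offered; which one the original scripts use is not stated.

```python
import os
import re
import subprocess
from typing import Sequence

# Debian run-parts default name filter: ASCII letters, digits, '_' and '-'.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts(directory: str, args: Sequence[str] = (),
              exit_on_error: bool = True) -> int:
    """Run executable scripts in lexical order; return 0 or a failing exit status."""
    status = 0
    for name in sorted(os.listdir(directory)):  # code-point order == C locale for ASCII
        path = os.path.join(directory, name)
        if not VALID_NAME.fullmatch(name):
            continue  # skips e.g. "10-check~" or "x.dpkg-old"
        if not os.path.isfile(path) or not os.access(path, os.X_OK):
            continue
        rc = subprocess.run([path, *args]).returncode
        if rc != 0:
            if exit_on_error:
                return rc  # stop at the first failing script
            status = rc  # remember the last failure and carry on
    return status
```

Returning the failing status lets the prolog itself exit non-zero, so the usual Grid Engine handling of a failed prolog applies.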

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-05 Thread William Hay
On Wed, Oct 05, 2016 at 12:31:52PM +0100, Mark Dixon wrote: > On Wed, 5 Oct 2016, William Hay wrote: > ... > >It was originally head node only so per job until a user requested local > >TMPDIR on each node so historical reasons. > ... > > Hi William, > > Wh

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-05 Thread William Hay
On Tue, Oct 04, 2016 at 04:51:42PM +0100, Mark Dixon wrote: > On Tue, 4 Oct 2016, William Hay wrote: > ... > >I have a per-job consumable and the TMPDIR filesystem is created on every > >node of the job. We have a (jsv enforced) policy that all multi-node jobs > >have excl

Re: [gridengine users] Control tmpdir usage on SGE

2016-10-04 Thread William Hay
On Tue, Oct 04, 2016 at 09:32:43AM +0100, Mark Dixon wrote: > It'd be interesting for people to share what they've done with parallel > jobs. Rightly or wrongly, I currently have a per-job consumable and the > $TMPDIR is only on the node with the MASTER task. I have a per-job consumable and the

Re: [gridengine users] access definition in grid

2016-09-26 Thread William Hay
On Tue, Sep 20, 2016 at 08:07:02AM +, sudha.penme...@wipro.com wrote: >Hi, > > > >Regarding access rules in grid, users primary UNIX group should be the one >which is defined in ACL to be able to access. > > > >Would it be possible to configure it such that user

Re: [gridengine users] Forcing Grid Engine jobs to error state with exit status other than 0, 99 or 100.

2016-09-15 Thread William Hay
On Wed, Sep 14, 2016 at 08:52:12PM +, Lee, Wayne wrote: > HI William, > > I've performed some tests by submitting a basic shell script which dumps the > environment (i.e. env) and performs either an "exit 0", "exit 99", "exit > 100", "exit 137" other exit status codes.If I set my script

Re: [gridengine users] Forcing Grid Engine jobs to error state with exit status other than 0, 99 or 100.

2016-09-14 Thread William Hay
On Tue, Sep 13, 2016 at 06:52:53PM +, Lee, Wayne wrote: >In the epilog script that I've setup for our jobs, I've attempted to >capture the value of the "exit_status" of a job or job task and if it >isn't 0, 99 or 100, exit the epilog script with an "exit 100". However >this

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-13 Thread William Hay
On Tue, Sep 13, 2016 at 03:15:19PM +1000, Derrick Lin wrote: >Thanks guys, >I am implementing the solution as outlined by William, except we are using >XFS here, so we are trying to do it by using XFS's project/directory >quota. Will do more testing and see how it goes.. >

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 01:29:52PM +0200, Reuti wrote: > > > Am 09.09.2016 um 12:52 schrieb William Hay <w@ucl.ac.uk>: > > Grid engine doesn't provide a mechanism to pass the resource requests to > > the prolog > > AFAIK so a mechanism to obtain the v

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 09:26:53AM +1000, Derrick Lin wrote: >Hi William, >Actually I don't quite get the need of: >2. Our JSV adds an environment variable to the job recording the amount >of disk requested (you could try parsing it out of the job spool but >this is easier). >

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-09 Thread William Hay
On Fri, Sep 09, 2016 at 10:37:13AM +0100, Mark Dixon wrote: > On Thu, 8 Sep 2016, William Hay wrote: > ... > >Remember tmpfs is not a ramdisk but the linux VFS layer without an attempt > >to provide real file system guarantees. It shouldn't be cached any more > >aggressivel

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-08 Thread William Hay
On Thu, Sep 08, 2016 at 02:40:38PM +0100, Mark Dixon wrote: > On Thu, 8 Sep 2016, William Hay wrote: > ... > >At present we're using a huge swap partition and TMPFS instead of btrfs. > >You could probably do this with a volume manager and creating a > >regular filesys

Re: [gridengine users] Control tmpdir usage on SGE

2016-09-08 Thread William Hay
On Thu, Sep 08, 2016 at 10:10:51AM +1000, Derrick Lin wrote: >Hi all, >Each of our execution nodes has a scratch space mounted as /scratch_local. >I notice there is tmpdir variable can be changed in a queue's conf. >According to doc, SGE will create a per job dir on tmpdir, and set

Re: [gridengine users] sgemaster service fails to stay up after harddisk maxout incident

2016-08-30 Thread William Hay
On Fri, Aug 26, 2016 at 01:35:06PM +0100, Ramón Fallon wrote: >Thanks for the reply, William. > >Yes, that's true. > >It's a pity there's not a way to reset gridengine to begin anew on a new >database for example. > >It seems quite a radical step to have to re-install

Re: [gridengine users] sgemaster service fails to stay up after harddisk maxout incident

2016-08-26 Thread William Hay
On Thu, Aug 25, 2016 at 04:40:55PM +0100, Ramón Fallon wrote: >* sgemaster still fails to come up. "messages" in >$SGE_ROOT/$SGE_CELL/spool/qmaster now says: >main|frontend0|W|local configuration frontend0 not defined - using global >configuration >main|frontend0|E|global

Re: [gridengine users] firewall on submit host

2016-08-25 Thread William Hay
On Thu, Aug 25, 2016 at 09:15:26AM +0100, William Hay wrote: > On Wed, Aug 24, 2016 at 09:07:44PM +0200, Alexander Hasselhuhn wrote: > > Dear Reuti, > > > > thanks for the reply, indeed at the moment there is a login node, but we > > have plans to remove it (by set

Re: [gridengine users] firewall on submit host

2016-08-25 Thread William Hay
On Wed, Aug 24, 2016 at 09:07:44PM +0200, Alexander Hasselhuhn wrote: > Dear Reuti, > > thanks for the reply, indeed at the moment there is a login node, but we have > plans to remove it (by setting up a route through our gateway, which makes > some administrative tasks more smooth) and

Re: [gridengine users] "Decoding gridengine" workshop

2016-08-24 Thread William Hay
On Wed, Aug 24, 2016 at 10:20:06AM +0100, Mark Dixon wrote: > Hi there, > > Is there any interest for a meeting in the UK looking at the internals of > gridengine? Potential topics might be: > > * Building from source > * How the code is organised > * How to debug or develop gridengine > > The

Re: [gridengine users] reporting doesn't log which host receives a task

2016-08-22 Thread William Hay
On Fri, Aug 19, 2016 at 03:59:34PM +0100, Lars van der Bijl wrote: >Hey William, >is the schedule log different from the reporting file? i've had a look >through the common and spool directory but can't find mention of it. You have to enable it with MONITOR=1 in the scheduler params.
