On Mon, Aug 24, 2020 at 08:50:51PM +, Mun Johl wrote:
>Hi all,
>
>
>
>We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux v6.8.
>
>
>
>This anomaly isn’t a showstopper by any means, but it has happened enough
>that I decided to reach out
On Thu, Aug 13, 2020 at 07:29:32PM +, Derek Stephenson wrote:
>Hi,
>
>
>
>We’re running SoGE 8.1.9 and we’re running into an issue with preemptive
>queueing I’m curious if others have had to ever address. We have a
>regression queue that is pre-empted by a daily use
On Mon, May 11, 2020 at 09:39:26PM +, Mun Johl wrote:
> Hi William,
>
> Thank you for your reply.
> See my comments below.
>
> > -Original Message-
> > On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote:
> > > Hi William, et al.,
> > > I am not explicitly setting the
On Mon, May 11, 2020 at 09:30:14PM +, Mun Johl wrote:
> Hi William, et al.,
> [Mun] Thanks for the tip; I'm still trying to get back to where I can launch
> qrsh again. Even after I put the requisite /etc/pam.d/sshd line at the head
> of the file I'm still getting the "Your "qrsh" request
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote:
> Hi William, et al.,
> I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh
> -X'. Nor have I done anything to open any additional ports. Again, since
> 'ssh -X' is working for us. As a reminder, there is no way
On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote:
> Hi William, et al.,
>
> Thank you kindly for your response and insight.
> Please see my comments below.
>
> > On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote:
> > > [Mun] In order to use ssh -X for our jobs that require an X11
On Wed, May 06, 2020 at 11:10:40PM +, Mun Johl wrote:
> [Mun] In order to use ssh -X for our jobs that require an X11 window to be
> pushed to a user's VNC display, I am planning on the following changes. But
> please let me know if I have missed something (or everything).
>
> 1. Update
On Mon, May 04, 2020 at 09:06:46AM -0400, Korzennik, Sylvain wrote:
>We have no problem having jobs w/ X11 enabled, BUT users must use qlogin,
>not qsub or qrsh (the way we have configured it).
>We have switched from SGE to UGE, but I'm sure the 'issue' is the same,
>you need to
On Fri, May 01, 2020 at 06:44:08PM +, Mun Johl wrote:
>Hi,
>
>
>
>I am using SGE on RHEL6. I am trying to launch a qsub job (a TCL script)
>via grid that will result in a GUI application being opened on the
>caller’s display (which is a VNC session).
Using qsub for this
On Fri, Apr 03, 2020 at 02:54:19AM +, Shiel, Adam wrote:
> I finally had a chance to experiment with this some.
>
> I think one basic problem was that I had bash as a login shell. Removing bash
> from the login shell and specifying "qsub -S /bin/bash " passed my local
> PATH to the
On Tue, Oct 16, 2018 at 06:53:11PM -0500, Jerome wrote:
> Dear William
>
> I'm watching this trac system, and it seems to be reserved for
> developers only. It seems that to report a bug, one needs to follow
> some specifications, which I don't really know. Where can I read about
> this?
On Fri, Oct 12, 2018 at 02:13:32PM -0400, Daniel Povey wrote:
>There is an issue tracker here
>https://arc.liv.ac.uk/trac
>but it's not clear whether Dave Love still has access to it (he moved to
The issue tracker has its own login system. I still have access to it and
I've never
On Fri, Aug 31, 2018 at 10:27:39AM +, Marshall2, John (SSC/SPC) wrote:
>Hi,
>When gridengine calculates cpu usage (based on wallclock) it uses:
>cpu usage = wallclock * nslots
>This does not account for the number of cpus that may be used for
>each slot, which is
On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote:
>HI Reuti,
>The prolog script is set to run by root indeed. The xfs quota requires
>root privilege.
>I also tried the 2nd approach but it seems that the addgrpid file has not
>been created when the prolog script
On Wed, Jul 11, 2018 at 09:21:10AM -0400, Douglas Duckworth wrote:
>Hi
>We are running GE 6.2u5 and moving to Slurm. Though before we do some
>changes need to be made within GE.
>For example we have 66 user sets within our share tree. However none of
>them were configured as
On Fri, May 18, 2018 at 05:42:42PM +0200, Reuti wrote:
> Note: to read old accouting files in `qacct` on-the-fly you can use:
>
> $ qacct -o reuti -f <(zcat /usr/sge/default/common/accounting.1.gz)
>
If you specify the -f flag twice the later one takes precedence.
You can therefore easily create
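Since only the last -f flag counts, the way to query across rotated logs is to concatenate them into one stream via process substitution, as in Reuti's example. A minimal self-contained sketch of the trick (the file names stand in for the real accounting logs):

```shell
# Stand-ins for a rotated accounting log and the live one:
tmp=$(mktemp -d)
printf 'old-record\n' | gzip > "$tmp/accounting.1.gz"
printf 'new-record\n' > "$tmp/accounting"

# <( ... ) presents the concatenation as a single file, so a reader
# sees one continuous log -- the same shape as:
#   qacct -o reuti -f <(zcat accounting.1.gz; cat accounting)
cat <(zcat "$tmp/accounting.1.gz"; cat "$tmp/accounting")
```

The output is the old records followed by the new ones, which is exactly the order qacct expects.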
On Wed, May 02, 2018 at 07:24:39PM -0700, Simon Matthews wrote:
> That solution requires working "db_dump" and "db_restore" executables,
> which don't appear to be available for the SoGE version.
db_dump etc are part of Berkeley DB. You need to match the version against
which
SoGE is built I
04/26/2018 11:58:01| main|node-s03a-003|E|shepherd of job 5083806.1 exited
with exit status = 28
We had a shepherd exit with the above error code after about 12 hours. As a
result it appears not to have run its
epilog. This appears to be ENOSPC however we can't see sign of filesystems
On Mon, Apr 16, 2018 at 04:52:33PM +0100, Mark Dixon wrote:
> > share-tree only as a tie breaker. But deleting jobs would be bad. Is
> > the probably lose any jobs queued something you know from experience? It
> > seems odd that we can have jobs queued and running with the running
> > qmaster
jobs queued and running with the running qmaster knowing nothing of the
user, but deleting the file would kill them on restart.
>
> Mark
>
> On Mon, 16 Apr 2018, William Hay wrote:
>
> > We had a user report that one of their array jobs wasn't scheduling. A
> > bit
We had a user report that one of their array jobs wasn't scheduling. A
bit of poking around showed that qconf -suser knew nothing of the user
despite them having a queued job. However there was a file in the spool
that should have defined the user. Several other users appear to be
affected as
On Fri, Apr 13, 2018 at 01:54:14PM +0200, leconte jérôme wrote:
> Hello,
> I'm using SGE 8.1.9 under debian Stretch
>
> I have a strange problem.
>
> when I use qstat, sometimes the stats displayed are wrong. Then, I
> believe that gridengine doesn't work properly.
>
> I explain
On Thu, Apr 12, 2018 at 04:40:03PM -0400, berg...@merctech.com wrote:
> We're using SoGE 8.1.6 in an environment where users may login to the
> cluster from a Linux workstation (typically using a lower-case login
> name) or a Windows desktop, where their login name (as supplied by the
> enterprise
On Thu, Apr 05, 2018 at 03:38:18PM +0200, Paul Paul wrote:
> William,
>
> Thanks for your reply.
>
> In the 'messages' file of the exec host, there is nothing (the last message
> was 2 weeks ago).
Might be worth increasing the loglevel to get more info about what is going on
there.
William
On Thu, Apr 05, 2018 at 09:46:23AM +0200, Paul Paul wrote:
> Hello,
>
> We're using SGE 8.1.9 and randomly, we have jobs that finish with success
> (our jobs logs confirm this) but the master is not notified.
> On the compute, all the folders related to such a job are still here,
> correctly
On Wed, Mar 28, 2018 at 01:52:59PM +0200, Sms Backup wrote:
>Thanks for reply !
>You are right, this is a systemd unit file. So for filtering I just
>use ExecStart=/bin/sh -c /opt/sge/bin/sge_qmaster | -v '^RUE_' ?
>Sorry, but I cannot understand this part.
I think you probably
On Fri, Mar 23, 2018 at 12:07:39PM +0100, Sms Backup wrote:
>Thanks for your replies,
>So in total it would be something like this: ExecStart=/bin/sh -c
>/opt/sge/bin/sge_qmaster | -v '^RUE_' >&/dev/null ?
No. The grep is intended to replace the redirection to /dev/null so
as to
On Fri, Mar 23, 2018 at 09:36:29AM +, Mark Dixon wrote:
> Hi Jakub,
>
> That's right: if you need to cut down the logging, one option is to add the
> redirection in the start script.
>
> You're looking for the line starting "sge_qmaster", and you might want to
> try adding a ">/dev/null"
On Fri, Mar 23, 2018 at 12:27:48AM +0100, Reuti wrote:
> Hi,
>
> Am 22.03.2018 um 20:51 schrieb Mun Johl:
>
> > Hi,
> >
> > I'm using SGE v8.1.9 on RHEL6.8. In my script that I submit via qsub
> > (let's call it scriptA), I have a gxmessage (gxmessage is similar to
> > xmessage, postnote,
On Thu, Mar 22, 2018 at 04:29:27PM +0100, leconte jérôme wrote:
> Thank you,
>
> But I'm not sure to know what I look for.
>
> If I correctly understand
>
> I must see qrsh or qlogin when I type
>
> ompi_info
>
> and if not I must recompile grid_engine with that option
>
> Best
On Wed, Mar 21, 2018 at 11:55:14AM -0300, Dimar Jaime González Soto wrote:
>Hi, I need to know how can I execute grid engine in specific hosts. I
>tried the follow execution line:
>qsub -v NR_PROCESSES=60 -l
>h='ubuntu-node2|ubuntu-node11|ubuntu-node12|ubuntu-node13' -b y -j y
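The -l h= host expression in the question is the standard way to pin a job to named hosts, and it can be built from a plain shell list rather than typed out (the hostnames below are the ones from the question):

```shell
# Join a list of node names into the pipe-separated expression
# that "-l h=..." expects:
nodes="ubuntu-node2 ubuntu-node11 ubuntu-node12 ubuntu-node13"
hostexpr=$(printf '%s' "$nodes" | tr ' ' '|')
echo "$hostexpr"   # ubuntu-node2|ubuntu-node11|ubuntu-node12|ubuntu-node13
# then: qsub -v NR_PROCESSES=60 -l h="$hostexpr" -b y -j y ...
```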
On Wed, Mar 21, 2018 at 03:23:01PM +, srinivas.chakrava...@wipro.com wrote:
> Hi,
>
> The version in our environment is 2011.11.
Looking at https://arc.liv.ac.uk/repos/darcs/sge-release/NEWS
it looks like it was fixed in SoGE 8.1.4 which was released in 2013 a couple
of years after the
On Wed, Mar 21, 2018 at 07:59:41AM +0100, Sms Backup wrote:
>William,
>Thanks for reply. Unfortunately I have few non-interactive queues, so I
>cannot limit slots this way.
>99% of messages printed to system log look like this below, so I believe
>those are the messages which
On Mon, Mar 19, 2018 at 12:11:04PM +, srinivas.chakrava...@wipro.com wrote:
>Hi,
>
>
>
>We have some functions in our environment which are not being parsed
>properly by sge, which is causing errors on the stdout while launching
>interactive jobs
>
>
>
>But
On Thu, Mar 15, 2018 at 10:19:29PM +, Mun Johl wrote:
>Hi,
>
>
>
>I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6
>system. The yum command exits with the following error:
>
>
>
>Error: Package: gridengine-8.1.9-1.el6.x86_64
>
On Tue, Feb 27, 2018 at 10:46:57AM +0100, Reuti wrote:
> Hi Nils:
>
> > Am 27.02.2018 um 10:11 schrieb Nils Giordano :
> > however you were right: `ssh` is definitively used to access nodes
> > (probably on purpose since we have access to several GUI apps). Your
> > answer
On Thu, Feb 15, 2018 at 11:28:58AM -0600, Calvin Dodge wrote:
> While the help we received from this and other gridengine lists helped
> us resolve the issue of jobs being mysteriously killed, we've been
> asked to look into converting the customer's SGE cluster, using
> cgroups for job
On Thu, Feb 08, 2018 at 03:42:03PM -0800, Joshua Baker-LePain wrote:
> 153758 0.51149 tomography   USER1   qw   02/08/2018 14:03:05   192
> 153759 0.0     qss_svk_ge   USER2   qw   02/08/2018 14:15:06     1 1
> 153760
On Wed, Feb 07, 2018 at 02:15:05PM -0800, Joshua Baker-LePain wrote:
> On Wed, 7 Feb 2018 at 12:46am, William Hay wrote
>
> > IIRC resource quotas and reservations don't always play nicely together.
> > The same error can come about for multiple different reasons so having
On Tue, Feb 06, 2018 at 12:13:24PM -0800, Joshua Baker-LePain wrote:
> I'm back again -- is it obvious that my new cluster just went into
> production? Again, we're running SoGE 8.1.9 on a cluster with nodes of
> several different sizes. We're running into an odd issue where SGE stops
>
On Thu, Feb 01, 2018 at 11:44:25AM +0100, Ansgar Esztermann-Kirchner wrote:
> Now, I think I can improve upon this choice by creating separate
> queues for different machines "sizes", i.e. an 8-core queue, a
> 20-core queue and so on. However, I do not see a (tractable) way to
> enforce proper
On Tue, Jan 23, 2018 at 06:22:28PM -0600, Calvin Dodge wrote:
> The docs we've found say that gid_range must be greater than the
> number of jobs expected to run concurrently on one host.
>
> Our recent experience suggests that it has to be greater than the
> total number of jobs in the queue. If
On Fri, Jan 05, 2018 at 08:02:18AM +, srinivas.chakrava...@wipro.com wrote:
>Hi,
>
>
>
>We have recently upgraded one submit host from RHEL6.7 to RHEL7.2.
>
>Most of our grid execution servers are RHEL6.7, with a few RHEL6.3
>servers. When we run any jobs by using "-V"
Testing if the users@gridengine.org mailing list works in 2018.
On Fri, Dec 22, 2017 at 05:55:26PM -0500, berg...@merctech.com wrote:
> True, but even with that info, there doesn't seem to be any universal
> way to tell an arbitrary GPU job which GPU to use -- they all default
> to device 0.
With Nvidia GPUs we use a prolog script that manipulates lock files
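The lock-file approach William mentions can be sketched roughly as below; the lock directory, the two-GPU count, and the function name are all assumptions for illustration, and a real prolog would run from the queue's prolog setting with the claimed index exported to the job:

```shell
# Claim one of two GPUs using atomic mkdir "lock files"; the claimed
# index becomes the job's CUDA_VISIBLE_DEVICES. Paths are invented.
lockdir=${SGE_GPU_LOCKDIR:-$(mktemp -d)}

claim_gpu() {
    local i
    for i in 0 1; do
        # mkdir is atomic, so two concurrent prologs can't grab the
        # same GPU even if they race
        if mkdir "$lockdir/gpu$i.lock" 2>/dev/null; then
            echo "$i"
            return 0
        fi
    done
    return 1   # no free GPU: a real prolog would fail the job here
}

gpu=$(claim_gpu) && echo "CUDA_VISIBLE_DEVICES=$gpu"
# the matching epilog releases it: rmdir "$lockdir/gpu$gpu.lock"
```

Running the claim twice against the same directory hands out GPU 0 then GPU 1; a third attempt fails, which is the signal that the node is full.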
On Mon, Jan 08, 2018 at 06:23:20PM -0500, berg...@merctech.com wrote:
> Yeah, I've looked at that, but it brings up the 'accounting problem'
> of changing the variable each time a GPU-enabled job begins or ends.
Set it in starter_method (and sshd's force command if you want to support
qlogin etc).
On Wed, Nov 22, 2017 at 09:53:17AM -0800, Mun Johl wrote:
>Hi,
>Periodically I am seeing the following error:
>
> Unable to initialize environment because of error: cannot register event
> client. Only 100 event clients are allowed in the system
>
>The error first showed up
On Mon, Oct 30, 2017 at 09:56:37PM +0530, ANS wrote:
>Hi,
>Thank you for the detailed info.
>But can let me know how can i submit a job using 4 GPUs, 8 cores from
>2nodes consisting of 2 GPUs, 4 cores from each node.
>Thanks,
That's not something the free versions of grid
On Wed, Oct 25, 2017 at 04:59:05PM +0200, Reuti wrote:
> Hi,
>
> > Am 25.10.2017 um 16:06 schrieb ANS :
> >
> > Hi all,
> >
> > I am trying to integrate GPUs into my existing cluster with 2 GPUs per
> > node. I have gone through few sites and done the following
> >
> >
On Thu, Oct 19, 2017 at 04:49:40PM -0700, Simon Matthews wrote:
> Does anyone have any pointers on running execd on cygwin?
>
> inst_sge -x fails, because the 'uidgid' command doesn't seem to have been
> built.
>
> I can set the environment variables myself and then start the program,
> but the
; libcull.a(pack.o):pack.c:(.text+0x94d): undefined reference to `xdrmem_create'
> libcull.a(pack.o):pack.c:(.text+0x959): undefined reference to `xdr_double'
> collect2: error: ld returned 1 exit status
> make: *** [../libs/sgeobj/Makefile:364: test_sge_object] Error 1
> not done
>
On Thu, Oct 12, 2017 at 04:12:48PM -0700, System Administrator wrote:
> I think it should be part of the ./configure step. If you exported it as an
> env variable, then re-run the ./configure part. Or put it at the beginning
> of the command, for example:
>
> CPPFLAGS=-I/usr/include/tirpc
On Mon, Oct 09, 2017 at 07:46:05PM -0700, Simon Matthews wrote:
> Is it possible to build SOGE for Cygwin?
>
> SOGE says it is based on OGS which claimed that it supported Cygwin.
>
> I only need execd on Cygwin. Qmaster and the GUI tools need only run
> under CentOS 6 and 7.
>
> Simon
I don't
On Sat, Sep 30, 2017 at 02:21:12AM +, John_Tai wrote:
>Currently if I set a max job per user in the cluster, a new job will be
>rejected if it exceeds the max.
>
>
>
>> qrsh
>
>job rejected: Only 100 jobs are allowed per user (current job count: 264)
>
>
>
>
On Fri, Sep 01, 2017 at 06:28:43PM +0900, Ueki Hikonuki wrote:
> Hi,
>
> I tried to understand load scaling. But it is still unclear for me.
>
> Let's assume two hosts.
>
> hostA very fast machine
> hostB regular speed machine
>
> Even though np_load_avg of hostA is much higher than hostB,
> a
On Tue, Jul 25, 2017 at 01:23:43AM +, Matt Hohmeister wrote:
>When trying to run qmon on RHEL 7.3, I get this. Can someone share which
>packages would take care of this?
Hopefully one of these pages should sort it out. The last one should
definitely do it but is fixing things
user by
On Tue, Jul 25, 2017 at 12:57:47AM +, John_Tai wrote:
>I have configured virtual_free as a requestable resource:
>
>
>
>virtual_free   mem   MEMORY   <=   YES   JOB   0   0
>
>
>
>And it's been working great for months.
>
>
>
On Tue, Jul 18, 2017 at 09:23:05AM +, John_Tai wrote:
>I'm having a DISPLAY issue in RHEL6.8 that I don't have in RHEL5. I am
>using SGE6.2u6
>
>
>
>I use VNC to connect to a linux server. By default the DISPLAY is set to
>:4.0 and I can start GUI jobs locally:
>
>
On Mon, Jul 17, 2017 at 08:00:31PM +, Matt Hohmeister wrote:
> Thank you; this is a big help. :-)
>
> Along those lines, what do you all suggest for the shared directory? From
> these instructions, it *appears* that the best choice is to share out NFS
> from the master's /opt/sge/default,
On Fri, Jul 14, 2017 at 08:36:06AM +, Simon Andrews wrote:
>Can anyone shed any light on an error I'm getting repeated thousands of
>times in my grid engine messages log. This happens when I have a job
>which is submitted and which is stopped from running by an RQS rule I have
>
On Fri, Jul 14, 2017 at 08:58:59PM +, Matt Hohmeister wrote:
>Hello-
>
>
>
>First off, please accept my apologies for this post, as I have _never_
>used gridengine before. I have two servers, both running RHEL 7.3, and
>both linked to a shared xfs-formatted iSCSI volume
slots job.
>
> Thanks!
> Luis
Assuming linux what does sysctl fs.file-max report?
William
>
> On 6/27/17, 4:22 AM, "William Hay" <w@ucl.ac.uk> wrote:
>
> On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote:
> >Hi,
> >
>
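William's sysctl question can be answered alongside the per-process limit for comparison; the kernel value is also readable straight from /proc, which is all the command does underneath:

```shell
# Kernel-wide cap on open files (what "sysctl fs.file-max" reports):
cat /proc/sys/fs/file-max
# per-process soft limit in the current shell, for comparison:
ulimit -n
# SGE's own knobs would then be checked with: qconf -sconf (execd_params)
```

If the execd_params H_DESCRIPTORS value exceeds the kernel figure, raising the former alone cannot help.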
On Mon, Jun 26, 2017 at 05:24:57PM +, Luis Huang wrote:
>Hi,
>
>
>
>To increase the max open file, we have set execd_params in qconf -mconf
>and also on the OS level:
>
>execd_params
>H_DESCRIPTORS=262144,H_LOCKS=262144,H_MAXPROC=262144
>
>
>
On Fri, Jun 23, 2017 at 08:24:23AM -0700, Ilya wrote:
> Hello,
>
> I am running 6.2u5 with ssh transport for qlogin (not tight integration) and
> users are abusing this service: run jobs for days, abandon their sessions
> that stay opened forever, etc. So I want to implement mandatory time limits
On Mon, Jun 12, 2017 at 05:46:46PM -0400, Jeff Blaine wrote:
> The Open Grid Scheduler homepage at
> http://gridscheduler.sourceforge.net/ says:
>
> The current bugfix & LTS (Long Term Support) release is version
> 6.2 update 5 patch 3 (SGE 6.2u5p3), which is based on Sun Grid
>
On Thu, Feb 16, 2017 at 01:43:47PM -0500, Stuart Barkley wrote:
> Is there a way to throttle job starts on Grid Engine (we are using Son
> of Grid Engine)?
Use a load sensor plus job_load_adjustment. Tweak jobs to request
a low load (via sge_request or jsv) or set an alarm on all queues
when the
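The reply is truncated, but the load-sensor half can be sketched. An SGE load sensor is a script that reads a line from sge_execd each polling interval and answers with a begin/end report; the complex name "job_starts" below is an invented example that would first be registered with qconf -mc:

```shell
# Minimal load-sensor loop. The reported value is a placeholder; a
# real sensor would compute something, e.g. recent job starts.
load_sensor() {
    local host line
    host=$(hostname)
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        echo "begin"
        echo "$host:job_starts:0"
        echo "end"
    done
}

# simulate one poll followed by shutdown, as sge_execd would drive it:
printf '\nquit\n' | load_sensor
```

Queues would then set a load_threshold on the complex so they go into alarm when the sensor reports too high a value, which is the throttling effect Stuart is after.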
On Thu, Mar 23, 2017 at 08:11:02AM +, John_Tai wrote:
> Can I still download 6.2? Haven't been able to find it.
>
> John
If you're going to upgrade you might as well go all the way to SoGE 8.1.9.
William
>
> -Original Message-
> From: Reuti [mailto:re...@staff.uni-marburg.de]
>
On Thu, Mar 09, 2017 at 05:20:37PM +0100, Jerome Poitout wrote:
> Hello,
>
> OGS/GE 2011.11p1
>
> I have an issue while submitting numerous jobs in a short time (over 300
> - not so much for me...) with -sync y option. It seems that qmaster
> cannot handle all the requests and i get huge load on
On Thu, Mar 09, 2017 at 07:29:25PM +0100, Roberto Nunnari wrote:
> I don't mean move from node to node.. by moving I mean that something
> happens in the scheduler.. that the scheduler reserves a slot for the
> pending job requesting reservation.. in the schedule file, I see only lines
> with the
On Thu, Mar 09, 2017 at 02:24:38PM +0100, Roberto Nunnari wrote:
> Hi Reuti.
> Hi William.
>
> here's my settings you required:
> params            MONITOR=1
> max_reservation 32
> default_duration 0:10:0
>
> I cannot understand how What I see
On Wed, Mar 08, 2017 at 06:33:23PM +0100, Roberto Nunnari wrote:
> Hello.
>
> I am using Oracle Grid Engine 6.2u7 and have some trouble understanding
> reservation (qsub -R y ..).
>
> I'm trying to use this because of big jobs starving because of queues always
> full of smaller jobs..
>
>
On Thu, Feb 23, 2017 at 09:30:20AM +0900, Sangmin Park wrote:
>Yes, it is.
>I can handle the number of running jobs using resource quota policy.
>However, the number of queue waiting jobs can't.
>Basic rule is FIFO, so if one user submits hundreds of jobs, another user
>has to
On Wed, Feb 15, 2017 at 12:34:08AM +0200, Ben Daniel Pere wrote:
>The suggestion sounds good but I'm not sure I understand step 1 - if it's
>going to be assigned by project and not user - can I still have "powerful"
>users in my normal default project that have more tickets there?
>
On Tue, Feb 14, 2017 at 12:22:59PM +, Mark Dixon wrote:
> On Tue, 14 Feb 2017, William Hay wrote:
> ...
> >We tweak the permissions on the device nodes from a privileged prolog but
> >otherwise I suspect we're doing something similar.
>
> Hi William,
>
> Yea
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote:
> Hi,
>
> I've been playing with allocating GPUs using gridengine and am wondering if
> I'm trying to make it too complicated.
>
> We have some 24 core, 128G RAM machines, each with two K80 GPU cards in
> them. I have a little
On Mon, Feb 13, 2017 at 03:17:29PM -0500, Douglas Duckworth wrote:
>Hello
>About a month ago we recently started seeing duplicate job in SGE.
>For example:
>sysadmin@panda2[~]$ qacct -j 878815
>==
>qname
On Mon, Feb 13, 2017 at 03:52:20PM +, Mark Dixon wrote:
> Hi,
>
> I've been playing with allocating GPUs using gridengine and am wondering if
> I'm trying to make it too complicated.
>
> We have some 24 core, 128G RAM machines, each with two K80 GPU cards in
> them. I have a little
On Mon, Feb 13, 2017 at 10:20:51AM +0100, Julien Nicoulaud wrote:
>Hi all,
>I'm looking for Sun GridEngine 6.1 binaries (or sources) for some backward
>compatibility testing, and I can't find it anywhere on the web:
> * ge-6.1u5-common.tar.gz
> * ge-6.1u5-bin-lx24-amd64.tar.gz
On Thu, Oct 13, 2016 at 11:39:15AM +, Duje Drazin wrote:
> Hi William,
>
> Sorry, I didn't catch your answer, how to "Check the nodes involved for a
> firewall/packet filter"
>
Assuming this is a linux box then on a worker node of the cluster try
running iptables -L to see if it has an
On Thu, Oct 13, 2016 at 07:21:33AM +, Duje Drazin wrote:
>Hi all,
>
>
>
>I have configured following:
>
>
>
>qlogin_command telnet
>
>qlogin_daemon    /usr/sbin/in.telnetd
>
>rlogin_command /usr/bin/ssh -X
>
>
On Mon, Oct 10, 2016 at 02:39:21PM +, Yuri Burmachenko wrote:
>We are using SoGE 8.1.8 and since recently approximately 2 months ago our
>job schedule time raised up to 30-60 sec.
>
>
>Any tips and advices where to look for the root cause and/or how can we
>improve the
On Thu, Oct 06, 2016 at 12:47:49PM +0100, Mark Dixon wrote:
> On Wed, 5 Oct 2016, William Hay wrote:
> ...
> >Our prolog and epilog (parallel) ssh into the slave nodes and do the
> >equivalent of run-parts on directories full of scripts some of which check
> >if they are
On Wed, Oct 05, 2016 at 12:31:52PM +0100, Mark Dixon wrote:
> On Wed, 5 Oct 2016, William Hay wrote:
> ...
> >It was originally head node only so per job until a user requested local
> >TMPDIR on each node so historical reasons.
> ...
>
> Hi William,
>
> Wh
On Tue, Oct 04, 2016 at 04:51:42PM +0100, Mark Dixon wrote:
> On Tue, 4 Oct 2016, William Hay wrote:
> ...
> >I have a per-job consumable and the TMPDIR filesystem is created on every
> >node of the job. We have a (jsv enforced) policy that all multi-node jobs
> >have excl
On Tue, Oct 04, 2016 at 09:32:43AM +0100, Mark Dixon wrote:
> It'd be interesting for people to share what they've done with parallel
> jobs. Rightly or wrongly, I currently have a per-job consumable and the
> $TMPDIR is only on the node with the MASTER task.
I have a per-job consumable and the
On Tue, Sep 20, 2016 at 08:07:02AM +, sudha.penme...@wipro.com wrote:
>Hi,
>
>
>
>Regarding access rules in grid, users primary UNIX group should be the one
>which is defined in ACL to be able to access.
>
>
>
>Would it be possible to configure it such that user
On Wed, Sep 14, 2016 at 08:52:12PM +, Lee, Wayne wrote:
> HI William,
>
> I've performed some tests by submitting a basic shell script which dumps the
> environment (i.e. env) and performs either an "exit 0", "exit 99", "exit
> 100", "exit 137" other exit status codes.If I set my script
On Tue, Sep 13, 2016 at 06:52:53PM +, Lee, Wayne wrote:
>In the epilog script that I've setup for our jobs, I've attempted to
>capture the value of the "exit_status" of a job or job task and if it
>isn't 0, 99 or 100, exit the epilog script with an "exit 100". However
>this
On Tue, Sep 13, 2016 at 03:15:19PM +1000, Derrick Lin wrote:
>Thanks guys,
>I am implementing the solution as outlined by William, except we are using
>XFS here, so we are trying to do it by using XFS's project/directory
>quota. Will do more testing and see how it goes..
>
On Fri, Sep 09, 2016 at 01:29:52PM +0200, Reuti wrote:
>
> > Am 09.09.2016 um 12:52 schrieb William Hay <w@ucl.ac.uk>:
> > Grid engine doesn't provide a mechanism to pass the resource requests to
> > the prolog
> > AFAIK so a mechanism to obtain the v
On Fri, Sep 09, 2016 at 09:26:53AM +1000, Derrick Lin wrote:
>Hi William,
>Actually I don't quite get the need of:
>2. Our JSV adds an environment variable to the job recording the amount
>of disk requested (you could try parsing it out of the job spool but
>this is easier).
>
On Fri, Sep 09, 2016 at 10:37:13AM +0100, Mark Dixon wrote:
> On Thu, 8 Sep 2016, William Hay wrote:
> ...
> >Remember tmpfs is not a ramdisk but the linux VFS layer without an attempt
> >to provide real file system guarantees. It shouldn't be cached any more
> >agressivel
On Thu, Sep 08, 2016 at 02:40:38PM +0100, Mark Dixon wrote:
> On Thu, 8 Sep 2016, William Hay wrote:
> ...
> >At present we're using a huge swap partition and TMPFS instead of btrfs.
> >You could probably do this with a volume manager and creating a
> >regular filesys
On Thu, Sep 08, 2016 at 10:10:51AM +1000, Derrick Lin wrote:
>Hi all,
>Each of our execution nodes has a scratch space mounted as /scratch_local.
>I notice there is tmpdir variable can be changed in a queue's conf.
>According to doc, SGE will create a per job dir on tmpdir, and set
On Fri, Aug 26, 2016 at 01:35:06PM +0100, Ramón Fallon wrote:
>Thanks for the reply, William.
>
>Yes, that's true.
>
>It's a pity there's not a way to reset gridengine to begin anew on a new
>database for example.
>
>It seems quite a radical step to have to re-install
On Thu, Aug 25, 2016 at 04:40:55PM +0100, Ramón Fallon wrote:
>* sgemaster still fails to come up. "messages" in
>$SGE_ROOT/$SGE_CELL/spool/qmaster now says:
>main|frontend0|W|local configuration frontend0 not defined - using global
>configuration
>main|frontend0|E|global
On Thu, Aug 25, 2016 at 09:15:26AM +0100, William Hay wrote:
> On Wed, Aug 24, 2016 at 09:07:44PM +0200, Alexander Hasselhuhn wrote:
> > Dear Reuti,
> >
> > thanks for the reply, indeed at the moment there is a login node, but we
> > have plans to remove it (by set
On Wed, Aug 24, 2016 at 09:07:44PM +0200, Alexander Hasselhuhn wrote:
> Dear Reuti,
>
> thanks for the reply, indeed at the moment there is a login node, but we have
> plans to remove it (by setting up a route through our gateway, which makes
> some administrative tasks more smooth) and
On Wed, Aug 24, 2016 at 10:20:06AM +0100, Mark Dixon wrote:
> Hi there,
>
> Is there any interest for a meeting in the UK looking at the internals of
> gridengine? Potential topics might be:
>
> * Building from source
> * How the code is organised
> * How to debug or develop gridengine
>
> The
On Fri, Aug 19, 2016 at 03:59:34PM +0100, Lars van der Bijl wrote:
>Hey William,
>is the schedule log different from the reporting file? i've had a look
>through the common and spool directory but can't find mention of it.
You have to enable it with MONITOR=1 in the scheduler params.
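For reference, a sketch of where that setting lives; the schedule-file path assumes a default cell name:

```shell
# Scheduler monitoring is a scheduler-configuration parameter,
# edited with:  qconf -msconf
#
#   params    MONITOR=1
#
# Once set, each scheduling decision is appended to:
#   $SGE_ROOT/$SGE_CELL/common/schedule
# (distinct from the accounting/reporting files Lars was searching)
```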