Re: [gridengine users] Reported memory usage too high

2016-05-09 Thread Alex Chekholko
s from overbooking a node. Thank you, Nico ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] sgemaster crash

2016-02-05 Thread Alex Chekholko
the built in debugging (man sge_dl). William

Re: [gridengine users] qmaster worker error message

2015-12-01 Thread Alex Chekholko
OK, we figured it out: the user has an orphaned tmux session with a shell in it running the command 'watch qdel *'. On 11/30/2015 04:27 PM, Alex Chekholko wrote: Hi, My qmaster messages log is continuously printing, every 2s: 11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s

[gridengine users] qmaster worker error message

2015-11-30 Thread Alex Chekholko
Hi, My qmaster messages log is continuously printing, every 2s:
11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:32|worker|scg3-hn01|E|The job * of user(s) yang
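A flood like the one quoted above is easier to judge once it is collapsed into counts. The following is only a sketch: it assumes the '|'-separated layout visible in the excerpt (timestamp, thread, host, level, text), the function name summarize_errors is made up for illustration, and the sample reuses the two complete lines from the post.

```shell
#!/bin/sh
# Count identical error texts in a qmaster "messages" log.
# Assumed field layout, taken from the excerpt: timestamp|thread|host|level|text
summarize_errors() {
    # With no file arguments awk reads standard input.
    awk -F'|' '$4 == "E" { count[$5]++ }
               END { for (msg in count) printf "%d\t%s\n", count[msg], msg }' "$@" |
        sort -rn
}

# Sample input: the two complete lines from the post. On a real system the
# log path depends on the installation and is not assumed here.
sample_log='11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist'

printf '%s\n' "$sample_log" | summarize_errors
```

A single text with a steadily growing count, as here, points at one client repeating the same request, which matches the 'watch qdel *' explanation in the follow-up.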

Re: [gridengine users] Monitoring slot usage

2015-08-06 Thread Alex Chekholko
ents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: www.babraham.ac.uk <http://www.babraham.ac.uk/terms>

Re: [gridengine users] SoGE 8.1.8 - Only 950 event clients are allowed in the system (try02)

2015-08-06 Thread Alex Chekholko
Hi, I see in our config (qconf -sconf) qmaster_params MAX_DYN_EC=500 gdi_timeout=120 gdi_retries=4 Have you tried fiddling with the MAX_DYN_EC parameter? Regards, Alex On 06/12/2015 02:44 PM, Grau, Federico (NIH/NHGRI) [C] wrote: Hello gridengine users, Our organization has be
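To check what a cluster currently has set before changing it, the value can be pulled out of the configuration text. This is a rough sketch: get_qmaster_param is a made-up helper name, the sample line is the one quoted in the reply, and the helper accepts both space- and comma-separated parameter lists because the exact separator in your output is an assumption here.

```shell
#!/bin/sh
# Print the value of one key from a "qmaster_params" line.
# $1 = parameter name; configuration text is read from standard input.
get_qmaster_param() {
    awk -v key="$1" '$1 == "qmaster_params" {
        gsub(/,/, " ")                 # accept comma- or space-separated lists
        for (i = 2; i <= NF; i++) {
            n = split($i, kv, "=")
            if (n == 2 && kv[1] == key) print kv[2]
        }
    }'
}

# Sample line as quoted in the reply above.
sample='qmaster_params MAX_DYN_EC=500 gdi_timeout=120 gdi_retries=4'
printf '%s\n' "$sample" | get_qmaster_param MAX_DYN_EC

# On a live cluster (not run here): qconf -sconf | get_qmaster_param MAX_DYN_EC
```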

Re: [gridengine users] error: failed receiving gdi request

2015-06-22 Thread Alex Chekholko

Re: [gridengine users] command runs in grid engine but does not complete.

2015-06-08 Thread Alex Chekholko
og any thoughts? Dan

Re: [gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-08 Thread Alex Chekholko
On 06/08/2015 12:49 AM, William Hay wrote: On Fri, 5 Jun 2015 22:16:12 + Alex Chekholko wrote: Hi all, I have a standard grid engine cluster (sge-8.1.8 tarball from Dave Love's site) where users use qlogin to get interactive shells on compute nodes, and we use a qlogin wrapper scri

Re: [gridengine users] Negative complex values

2015-06-08 Thread Alex Chekholko

[gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-05 Thread Alex Chekholko
ite understand what https://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html is for, no matter how many times I re-read those man pages. Do I need both pam_sge-qrsh-setup and pam_sge_authorize?

Re: [gridengine users] qlogin, interactive sessions, and cgroups

2015-06-05 Thread Alex Chekholko
ill have an open terminal, and can still run anything I want, but the Grid has released the mem_free and slots for others to use. If anyone has advice on how I can set this up to accomplish my three goals, it would be very much appreciated. I can post any configuration / logs / details requir

Re: [gridengine users] HowTo Configure large mem MPI jobs to have priority over short running serial / smp jobs?

2015-05-04 Thread Alex Chekholko
quested) than single slot serial jobs. We have a fair share policy in place, but even with that, serial users starve out the large # of slot MPI users, especially large memory MPI jobs (ex: 64 slots...

Re: [gridengine users] Jenkins integration?

2015-02-04 Thread Alex Chekholko
There is a general API called DRMAA http://en.wikipedia.org/wiki/DRMAA We have an example local app that uses it to track state of grid engine jobs: https://github.com/StanfordBioinformatics/SJM I found one result for "DRMAA Jenkins" in Google: http://biouno.org/2014/07/08/java-drmaa-api-part-

Re: [gridengine users] tcl jsv error - bash related

2014-11-06 Thread Alex Chekholko
On 11/6/14, 12:22 AM, William Hay wrote: On Wed, 5 Nov 2014 23:08:53 + Alex Chekholko wrote: So I have a bash env var called "BASH_FUNC_module()" and when the tcl jsv tries to make a TCL variable with that name, it errors out because of the parenthesis. I think this is a recent

[gridengine users] tcl jsv error - bash related

2014-11-05 Thread Alex Chekholko
_FUNC_module()" and when the tcl jsv tries to make a TCL variable with that name, it errors out because of the parenthesis. I think this is a recent change in bash due to the bash vulnerability fix and how bash functions are handled. As a workaround, we can tell users not to use the
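To see which exported names would trip up a consumer that expects plain identifiers, the environment can be filtered by name. A minimal sketch, with assumptions: odd_env_names is a made-up helper, the sample values are invented, and the filter is line-based, so a multi-line function body in real `env` output can produce extra noise.

```shell
#!/bin/sh
# Print variable names that are not plain identifiers (letters, digits,
# underscore), for example "BASH_FUNC_module()".
# Reads NAME=VALUE lines on standard input.
odd_env_names() {
    awk -F= 'NF > 1 { print $1 }' | grep -v -E '^[A-Za-z_][A-Za-z0-9_]*$'
}

# Invented sample; only the name BASH_FUNC_module() comes from the post.
sample_env='HOME=/home/user
BASH_FUNC_module()=() { :; }
PATH=/usr/bin'

printf '%s\n' "$sample_env" | odd_env_names

# On a live system (not run here): env | odd_env_names
```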

Re: [gridengine users] Basic information working on cluster

2014-04-15 Thread Alex Chekholko

Re: [gridengine users] Grid engine does not send no job-mails

2013-10-24 Thread Alex Chekholko
tp://appmibio.bio.uni-goettingen.de/

Re: [gridengine users] Resource Reservation logging

2013-10-04 Thread Alex Chekholko

Re: [gridengine users] Upgrading form SGE to OGE

2013-09-11 Thread Alex Chekholko
Normally you would leave the old install in place while the old jobs finish. Since you used a different communications port and different path, you can have the new daemons running alongside the old daemons and they don't know about each other. Direct users to submit jobs to the new environme
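With two installs side by side, the client commands only need to be pointed at the right one through the usual environment variables. The sketch below is illustrative: the paths, port numbers, and the helper name use_sge are hypothetical, and a real install normally ships a settings file under its cell's common directory that also adjusts PATH, which this sketch does not do.

```shell
#!/bin/sh
# Select which of two parallel installs the client commands talk to.
# Paths and ports are hypothetical placeholders; substitute your own.
use_sge() {
    case "$1" in
        old) SGE_ROOT=/opt/sge-old; SGE_QMASTER_PORT=6444; SGE_EXECD_PORT=6445 ;;
        new) SGE_ROOT=/opt/sge-new; SGE_QMASTER_PORT=7444; SGE_EXECD_PORT=7445 ;;
        *)   echo "usage: use_sge old|new" >&2; return 1 ;;
    esac
    SGE_CELL=default
    export SGE_ROOT SGE_CELL SGE_QMASTER_PORT SGE_EXECD_PORT
}

use_sge new
echo "$SGE_ROOT $SGE_QMASTER_PORT"
```

Users with jobs still running in the old environment can switch back with `use_sge old` to query them.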

Re: [gridengine users] modify host selection algorithm?

2013-04-29 Thread Alex Chekholko
Hi, Right, you want: "queue_sort_method load" and "load_formula slots". Per Reuti's link. You can easily test by submitting some test jobs on an empty cluster and see that they all get assigned to the same host, leaving other hosts empty, until the host with the most slots used is filled up.
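Before submitting test jobs it can help to confirm that the scheduler configuration really carries the two values named in the reply. A small sketch, assuming the "key value" line layout of scheduler configuration output; check_packing_config is a made-up name and the sample contains only the two settings from the post.

```shell
#!/bin/sh
# Report whether the two scheduler settings quoted in the reply are in place.
# Reads "key value" lines on standard input; exit status 0 means both match.
check_packing_config() {
    awk '$1 == "queue_sort_method" { qsm = $2 }
         $1 == "load_formula"      { lf = $2 }
         END {
             if (qsm == "load" && lf == "slots") { print "ok"; exit 0 }
             printf "queue_sort_method=%s load_formula=%s\n", qsm, lf
             exit 1
         }'
}

# Sample built from the values quoted above.
sample='queue_sort_method load
load_formula slots'

printf '%s\n' "$sample" | check_packing_config

# On a live cluster (not run here): qconf -ssconf | check_packing_config
```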

Re: [gridengine users] h_vmem not actually restricting memory usage?

2013-02-08 Thread Alex Chekholko
Hi Brett, I saw the issue with JOB vs YES when the jobs were requesting more than one complex. e.g. qsub -l h_vmem=10G,h_stack=10M was not hitting memory limits when "JOB" was set. If your jobs are only requesting one complex, the JOB setting should work as expected, and you can try both way

Re: [gridengine users] strategy for h_vmem and multi-slot jobs?

2013-02-01 Thread Alex Chekholko
y. And 'qhost -F h_vmem' ends up showing negative values. So on clusters where our users also request h_stack in addition to h_vmem, we've had to set it back to 'YES' and have the users do the division by slots at job su
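The "division by slots" is simple arithmetic, but it is easy to round the wrong way. A sketch of it follows; per_slot_mem_mb is a made-up helper, the numbers are invented, and rounding up is a choice made here so that the per-slot requests add up to at least the intended total.

```shell
#!/bin/sh
# With a per-slot consumable, the per-slot request times the slot count is
# what gets booked, so the job's total is divided by its slots (rounded up).
# $1 = total memory in MB, $2 = number of slots
per_slot_mem_mb() {
    total_mb=$1
    slots=$2
    echo $(( (total_mb + slots - 1) / slots ))
}

per_slot_mem_mb 10240 4   # 2560

# The result would then go into the request, for example (PE name is a
# placeholder, not run here): qsub -pe <pe_name> 4 -l h_vmem=2560M job.sh
```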

[gridengine users] h_vmem not honored at all?

2013-01-09 Thread Alex Chekholko
job only requests this one complex. Suggestions?

Re: [gridengine users] Memory allocation woes. Any thoughts?

2012-12-11 Thread Alex Chekholko
Hi Jake, You can do 'qhost -F h_vmem,mem_free,virtual_free', that might be a useful view for you. In general, I've only ever used one of the three complexes above. Which one(s) do you have defined for the execution hosts? e.g. qconf -se compute-1-7 h_vmem will map to 'ulimit -v' mem_free jus
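Since h_vmem ends up as 'ulimit -v', one way to sanity-check a job is to compare the requested value with what the shell reports inside the job. The sketch below converts a request to the KiB units that bash's `ulimit -v` prints. It is deliberately narrow: to_kib is a made-up helper and it handles only whole numbers with an upper-case K, M, or G suffix, treated as powers of 1024.

```shell
#!/bin/sh
# Convert a memory request such as 4G into KiB for comparison with the
# output of `ulimit -v`. Only integer values with K, M or G are handled.
to_kib() {
    value=$1
    num=${value%[KMG]}
    case "$value" in
        *K) echo "$num" ;;
        *M) echo $(( num * 1024 )) ;;
        *G) echo $(( num * 1024 * 1024 )) ;;
        *)  echo "unsupported value: $value" >&2; return 1 ;;
    esac
}

to_kib 4G   # 4194304

# Inside a job script one could then compare (not run here):
#   [ "$(ulimit -v)" = "$(to_kib 4G)" ] && echo "limit matches request"
```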

Re: [gridengine users] h_vmem negative values?

2012-11-28 Thread Alex Chekholko
linux-x64 32 12.81 63.0G 8.4G 9.8G 27.8M Host Resource(s): hc:h_vmem=1.000G Is there anything I can do to diagnose this issue? On 10/31/12 3:19 PM, Dave Love wrote: Alex Chekholko writes: Hi Reuti, Thanks for your response, here's the output of 'qhost -F h_v
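On a larger cluster it is quicker to filter the 'qhost -F h_vmem' output for hosts whose remaining h_vmem has gone below zero. This is a sketch under assumptions: negative_h_vmem_hosts is a made-up name, the host line is assumed to have at least eight columns as in the excerpts in this thread, and the negative value in the sample is invented for illustration.

```shell
#!/bin/sh
# List hosts whose consumable h_vmem is reported as negative.
# Reads `qhost -F h_vmem` style text on standard input.
negative_h_vmem_hosts() {
    awk '/hc:h_vmem=/ {
             sub(/.*hc:h_vmem=/, "")       # keep only the value
             if ($0 ~ /^-/) print host, $0
             next
         }
         NF >= 8 { host = $1 }             # host summary line
    '
}

# Sample: host lines follow the thread excerpts; the -2.000G is invented.
sample='scg3-0-11 lx26-amd64 32 27.15 63.0G 38.3G 9.8G 393.6M
    Host Resource(s):      hc:h_vmem=-2.000G
scg3-0-12 lx26-amd64 32 27.36 63.0G 38.7G 9.8G 33.6M
    Host Resource(s):      hc:h_vmem=1.000G'

printf '%s\n' "$sample" | negative_h_vmem_hosts

# On a live cluster (not run here): qhost -F h_vmem | negative_h_vmem_hosts
```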

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
't defined after the job already started?

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
12:48 PM, Reuti wrote: On 19.10.2012 at 20:58, Alex Chekholko wrote: qhost values seem fine: ...
scg3-0-11 lx26-amd64 32 27.15 63.0G 38.3G 9.8G 393.6M
scg3-0-12 lx26-amd64 32 27.36 63.0G 38.7G 9.8G 33.6M
scg3-0-13 lx26-amd64 3

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
ayson On Thu, Oct 18, 2012 at 6:53 PM, Alex Chekholko wrote: Hi, Running Rocks 6, so whatever GE version is included there. h_vmem is set consumable and per job, 4G default:
-bash-4.1$ qconf -sc |grep h_vmem
h_vmem h_vmem MEMORY <= YES JOB 4G 0
eac
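The bare columns of that 'qconf -sc' line are easy to misread, in particular which one is "requestable" and which is "consumable". The sketch below labels them, assuming the usual column order (name, shortcut, type, relop, requestable, consumable, default, urgency); describe_complex is a made-up helper and the sample is the line from the post with the relop and requestable columns separated.

```shell
#!/bin/sh
# Label the columns of one complex definition.
# $1 = complex name; reads `qconf -sc` style lines on standard input.
describe_complex() {
    awk -v name="$1" '$1 == name {
        printf "type=%s relop=%s requestable=%s consumable=%s default=%s urgency=%s\n",
               $3, $4, $5, $6, $7, $8
    }'
}

# Sample line from the post.
sample='h_vmem h_vmem MEMORY <= YES JOB 4G 0'
printf '%s\n' "$sample" | describe_complex h_vmem

# On a live cluster (not run here): qconf -sc | describe_complex h_vmem
```

Read this way, the line says the complex is requestable (YES), consumed once per job (JOB), and defaults to 4G.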

[gridengine users] h_vmem negative values?

2012-10-18 Thread Alex Chekholko
15 ... Maybe the sgeexecd needs to be cycled for the setting to take effect? I can try that next.

Re: [gridengine users] qlogin sets TERM var to "dumb"?

2012-10-18 Thread Alex Chekholko
On 10/18/12 3:14 AM, Reuti wrote: On 28.09.2012 at 21:15, Alex Chekholko wrote: Hi, I have a fresh install of Open Grid Scheduler on a new cluster and I see that when I do a qlogin session, the environment variable TERM gets set to "dumb". On my other clusters I see it gets set

Re: [gridengine users] qmaster: hard descriptor limit and soft descriptor limit?

2012-10-12 Thread Alex Chekholko
t its settings from limits.conf? I bet if you restart the process, it'll get the settings from your current root shell, and all will be well.
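Whether a running daemon actually picked up new descriptor limits can be read from its limits file under /proc on Linux, rather than guessed from the shell that started it. A sketch, with assumptions: open_file_limits is a made-up helper, the sample row is invented, and the column positions follow the usual /proc/<pid>/limits layout.

```shell
#!/bin/sh
# Print the soft and hard "Max open files" limits from a Linux
# /proc/<pid>/limits file (or from standard input if no file is given).
open_file_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$@"
}

# Invented sample row in the /proc/<pid>/limits layout.
sample='Max open files            1024                 4096                 files'
printf '%s\n' "$sample" | open_file_limits

# On a live qmaster host (not run here):
#   open_file_limits "/proc/$(pgrep -o sge_qmaster)/limits"
```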

Re: [gridengine users] SGE with KRB5

2012-10-05 Thread Alex Chekholko
On 10/05/2012 04:09 AM, Dave Love wrote: Alex Chekholko writes: Hi Christoph, We do have it working with AUKS, which is mostly outside GE. Is that better than DESY's arcx system, or are they roughly equivalent? I can't remember why arcx seemed a better bet on a quick look. I

Re: [gridengine users] Limit jobs in a queue to those requesting a resource

2012-10-05 Thread Alex Chekholko
go to that queue instance. No jobs without testq=1 will go in that queue. Relevant thread: http://gridengine.org/pipermail/users/2012-March/002972.html On 10/05/2012 03:36 PM, Orion Poplawski wrote: I'd like to create a queue that only allows jobs that have requested a specific resource re

Re: [gridengine users] KRB5 support

2012-10-05 Thread Alex Chekholko
that. I don't see why this would require a reinstall. You probably need to shut it all down and modify the bootstrap file, but that should be it. IIRC it depends on whether their GE binaries were compiled with Kerberos support or not.

Re: [gridengine users] SGE with KRB5

2012-10-02 Thread Alex Chekholko
ould be grateful for the links. Thanks in advance, Christoph

[gridengine users] qlogin sets TERM var to "dumb"?

2012-09-28 Thread Alex Chekholko
must be some standard way to set that in an interactive job.

Re: [gridengine users] 10GbE TOR's and performance tuning.

2012-09-13 Thread Alex Chekholko
I found a reference " Some Cisco switches are even permanently configured this way -- they can receive pause frames but never emit them." http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html On 09/13/2012 11:17 AM, Alex Chekholko wrote: 3. What of har

Re: [gridengine users] 10GbE TOR's and performance tuning.

2012-09-13 Thread Alex Chekholko
nd Dell Powerconnect 10GbE TOR's/interconnects, coupled with Broadcom CNA's in all our blade/node infrastructure. Thanks! --JC

[gridengine users] AMD 62xx performance was:Re: How to Restrict forking?

2012-09-12 Thread Alex Chekholko
d on this a bit when you get a chance?

Re: [gridengine users] debugging commlib errors?

2012-09-05 Thread Alex Chekholko
Hi, For a low-level gridengine test, you can use the 'qping' command between various grid engine daemons. I would guess that during the time that you see the commlib error, your qping also wouldn't work. E.g. from an exec host: qping -info <qmaster-host> 6444 qmaster 1 See the qping man page for more i

Re: [gridengine users] queues not erroring out when jobs error out

2012-09-04 Thread Alex Chekholko
12:03 PM, Rayson Ho wrote: Hi Alex, That's the correct behavior (for SSTATE_OPEN_OUTPUT), or else a user can DoS the cluster easily by pointing the input or output file to a path that can't be opened by the user. Rayson On Tue, Sep 4, 2012 at 2:50 PM, Alex Chekholko wrote: Hi,

[gridengine users] queues not erroring out when jobs error out

2012-09-04 Thread Alex Chekholko
reason that is considered specific to the queue." per http://arc.liv.ac.uk/SGE/howto/troubleshooting.html We also have a load sensor that checks for the presence of this filesystem, but the load sensor only updates every few minutes, while the filesystem tends to disappear for only about 60s

Re: [gridengine users] load_formula and PE jobs

2012-08-13 Thread Alex Chekholko
is nothing you can do to change the behavior. -- Reuti On 09.08.2012 at 21:23, Joseph Farran wrote: Howdy. I am using GE2011.11. I am successfully using GE "load_formula" to load jobs by core count using my own "load_sensor"

Re: [gridengine users] qalter not successful

2012-06-13 Thread Alex Chekholko
r jobs to run on the node in question but it'll take a bit of ferkling Kevin ECS, VUW, NZ

Re: [gridengine users] Over-subscription of hosts -- the role of slots and queues

2012-06-05 Thread Alex Chekholko
ion), then all I have to do is change my queues so that they don't overlap. The fact that the problem doesn't come up with multiple parallel jobs may be because of load thresholds. What do you think of my solution? If it's nonsense, can anyone s

Re: [gridengine users] Requesting h_vmem or memfree

2012-05-25 Thread Alex Chekholko
ee of the host. The former is the hc:mem_free in output of 'qstat -F', the latter is mem_free in 'qconf -se'.

Re: [gridengine users] "Packing" jobs on nodes v2

2012-05-11 Thread Alex Chekholko
yson's OGE 2011.11). Our goal is to pack the single-core jobs on as few nodes as possible, preserving slots on other nodes for multi-slot jobs. Can you describe the behaviour you want to see?

Re: [gridengine users] Dynamic Resources?

2012-04-18 Thread Alex Chekholko
instead. e.g. http://gridscheduler.sourceforge.net/howto/loadsensor.html

Re: [gridengine users] MATLAB + SGE experts?

2012-01-19 Thread Alex Chekholko
ks.com/support/product/DM/installation/ver_current/

Re: [gridengine users] qacct in xml?

2011-11-23 Thread Alex Chekholko
get something with qstat (which has -xml option), qstat > doesn't play with finished jobs. > > thanks in advance for help > > > gerard

Re: [gridengine users] SC11 anyone?

2011-11-16 Thread Alex Chekholko
people use? > > See folks tomorrow, > Stuart

Re: [gridengine users] current status of Kerberos support (and maybe AFS)?

2011-10-03 Thread Alex Chekholko
th' when the job tries to access a file in AFS. I didn't write these, but I can try to answer any questions you might have. Good Luck, Mark

[gridengine users] current status of Kerberos support (and maybe AFS)?

2011-09-30 Thread Alex Chekholko
link above will work with modern GE 6.2u5+?

Re: [gridengine users] mem_free

2011-09-21 Thread Alex Chekholko
. As Reuti said, you should be requesting multiple processors with '-pe XX 2', not 'num_proc=2'.

Re: [gridengine users] Queue and Parallell Environments

2011-08-08 Thread Alex Chekholko
hm mpich mpi That's where you specify which parallel environments your queue definition has.

Re: [gridengine users] Maximum memory for running process?

2011-08-02 Thread Alex Chekholko
Yes, think of it as a "high water mark" over the duration of the process. 6.2u5 has a bug in accounting that particular metric, but previous and later versions do not. - Original Message - From: "William Deegan" To: "Gridengine Users Group" Sent: Monday, August 1, 2011 4:41:41 PM Subj

Re: [gridengine users] Maximum job launch rate?

2011-04-26 Thread Alex Chekholko
On 04/26/2011 06:04 PM, William Deegan wrote: Greetings, Is there a way to set the maximum rate at which new jobs will be launched on a cluster (and/or a given machine)? My client's worried about submitting a 100 jobs and having them all start at the same time crushing the fileserver. One

Re: [gridengine users] PE batch job scheduled in interactive queue

2011-02-18 Thread Alex Chekholko
On 02/18/2011 10:25 AM, Chris Jewell wrote: Hi All, This feels like I ought to know better, but. I currently have two queues: "batch.q" which is configured to accept both batch and interactive jobs, and "interactive.q" which just accepts interactive jobs. Both are configured to use a PE