s from overbooking a node.
Thank you,
Nico
--
Alex Chekholko ch...@stanford.edu
the built-in debugging (man sge_dl).
William
--
Alex Chekholko ch...@stanford.edu 347-401-4860
OK, we figured it out: the user had an orphaned tmux session with a
shell in it running the command:
watch qdel *
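For anyone hitting the same symptom, a quick sketch of how one might look for a
stray process like that on the submit/login host (assumes GNU procps, run as the
user in question):
  tmux list-sessions          # any leftover tmux sessions for the user
  pgrep -af 'watch.*qdel'     # any stray "watch ... qdel" process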
On 11/30/2015 04:27 PM, Alex Chekholko wrote:
Hi,
My qmaster messages log is continuously printing, every 2s:
11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:32|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
Hi,
I see in our config (qconf -sconf)
qmaster_params MAX_DYN_EC=500 gdi_timeout=120 gdi_retries=4
Have you tried fiddling with the MAX_DYN_EC parameter?
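If you do want to experiment with it, one way (a sketch; the value 1000 is only
an example) is to edit the global configuration and adjust the existing line:
  qconf -mconf
  # change the qmaster_params line to, e.g.:
  qmaster_params MAX_DYN_EC=1000 gdi_timeout=120 gdi_retries=4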
Regards,
Alex
On 06/12/2015 02:44 PM, Grau, Federico (NIH/NHGRI) [C] wrote:
Hello gridengine users,
Our organization has be
--
Alex Chekholko ch...@stanford.edu
og
any thoughts?
Dan
--
Alex Chekholko ch...@stanford.edu
On 06/08/2015 12:49 AM, William Hay wrote:
On Fri, 5 Jun 2015 22:16:12 +
Alex Chekholko wrote:
Hi all,
I have a standard grid engine cluster (sge-8.1.8 tarball from Dave
Love's site) where users use qlogin to get interactive shells on compute
nodes, and we use a qlogin wrapper script
--
Alex Chekholko ch...@stanford.edu
I don't quite understand what
https://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html
is for, no matter how many times I re-read those man pages. Do I need
both pam_sge-qrsh-setup and pam_sge_authorize?
Regards,
--
Alex Chekholko ch...@stanford.edu
still have an open
terminal, and can still run anything I want, but the Grid has released
the mem_free and slots for others to use.
If anyone has advice on how I can set this up to accomplish my three
goals, it would be very much appreciated. I can post any configuration /
logs / details required.
quested) than single-slot serial jobs. We have
a fair share policy in place, but even with that, serial users starve
out the users running MPI jobs with large slot counts, especially
large-memory MPI jobs (e.g. 64 slots...
--
Alex Chekholko ch...@stanford.edu
There is a general API called DRMAA:
http://en.wikipedia.org/wiki/DRMAA
We have an example local app that uses it to track the state of grid
engine jobs:
https://github.com/StanfordBioinformatics/SJM
I found one result for "DRMAA Jenkins" in Google:
http://biouno.org/2014/07/08/java-drmaa-api-part-
On 11/6/14, 12:22 AM, William Hay wrote:
On Wed, 5 Nov 2014 23:08:53 +
Alex Chekholko wrote:
So I have a bash env var called "BASH_FUNC_module()" and when the tcl
jsv tries to make a TCL variable with that name, it errors out because
of the parentheses. I think this is a recent change in bash due to the
bash vulnerability fix and how bash functions are handled.
As a workaround, we can tell users not to use the &
--
Alex Chekholko ch...@stanford.edu
http://appmibio.bio.uni-goettingen.de/
---
--
Alex Chekholko ch...@stanford.edu
--
Alex Chekholko ch...@stanford.edu
Normally you would leave the old install in place while the old jobs finish.
Since you used a different communications port and different path, you
can have the new daemons running alongside the old daemons and they
don't know about each other.
Direct users to submit jobs to the new environment.
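For example (the install prefix and cell name below are assumptions; use
wherever the new binaries actually live), users can point a shell at the new
qmaster by sourcing its settings file before submitting:
  . /opt/sge-new/default/common/settings.sh   # hypothetical new $SGE_ROOT/$SGE_CELL
  qsub myjob.sh                               # goes to the new qmaster on its own port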
Hi,
Right, you want:
"queue_sort_method load" and "load_formula slots".
Per Reuti's link.
You can easily test by submitting some test jobs on an empty cluster and
seeing that they all get assigned to the same host, leaving other hosts
empty, until the host with the most slots in use is filled up.
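For reference, those two settings live in the scheduler configuration; a sketch
of setting them:
  qconf -msconf
  # set:
  queue_sort_method   load
  load_formula        slots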
Hi Brett,
I saw the issue with JOB vs YES when the jobs were requesting more than
one complex. e.g. qsub -l h_vmem=10G,h_stack=10M was not hitting memory
limits when "JOB" was set.
If your jobs are only requesting one complex, the JOB setting should
work as expected, and you can try both ways.
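For reference, the difference is the CONSUMABLE column in the complex
definition (a sketch; the 4G default matches what is quoted elsewhere in this
thread):
  qconf -mc
  # per-job consumable:
  #   h_vmem   h_vmem   MEMORY   <=   YES   JOB   4G   0
  # per-slot consumable (the request gets multiplied by the slot count):
  #   h_vmem   h_vmem   MEMORY   <=   YES   YES   4G   0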
y. And 'qhost -F h_vmem' ends up
showing negative values.
So on clusters where our users also request h_stack in addition to
h_vmem, we've had to set it back to 'YES' and have the users do the
division by slots at job submission.
job only requests this
one complex.
Suggestions?
Regards,
--
Alex Chekholko ch...@stanford.edu
Hi Jake,
You can do 'qhost -F h_vmem,mem_free,virtual_free'; that might be a
useful view for you.
In general, I've only ever used one of the three complexes above.
Which one(s) do you have defined for the execution hosts? e.g.
qconf -se compute-1-7
h_vmem will map to 'ulimit -v'
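As a quick sanity check of that mapping, a sketch (the 2G request and the
output filename are just examples):
  echo 'ulimit -v' | qsub -l h_vmem=2G -o vlimit.out -j y
  # once the job runs, vlimit.out should show the address-space limit (in kB)
  # derived from the h_vmem request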
mem_free jus
linux-x64 32 12.81 63.0G 8.4G 9.8G 27.8M
Host Resource(s): hc:h_vmem=1.000G
Is there anything I can do to diagnose this issue?
On 10/31/12 3:19 PM, Dave Love wrote:
Alex Chekholko writes:
Hi Reuti,
Thanks for your response, here's the output of 'qhost -F h_v
't defined after the job already started?
--
Alex Chekholko ch...@stanford.edu
12:48 PM, Reuti wrote:
On 19.10.2012 at 20:58, Alex Chekholko wrote:
qhost values seem fine:
...
scg3-0-11 lx26-amd64 32 27.15 63.0G 38.3G 9.8G 393.6M
scg3-0-12 lx26-amd64 32 27.36 63.0G 38.7G 9.8G 33.6M
scg3-0-13 lx26-amd64 3
Rayson
On Thu, Oct 18, 2012 at 6:53 PM, Alex Chekholko wrote:
Hi,
Running Rocks 6, so whatever GE version is included there.
h_vmem is set consumable and per job, 4G default:
-bash-4.1$ qconf -sc |grep h_vmem
h_vmem h_vmem MEMORY <= YES JOB 4G 0
eac
15
...
Maybe the sgeexecd needs to be cycled for the setting to take effect? I
can try that next.
Regards,
--
Alex Chekholko ch...@stanford.edu
On 10/18/12 3:14 AM, Reuti wrote:
On 28.09.2012 at 21:15, Alex Chekholko wrote:
Hi,
I have a fresh install of Open Grid Scheduler on a new cluster and I see that when I do a qlogin session, the
environment variable TERM gets set to "dumb". On my other clusters I see it gets set
t its
settings from limits.conf?
I bet if you restart the process, it'll get the settings from your
current root shell, and all will be well.
Regards,
--
Alex Chekholko ch...@stanford.edu
On 10/05/2012 04:09 AM, Dave Love wrote:
Alex Chekholko writes:
Hi Christoph,
We do have it working with AUKS, which is mostly outside GE.
Is that better than DESY's arcx system, or are they roughly equivalent?
I can't remember why arcx seemed a better bet on a quick look.
I
go to that queue instance. No jobs without testq=1 will go into that
queue.
relevant thread:
http://gridengine.org/pipermail/users/2012-March/002972.html
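A sketch of the usual recipe (the complex name "testq" and queue name "test.q"
are assumptions):
  qconf -mc
  # add a forced, requestable complex:
  #   testq   testq   BOOL   ==   FORCED   NO   0   0
  qconf -mattr queue complex_values testq=1 test.q
  # now only jobs submitted with "qsub -l testq=1 ..." can land in test.q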
On 10/05/2012 03:36 PM, Orion Poplawski wrote:
I'd like to create a queue that only allows jobs that have requested a
specific resource re
that.
I don't see why this would require a reinstall. You probably need to
shut it all down and modify the bootstrap file, but that should be it.
IIRC it depends on whether their GE binaries were compiled with Kerberos
support or not.
--
Alex Chekholko ch...@stanford.edu
would be
grateful for the links.
Thanks in advance,
Christoph
Sent from my Windows Phone
--
Alex Chekholko ch...@stanford.edu
must be some standard way to set that in an interactive job.
Regards,
--
Alex Chekholko ch...@stanford.edu
I found a reference: "Some Cisco switches are even permanently
configured this way -- they can receive pause frames but never emit them."
http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
On 09/13/2012 11:17 AM, Alex Chekholko wrote:
3. What of har
and Dell PowerConnect 10GbE TORs/interconnects, coupled with Broadcom
CNAs in all our blade/node infrastructure.
Thanks!
--JC
--
Alex Chekholko ch...@stanford.edu
d on this a bit when you get a chance?
--
Alex Chekholko ch...@stanford.edu
Hi,
For a low-level gridengine test, you can use the 'qping' command between
various grid engine daemons.
I would guess that during the time that you see the commlib error, your
qping also wouldn't work.
E.g. from an exec host:
qping -info 6444 qmaster 1
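And in the other direction, from the qmaster host toward an exec host (a
sketch; 6445 is the usual default execd port and "node01" is a placeholder
hostname):
  qping -info node01 6445 execd 1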
See the qping man page for more information.
12:03 PM, Rayson Ho wrote:
Hi Alex,
That's the correct behavior (for SSTATE_OPEN_OUTPUT), or else a user
can DoS the cluster easily by pointing the input or output file to a
path that can't be opened by the user.
Rayson
On Tue, Sep 4, 2012 at 2:50 PM, Alex Chekholko wrote:
Hi,
reason that is considered specific to the queue." per
http://arc.liv.ac.uk/SGE/howto/troubleshooting.html
We also have a load sensor that checks for the presence of this
filesystem, but the load sensor only updates every few minutes, while
the filesystem tends to disappear for only about 60s
is nothing you can do to change the behavior.
-- Reuti
On 09.08.2012 at 21:23, Joseph Farran wrote:
Howdy.
I am using GE2011.11.
I am successfully using GE "load_formula" to load jobs by
core count
using my own "load_sensor"
r jobs to run on the node in
question but it'll
take a bit of ferkling
Kevin
ECS, VUW, NZ
--
Alex Chekholko ch...@stanford.edu 347-401-4860
ion), then all I have to do is change my queues so
that they don't overlap. The fact that the problem doesn't come up with
multiple parallel jobs may be because of load thresholds.
What do you think of my solution? If it's nonsense, can anyone s
ee of
the host. The former is hc:mem_free in the output of 'qstat -F', the
latter is mem_free in 'qconf -se'.
Regards,
--
Alex Chekholko ch...@stanford.edu
Rayson's OGE 2011.11). Our goal is to pack the single-core jobs on as
few nodes as possible, preserving slots on other nodes for multi-slot jobs.
Can you describe the behaviour you want to see?
Regards,
--
Alex Chekholko ch...@stanford.edu
instead.
e.g.
http://gridscheduler.sourceforge.net/howto/loadsensor.html
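For reference, a minimal sketch of the load sensor protocol described at that
link (the complex name "my_metric" is an assumption; it would still need to be
defined with 'qconf -mc' and the script registered via the load_sensor setting
in the host or global configuration):
  #!/bin/sh
  # minimal load sensor: loop, wait for a line on stdin, then report one value
  HOST=`uname -n`
  while true; do
      read input || exit 1
      if [ "$input" = "quit" ]; then
          exit 0
      fi
      echo "begin"
      echo "$HOST:my_metric:42"
      echo "end"
  done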
Regards,
--
Alex Chekholko ch...@stanford.edu
ks.com/support/product/DM/installation/ver_current/
Regards,
--
Alex Chekholko ch...@stanford.edu
get something with qstat (which has -xml option), qstat
> doesn't play with finished jobs.
>
> thanks in advance for help
>
>
> gerard
people use?
>
> See folks tomorrow,
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never lost!
>-- Daniel Boone
th'
when the job tries to access a file in AFS.
I didn't write these, but I can try to answer any questions you might have.
Good Luck,
Mark
Mark Suhovecky
HPC System Administrator
Center for Research Computing
University of Notre Dame
suhove...@nd.edu
link above will
work with modern GE 6.2u5+?
Regards,
--
Alex Chekholko ch...@stanford.edu
As Reuti said, you should be requesting multiple processors with '-pe XX
2', not 'num_proc=2'.
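For example (the PE name "smp" is just a placeholder for whatever 'qconf -spl'
lists on your cluster):
  qsub -pe smp 2 myjob.sh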
Regards,
--
Alex Chekholko ch...@stanford.edu
hm mpich mpi
That's where you specify which parallel environments your queue
definition has.
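A sketch of checking and changing that attribute (the queue name "all.q" and
the PE names are assumptions):
  qconf -sq all.q | grep pe_list
  qconf -mattr queue pe_list "make mpich mpi" all.q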
--
Alex Chekholko ch...@stanford.edu
Yes, think of it as a "high water mark" over the duration of the process.
6.2u5 has a bug in the accounting of that particular metric, but previous
and later versions do not.
- Original Message -
From: "William Deegan"
To: "Gridengine Users Group"
Sent: Monday, August 1, 2011 4:41:41 PM
Subj
On 04/26/2011 06:04 PM, William Deegan wrote:
Greetings,
Is there a way to set the maximum rate at which new jobs will be
launched on a cluster (and/or a given machine)?
My client's worried about submitting 100 jobs and having them all
start at the same time, crushing the fileserver.
One
On 02/18/2011 10:25 AM, Chris Jewell wrote:
Hi All,
This feels like I ought to know better, but... I currently have two queues: "batch.q", which is
configured to accept both batch and interactive jobs, and "interactive.q", which just accepts
interactive jobs. Both are configured to use a PE