On 16.12.2016 at 03:54, John_Tai wrote:
> I have pinpointed the problem, but I don’t know how to solve it.
>
> It looks like hosts with the virtual_free complex set cannot run jobs that
> require a PE, even though the job did not request the virtual_free complex.
> I set up virtual_free to allow jobs to request RAM; the goal is for each
> job to request both RAM and a number of CPU cores. Hopefully this helps in
> figuring out a solution. Thanks.
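>
> For example, the goal is for each submission to look something like this
> (the 2G figure is only an illustration):
>
> # qsub -V -b y -cwd -now n -pe cores 7 -l virtual_free=2G -q all.q@ibm038 xclock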
What does the definition of the complex look like in `qconf -sc`?
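
For reference, a consumable memory complex usually looks something like
this in `qconf -sc` (layout per complex(5); the 4G default is only an
example):

#name          shortcut  type    relop requestable consumable default  urgency
virtual_free   vf        MEMORY  <=    YES         YES        4G       0

If it is consumable with a large default like that, note that a parallel
job consumes the default once per slot even when virtual_free is not
requested: 7 slots x 4G = 28G, which exceeds the 16G on ibm038, so the
scheduler may offer 0 usable slots there.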
-- Reuti
> Here’s an example of one host that doesn’t work:
>
> # qconf -se ibm038
> hostname ibm038
> load_scaling NONE
> complex_values virtual_free=16G
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm038 xclock
> Your job 143 ("xclock") has been submitted
> # qstat -j 143
> ==============================================================
> job_number: 143
> exec_file: job_scripts/143
> submission_time: Fri Dec 16 10:46:02 2016
> owner: johnt
> uid: 162
> group: sa
> gid: 4563
> sge_o_home: /home/johnt
> sge_o_log_name: johnt
> sge_o_path:
> /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell: /bin/tcsh
> sge_o_workdir: /home/johnt/sge8
> sge_o_host: ibm005
> account: sge
> cwd: /home/johnt/sge8
> mail_list: johnt@ibm005
> notify: FALSE
> job_name: xclock
> jobshare: 0
> hard_queue_list: all.q@ibm038
> env_list: TERM=xterm,DISPLAY=dsls11:3.0,HOME= [..]
> script_file: xclock
> parallel environment: cores range: 7
> binding: NONE
> job_type: binary
> scheduling info: cannot run in PE "cores" because it only offers 0 slots
>
> Here’s an example of a host that does work:
>
> # qconf -se ibm037
> hostname ibm037
> load_scaling NONE
> complex_values NONE
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm037 xclock
> Your job 144 ("xclock") has been submitted
> # qstat -j 144
> ==============================================================
> job_number: 144
> exec_file: job_scripts/144
> submission_time: Fri Dec 16 10:49:35 2016
> owner: johnt
> uid: 162
> group: sa
> gid: 4563
> sge_o_home: /home/johnt
> sge_o_log_name: johnt
> sge_o_path:
> /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell: /bin/tcsh
> sge_o_workdir: /home/johnt/sge8
> sge_o_host: ibm005
> account: sge
> cwd: /home/johnt/sge8
> mail_list: johnt@ibm005
> notify: FALSE
> job_name: xclock
> jobshare: 0
> hard_queue_list: all.q@ibm037
> env_list: TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt
> [..]
> script_file: xclock
> parallel environment: cores range: 7
> binding: NONE
> job_type: binary
> usage 1: cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
> binding 1: NONE
>
>
>
> From: [email protected] [mailto:[email protected]] On
> Behalf Of John_Tai
> Sent: Wednesday, December 14, 2016 3:52
> To: Christopher Heiny
> Cc: [email protected]; Coleman, Marcus [JRDUS Non-J&J]
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
> I’m actually using sge8.1.9-1 on all hosts. Is there a problem with that?
> Downloaded here:
>
> http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/
>
>
>
>
>
> From: Christopher Heiny [mailto:[email protected]]
> Sent: Wednesday, December 14, 2016 3:26
> To: John_Tai
> Cc: [email protected]; Coleman, Marcus [JRDUS Non-J&J]; Reuti; Christopher
> Heiny
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
>
>
> On Dec 13, 2016 7:04 PM, "John_Tai" <[email protected]> wrote:
> I have 3 hosts in all.q. It seems the two servers running RHEL5.3 (ibm037,
> ibm038) do not work with the PE, while the server with RHEL6.8 (ibm021) is
> working OK. Their configurations are identical:
>
>
> Hmmmm. Might be a Grid Engine version mismatch issue. If you installed from
> RH rpms, then I think EL5.3 is on 6.1u4 and EL6.8 is on 6.2u3 or 6.2u5.
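>
> (A quick cross-check, assuming the binaries are in your PATH: the first
> line of `qconf -help` on each host prints the version string the
> binaries report.)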
>
>
>
>
>
> # qconf -sq all.q@ibm038
> qname all.q
> hostname ibm038
> seq_no 0
> load_thresholds np_load_avg=1.75
> suspend_thresholds NONE
> nsuspend 1
> suspend_interval 00:05:00
> priority 0
> min_cpu_interval 00:05:00
> processors UNDEFINED
> qtype BATCH INTERACTIVE
> ckpt_list NONE
> pe_list cores
> rerun FALSE
> slots 8
> tmpdir /tmp
> shell /bin/sh
> prolog NONE
> epilog NONE
> shell_start_mode posix_compliant
> starter_method NONE
> suspend_method NONE
> resume_method NONE
> terminate_method NONE
> notify 00:00:60
> owner_list NONE
> user_lists NONE
> xuser_lists NONE
> subordinate_list NONE
> complex_values NONE
> projects NONE
> xprojects NONE
> calendar NONE
> initial_state default
> s_rt INFINITY
> h_rt INFINITY
> s_cpu INFINITY
> h_cpu INFINITY
> s_fsize INFINITY
> h_fsize INFINITY
> s_data INFINITY
> h_data INFINITY
> s_stack INFINITY
> h_stack INFINITY
> s_core INFINITY
> h_core INFINITY
> s_rss INFINITY
> h_rss INFINITY
> s_vmem INFINITY
> h_vmem INFINITY
>
>
>
>
>
> -----Original Message-----
> From: Christopher Heiny [mailto:[email protected]]
> Sent: Wednesday, December 14, 2016 10:21
> To: John_Tai; Reuti
> Cc: Coleman, Marcus [JRDUS Non-J&J]; [email protected]
> Subject: Re: John's cores pe (Was: users Digest...)
>
> On Wed, 2016-12-14 at 02:03 +0000, John_Tai wrote:
> > I switched schedd_job_info to true; these are the outputs you
> > requested:
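> > (For reference, schedd_job_info is part of the scheduler
> > configuration, edited with "qconf -msconf".)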
> >
> >
> >
> > # qstat -j 95
> > ==============================================================
> > job_number: 95
> > exec_file: job_scripts/95
> > submission_time: Tue Dec 13 08:50:34 2016
> > owner: johnt
> > uid: 162
> > group: sa
> > gid: 4563
> > sge_o_home: /home/johnt
> > sge_o_log_name: johnt
> > sge_o_path: /home/sge/sge8.1.9-
> > 1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-
> > amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> > sge_o_shell: /bin/tcsh
> > sge_o_workdir: /home/johnt/sge8
> > sge_o_host: ibm005
> > account: sge
> > cwd: /home/johnt/sge8
> > mail_list: johnt@ibm005
> > notify: FALSE
> > job_name: xclock
> > jobshare: 0
> > hard_queue_list: all.q@ibm038
> > env_list: TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/
> > johnt,SHELL=/bin/tcsh,USER=johnt,LOGNAME=johnt,PATH=/home/sge/sge8.1.
> > 9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-
> > amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.,H
> > OSTTYPE=x86_64-
> > linux,VENDOR=unknown,OSTYPE=linux,MACHTYPE=x86_64,SHLVL=1,PWD=/home/j
> > ohnt/sge8,GROUP=sa,HOST=ibm005,REMOTEHOST=dsls11,MAIL=/var/spool/mail
> > /johnt,LS_COLORS=no=00:fi=00:di=00;36:ln=00;34:pi=40;33:so=01;31:bd=4
> > 0;33:cd=40;33:or=40;31:ex=00;31:*.tar=00;33:*.tgz=00;33:*.zip=00;33:*
> > .bz2=00;33:*.z=00;33:*.Z=00;33:*.gz=00;33:*.ev=00;41,G_BROKEN_FILENAM
> > ES=1,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-
> > askpass,KDE_IS_PRELINKED=1,KDEDIR=/usr,LANG=en_US.UTF-
> > 8,LESSOPEN=|/usr/bin/lesspipe.sh
> > %s,HOSTNAME=ibm005,INPUTRC=/etc/inputrc,ASSURA_AUTO_64BIT=NONE,EDITOR
> > =vi,TOP=-ores
> > 60,CVSROOT=/home/edamgr/CVSTF,OPERA_PLUGIN_PATH=/usr/java/jre1.5.0_01
> > /plugin/i386/ns7,NPX_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns
> > 7,MANPATH=/home/sge/sge8.1.9-
> > 1.el5/man:/usr/share/man:/usr/X11R6/man:/usr/kerberos/man,LD_LIBRARY_
> > PATH=/usr/lib:/usr/local/lib:/usr/lib64:/usr/local/lib64,MGC_HOME=/ho
> > me/eda/mentor/aoi_cal_2015.3_25.16,CALIBRE_LM_LOG_LEVEL=WARN,MGLS_LIC
> > ENSE_FILE=1717@ibm004:1717@ibm005:1717@ibm041:1717@ibm042:1717@ibm043
> > :1717@ibm033:1717@ibm044:1717@td156:1717@td158:1717@ATD222,MGC_CALGUI
> > _RELEASE_LICENSE_TIME=0.5,MGC_RVE_RELEASE_LICENSE_TIME=0.5,SOSCAD=/ca
> > d,EDA_TOOL_SETUP_ROOT=/cad/toolSetup,EDA_TOOL_SETUP_VERSION=1.0,SGE_R
> > OOT=/home/sge/sge8.1.9-1.el5,SGE_ARCH=lx-
> > amd64,SGE_CELL=cell2,SGE_CLUSTER_NAME=p6444,SGE_QMASTER_PORT=6444,SGE
> > _EXECD_PORT=6445,DRMAA_LIBRARY_PATH=/home/sge/sge8.1.9-
> > 1.el5/lib//libdrmaa.so
> > script_file: xclock
> > parallel environment: cores range: 1
> > binding: NONE
> > job_type: binary
> > scheduling info: cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
> >                  cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
> >                  cannot run in queue "all.q@ibm021" because it is not contained in its hard queue list (-q)
> >                  cannot run in PE "cores" because it only offers 0 slots
>
> Hmmmm. Just a wild idea, but I'm thinking maybe there's something wacky
> about ibm038's particular configuration. What does
> qconf -sq all.q@ibm038
> say?
>
> And what happens if you use this qsub command?
> qsub -V -b y -cwd -now n -pe cores 2 -q all.q xclock
>
> Cheers,
> Chris
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users