I have configured virtual_free as a requestable, consumable resource:
virtual_free mem MEMORY <= YES JOB 0 0
It's been working great for months.
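For reference, the complex definition above follows the column order of `qconf -sc` output: name, shortcut, type, relop, requestable, consumable, default, urgency (per complex(5), a JOB consumable is debited once per job rather than once per slot). The Python sketch below splits one such line into named fields and rejects a line with the wrong column count, for example one whose last column was lost to line wrapping. The `parse_complex_line` helper and `Complex` dataclass are hypothetical, written only for illustration; they are not part of Grid Engine.

```python
from dataclasses import dataclass

# Column order as printed by `qconf -sc` (header line starting with "#name").
FIELDS = ("name", "shortcut", "type", "relop",
          "requestable", "consumable", "default", "urgency")


@dataclass
class Complex:
    name: str
    shortcut: str
    type: str
    relop: str
    requestable: str
    consumable: str
    default: str
    urgency: str


def parse_complex_line(line: str) -> Complex:
    """Split one whitespace-separated complex definition into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} columns, got {len(parts)}: {line!r}")
    return Complex(**dict(zip(FIELDS, parts)))


if __name__ == "__main__":
    c = parse_complex_line("virtual_free mem MEMORY <= YES JOB 0 0")
    print(c.consumable)  # prints "JOB"
```

Checking the column count is deliberate: a definition missing its final field fails loudly instead of silently shifting values into the wrong columns.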
However, today I suddenly got this error in the messages file:
07/25/2017 08:45:41|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 95945748480.262146, job 5983416 requests additional 268000000000.000000
07/25/2017 08:45:41|worker|ibm068|E|cannot start job 5983416.1, as resources have changed during a scheduling run
07/25/2017 08:45:41|worker|ibm068|W|Skipping remaining 7 orders
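To put the raw byte counts in these log lines into perspective, here is a quick conversion. It assumes the memory multipliers documented in sge_types(1) (lowercase k/m/g are powers of 1000, uppercase K/M/G are powers of 1024); `to_bytes` is a hypothetical helper, not an SGE tool. The scheduler saw a request of 268e9 bytes (about 249.6 GiB) against roughly 89.4 GiB of remaining capacity, while the hard resource_list of job 5983416 in the qstat -j output asks for only virtual_free=2000m (2e9 bytes), i.e. the logged request is 134 times larger.

```python
# Assumed multipliers, per sge_types(1): lowercase k/m/g are powers of 1000,
# uppercase K/M/G are powers of 1024.
MULT = {"k": 1000, "K": 1024, "m": 1000**2, "M": 1024**2,
        "g": 1000**3, "G": 1024**3}
GIB = 1024**3


def to_bytes(value: str) -> float:
    """Convert a memory string such as '2000m' or '90.705G' to bytes."""
    if value and value[-1] in MULT:
        return float(value[:-1]) * MULT[value[-1]]
    return float(value)


# Figures copied from the messages file and from `qstat -j 5983416`.
capacity = 95945748480.262146    # "capacity is ..."
requested = 268000000000.0       # "job 5983416 requests additional ..."
job_request = to_bytes("2000m")  # hard resource_list: virtual_free=2000m

if __name__ == "__main__":
    print(f"capacity    {capacity / GIB:8.2f} GiB")
    print(f"requested   {requested / GIB:8.2f} GiB")
    print(f"2000m       {job_request / GIB:8.2f} GiB")
    print(f"requested is {requested / job_request:.0f}x the job's 2000m")
```

The arithmetic only restates the log; it does not by itself explain where the larger figure came from.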
After that, no job would get scheduled at all. Every job stayed in the "qw" (queued, waiting) state, no matter how much it requested:
# qstat -j 5983416
==============================================================
job_number: 5983416
exec_file: job_scripts/5983416
submission_time: Tue Jul 25 08:18:46 2017
owner: jumbo
uid: 986
group: memory
gid: 41
sge_o_home: /home/jumbo
sge_o_log_name: jumbo
sge_o_path: /home/eda/cadence/IC616.500.3_20131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda/cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:/home/IPproj/IOproject/quan/Flatten
sge_o_shell: /bin/csh
sge_o_workdir: /home/memorytemp/jumbo/180G_RK/S018DP/design_review
sge_o_host: ibm041
account: sge
cwd: /home/memorytemp/jumbo/180G_RK/S018DP/design_review
merge: y
hard resource_list: virtual_free=2000m
mail_list: jumbo@ibm041
notify: FALSE
job_name: run.pl
jobshare: 0
hard_queue_list: 256g.q
env_list: REMOTEHOST=dsls11,MANPATH=/home/sge/sge6.2u6/man:/opt/SUNWspro/man:/usr/man:/usr/openwin/man:/usr/dt/man:/usr/local/man:/usr/local/mysql/man:/usr/local/samba/man,VNCDESKTOP=ibm041:344 (jumbo),HOSTNAME=ibm041,HOST=ibm041,SHELL=/bin/csh,TERM=xterm,GROUP=memory,USER=jumbo,LD_LIBRARY_PATH=/usr/lib:/usr/openwin/lib:/usr/dt/lib:/usr/ccs/lib:/usr/local/lib:/usr/local/mysql/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,HOSTTYPE=x86_64-linux,MAIL=/var/spool/mail/jumbo,PATH=/home/eda/cadence/IC616.500.3_20131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda/cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:/home/IPproj/IOproject/quan/Flatten,INPUTRC=/etc/inputrc,PWD=/home/memorytemp/jumbo/180G_RK/S018DP/design_review,EDITOR=xterm -e vi,LANG=en_US.UTF-8,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=6,HOME=/home/jumbo,OSTYPE=linux,VENDOR=unknown,MACHTYPE=x86_64,LOGNAME=jumbo,LESSOPEN=|/usr/bin/lesspipe.sh %s,DISPLAY=:344.0,G_BROKEN_FILENAMES=1,_=/usr/bin/gnome-session,GTK_RC_FILES=/etc/gtk/gtkrc:/home/jumbo/.gtkrc-1.2-gnome2,SESSION_MANAGER=local/ibm041:/tmp/.ICE-unix/17118,GNOME_KEYRING_SOCKET=/tmp/keyring-FJMO4E/socket,GNOME_DESKTOP_SESSION_ID=Default,DESKTOP_STARTUP_ID=NONE,COLORTERM=gnome-terminal,WINDOWID=38263354,SGE_ROOT=/home/sge/sge6.2u6,SGE_CELL=cell1,SGE_CLUSTER_NAME=p5098,IC61=/home/eda/cadence/IC616.500.3_20131102,MMSIMHOME=/home/eda/cadence/Spectre161ISR2,LM_LICENSE_FILE=5280@ibm041:5280@ibm001:5280@ibm002:5280@ibm003:5260@cadlic:5280@cadlic:5280@dsw3:5280@dsw7:5280@ibm004:5280@ibm005:5280@ibm006:5280@10.224.172.252
script_file: ./run.pl
scheduling info: queue instance "gui.q@dsbm05" dropped because it is overloaded: mem_used=269814435839.737854 (no load adjustment) >= 200g
                 queue instance "192g.q@dsbm10" dropped because it is temporarily not available
                 queue instance "gui.q@dsbm10" dropped because it is temporarily not available
                 queue instance "gui.q@dsbm10" dropped because it is temporarily not available
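The first scheduling-info entry is a load-threshold check rather than a consumable check: mem_used=269814435839.737854 is about 269.8 GB, above 200g (200 x 10^9 bytes). It presumably comes from a load_thresholds setting on gui.q along the lines of mem_used=200g (an assumption inferred only from the message text), and it is consistent with the "a" (alarm) state shown for gui.q@dsbm05 in the qstat -F listing. A minimal Python sketch that evaluates such a message; `check_overload` and `to_bytes` are hypothetical helpers, and the multipliers are assumed from sge_types(1).

```python
import re

# Assumed multipliers, per sge_types(1): lowercase = powers of 1000,
# uppercase = powers of 1024.
MULT = {"k": 1000, "K": 1024, "m": 1000**2, "M": 1024**2,
        "g": 1000**3, "G": 1024**3}

# Pattern for the "overloaded" reason in qstat -j scheduling info; the
# "(no load adjustment)" part is treated as optional.
OVERLOAD_RE = re.compile(
    r"overloaded: (\w+)=([\d.]+)(?: \(no load adjustment\))? >= (\S+)")

MESSAGE = ('queue instance "gui.q@dsbm05" dropped because it is overloaded: '
           'mem_used=269814435839.737854 (no load adjustment) >= 200g')


def to_bytes(value: str) -> float:
    """Convert a memory string such as '200g' to bytes."""
    if value and value[-1] in MULT:
        return float(value[:-1]) * MULT[value[-1]]
    return float(value)


def check_overload(message: str):
    """Return (load name, observed value, threshold in bytes, exceeded?)."""
    match = OVERLOAD_RE.search(message)
    if match is None:
        raise ValueError(f"not an overload message: {message!r}")
    name, observed, limit = match.groups()
    observed_f, limit_b = float(observed), to_bytes(limit)
    return name, observed_f, limit_b, observed_f >= limit_b


if __name__ == "__main__":
    name, observed, limit, exceeded = check_overload(MESSAGE)
    print(f"{name}: {observed / 1e9:.1f} GB vs threshold "
          f"{limit / 1e9:.1f} GB, exceeded={exceeded}")
```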
Yet there clearly are resources available:
# qstat -F mem
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
gmig.q@ibm044                  BIP   0/0/2          1.27     lx24-amd64
    hc:virtual_free=24.000G
---------------------------------------------------------------------------------
gui.q@dsbm04                   BIP   0/59/70        10.01    lx24-amd64
    hc:virtual_free=256.000G
---------------------------------------------------------------------------------
gui.q@dsbm05                   BIP   0/56/70        7.14     lx24-amd64    a
    hc:virtual_free=90.705G
---------------------------------------------------------------------------------
gui.q@dsbm08                   BIP   0/11/45        9.96     lx24-amd64
    hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm09                   BIP   0/7/45         9.84     lx24-amd64
    hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm10                   BIP   0/2/45         0.82     lx24-amd64    o
    hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm11                   BIP   0/41/45        3.13     lx24-amd64
    hc:virtual_free=192.000G
---------------------------------------------------------------------------------
lc.q@ibm071                    BIP   0/0/50         0.21     lx24-amd64
    hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm072                    BIP   0/0/50         0.00     lx24-amd64
    hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm073                    BIP   0/0/50         24.09    lx24-amd64
    hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm074                    BIP   0/5/50         0.05     lx24-amd64
    hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm075                    BIP   0/0/50         24.43    lx24-amd64
    hc:virtual_free=48.000G
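In qstat -F output the "hc:" prefix marks a host-level consumable value. The Python sketch below parses such a listing into bytes per queue instance and lists the instances whose displayed virtual_free would cover a given request. It is only a rough check of the numbers as displayed: the scheduler also applies queue lists, load thresholds and other limits, and the job above was submitted with hard_queue_list 256g.q, which does not appear in this listing. `parse_virtual_free` and `fits` are hypothetical helpers, and the multipliers are assumed from sge_types(1).

```python
import re

# Assumed multipliers, per sge_types(1); "" means a plain byte count.
MULT = {"": 1, "k": 1000, "K": 1024, "m": 1000**2, "M": 1024**2,
        "g": 1000**3, "G": 1024**3}

# A queue instance line starts with "queue@host"; its resources follow on
# indented lines such as "hc:virtual_free=24.000G".
QUEUE_RE = re.compile(r"^(\S+@\S+)\s")
HC_RE = re.compile(r"hc:virtual_free=([\d.]+)([kKmMgG]?)")


def parse_virtual_free(text: str) -> dict:
    """Map each queue instance in `qstat -F` output to virtual_free in bytes."""
    free = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        queue = QUEUE_RE.match(line)
        if queue:
            current = queue.group(1)
            continue
        hc = HC_RE.search(line)
        if hc and current is not None:
            free[current] = float(hc.group(1)) * MULT[hc.group(2)]
    return free


def fits(free: dict, request_bytes: float) -> list:
    """Queue instances whose displayed virtual_free covers the request."""
    return sorted(q for q, b in free.items() if b >= request_bytes)


if __name__ == "__main__":
    sample = """\
gui.q@dsbm04 BIP 0/59/70 10.01 lx24-amd64
    hc:virtual_free=256.000G
gui.q@dsbm05 BIP 0/56/70 7.14 lx24-amd64 a
    hc:virtual_free=90.705G
"""
    free = parse_virtual_free(sample)
    print(fits(free, 268e9))  # prints ['gui.q@dsbm04']
```

Matching on "queue@host" keeps the header and separator lines out of the result without any special-casing.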
I'm not sure what happened there. I had to disable this complex, and now jobs are being scheduled again. Could a single improperly submitted job have caused this?
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users