I have configured virtual_free as a requestable resource:

virtual_free        mem        MEMORY      <=    YES         JOB        0        0

And it's been working great for months.

However, today I suddenly started getting these errors in the messages file:

07/25/2017 08:45:41|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 95945748480.262146, job 5983416 requests additional 268000000000.000000
07/25/2017 08:45:41|worker|ibm068|E|cannot start job 5983416.1, as resources have changed during a scheduling run
07/25/2017 08:45:41|worker|ibm068|W|Skipping remaining 7 orders
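
As a rough sanity check, the two numbers in the first error line can be converted to the unit that qstat -F prints. This is only a sketch: it assumes both log values are plain byte counts and that the "G" suffix in qstat output means 1024^3 bytes; the helper name bytes_to_gib is made up for illustration.

```python
GIB = 1024 ** 3  # assumption: qstat's "G" suffix is 1024^3 bytes


def bytes_to_gib(value: float) -> float:
    """Convert a byte count to GiB."""
    return value / GIB


# Values copied from the "host load value ... exceeded" log line.
capacity = 95945748480.262146
requested = 268000000000.000000

print(f"capacity : {bytes_to_gib(capacity):.3f} GiB")
print(f"requested: {bytes_to_gib(requested):.3f} GiB")
print(f"shortfall: {bytes_to_gib(requested - capacity):.3f} GiB")
```

Under those assumptions the logged request is well above the logged capacity, which is consistent with the scheduler refusing to start the job.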

And no job would get scheduled at all. They all stayed in the waiting state "qw", 
no matter how many resources they requested:

# qstat -j 5983416
==============================================================
job_number:                 5983416
exec_file:                  job_scripts/5983416
submission_time:            Tue Jul 25 08:18:46 2017
owner:                      jumbo
uid:                        986
group:                      memory
gid:                        41
sge_o_home:                 /home/jumbo
sge_o_log_name:             jumbo
sge_o_path:                 /home/eda/cadence/IC616.500.3_20131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda/cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:/home/IPproj/IOproject/quan/Flatten
sge_o_shell:                /bin/csh
sge_o_workdir:              /home/memorytemp/jumbo/180G_RK/S018DP/design_review
sge_o_host:                 ibm041
account:                    sge
cwd:                        /home/memorytemp/jumbo/180G_RK/S018DP/design_review
merge:                      y
hard resource_list:         virtual_free=2000m
mail_list:                  jumbo@ibm041
notify:                     FALSE
job_name:                   run.pl
jobshare:                   0
hard_queue_list:            256g.q
env_list:                   REMOTEHOST=dsls11,MANPATH=/home/sge/sge6.2u6/man:/opt/SUNWspro/man:/usr/man:/usr/openwin/man:/usr/dt/man:/usr/local/man:/usr/local/mysql/man:/usr/local/samba/man,VNCDESKTOP=ibm041:344 (jumbo),HOSTNAME=ibm041,HOST=ibm041,SHELL=/bin/csh,TERM=xterm,GROUP=memory,USER=jumbo,LD_LIBRARY_PATH=/usr/lib:/usr/openwin/lib:/usr/dt/lib:/usr/ccs/lib:/usr/local/lib:/usr/local/mysql/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,HOSTTYPE=x86_64-linux,MAIL=/var/spool/mail/jumbo,PATH=/home/eda/cadence/IC616.500.3_20131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda/cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:/home/IPproj/IOproject/quan/Flatten,INPUTRC=/etc/inputrc,PWD=/home/memorytemp/jumbo/180G_RK/S018DP/design_review,EDITOR=xterm -e vi,LANG=en_US.UTF-8,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=6,HOME=/home/jumbo,OSTYPE=linux,VENDOR=unknown,MACHTYPE=x86_64,LOGNAME=jumbo,LESSOPEN=|/usr/bin/lesspipe.sh %s,DISPLAY=:344.0,G_BROKEN_FILENAMES=1,_=/usr/bin/gnome-session,GTK_RC_FILES=/etc/gtk/gtkrc:/home/jumbo/.gtkrc-1.2-gnome2,SESSION_MANAGER=local/ibm041:/tmp/.ICE-unix/17118,GNOME_KEYRING_SOCKET=/tmp/keyring-FJMO4E/socket,GNOME_DESKTOP_SESSION_ID=Default,DESKTOP_STARTUP_ID=NONE,COLORTERM=gnome-terminal,WINDOWID=38263354,SGE_ROOT=/home/sge/sge6.2u6,SGE_CELL=cell1,SGE_CLUSTER_NAME=p5098,IC61=/home/eda/cadence/IC616.500.3_20131102,MMSIMHOME=/home/eda/cadence/Spectre161ISR2,LM_LICENSE_FILE=5280@ibm041:5280@ibm001:5280@ibm002:5280@ibm003:5260@cadlic:5280@cadlic:5280@dsw3:5280@dsw7:5280@ibm004:5280@ibm005:5280@ibm006:5280@10.224.172.252
script_file:                ./run.pl
scheduling info:            queue instance "gui.q@dsbm05" dropped because it is overloaded: mem_used=269814435839.737854 (no load adjustment) >= 200g
                            queue instance "192g.q@dsbm10" dropped because it is temporarily not available
                            queue instance "gui.q@dsbm10" dropped because it is temporarily not available
                            queue instance "gui.q@dsbm10" dropped because it is temporarily not available
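
One detail worth checking in the output above: the job's hard resource_list says virtual_free=2000m, while the error in the messages file reports a request of 268000000000 for the same job. The sketch below just makes that comparison explicit. It assumes the memory multipliers described in the sge_types man page (lowercase k/m/g are powers of 1000, uppercase K/M/G are powers of 1024), which should be verified against the installed 6.2u6 documentation; the helper parse_mem is hypothetical and not part of Grid Engine.

```python
# Multipliers as described in sge_types(1); an assumption to verify locally.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000 ** 2, "M": 1024 ** 2,
    "g": 1000 ** 3, "G": 1024 ** 3,
}


def parse_mem(spec: str) -> float:
    """Turn a memory specifier such as '2000m' or '200g' into bytes."""
    spec = spec.strip()
    if spec and spec[-1] in MULTIPLIERS:
        return float(spec[:-1]) * MULTIPLIERS[spec[-1]]
    return float(spec)


requested_in_job = parse_mem("2000m")   # from "hard resource_list"
requested_in_log = 268000000000.0       # from the messages file
mem_used = 269814435839.737854          # from "scheduling info"

print("request per qstat -j :", requested_in_job)
print("request per messages :", requested_in_log)
print("ratio                :", requested_in_log / requested_in_job)
print("gui.q@dsbm05 overloaded:", mem_used >= parse_mem("200g"))
```

The numbers alone do not say where the difference between the two request values comes from; they only show that the scheduler was debiting much more than the 2000m shown for the job.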


And clearly there are available resources:



# qstat -F mem
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
gmig.q@ibm044                  BIP   0/0/2          1.27     lx24-amd64
        hc:virtual_free=24.000G
---------------------------------------------------------------------------------
gui.q@dsbm04                   BIP   0/59/70        10.01    lx24-amd64
        hc:virtual_free=256.000G
---------------------------------------------------------------------------------
gui.q@dsbm05                   BIP   0/56/70        7.14     lx24-amd64    a
        hc:virtual_free=90.705G
---------------------------------------------------------------------------------
gui.q@dsbm08                   BIP   0/11/45        9.96     lx24-amd64
        hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm09                   BIP   0/7/45         9.84     lx24-amd64
        hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm10                   BIP   0/2/45         0.82     lx24-amd64    o
        hc:virtual_free=192.000G
---------------------------------------------------------------------------------
gui.q@dsbm11                   BIP   0/41/45        3.13     lx24-amd64
        hc:virtual_free=192.000G
---------------------------------------------------------------------------------
lc.q@ibm071                    BIP   0/0/50         0.21     lx24-amd64
        hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm072                    BIP   0/0/50         0.00     lx24-amd64
        hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm073                    BIP   0/0/50         24.09    lx24-amd64
        hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm074                    BIP   0/5/50         0.05     lx24-amd64
        hc:virtual_free=48.000G
---------------------------------------------------------------------------------
lc.q@ibm075                    BIP   0/0/50         24.43    lx24-amd64
        hc:virtual_free=48.000G


I'm not sure what happened there. I had to disable this complex, and now jobs 
are being scheduled again. Could a single improperly submitted job have caused 
this?

