>> Are they all failing to start on the same host?  Might be worth disabling 
>> the queues on that host so the scheduler looks for another place to put it.  
>> Have a look at the host to see if something is eating virtual memory there.

No, it wasn't limited to one particular host or queue. I even created a new
queue, but I still couldn't submit a new job. According to the qstat -F output,
all hosts had enough free virtual memory.
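
To compare the raw byte values in the messages log with the units qstat
prints, a small Python sketch follows. It assumes the multiplier letters
described in sge_types(1) (lower case is decimal, upper case is binary), which
is worth verifying against the local man page. The helper names to_bytes and
to_gib are made up for illustration and are not part of Grid Engine.

```python
# Multipliers as described in sge_types(1); treat this table as an assumption
# to check locally: lower-case letters are decimal, upper-case are binary.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}


def to_bytes(spec: str) -> float:
    """Convert a memory value such as '2000m' or '95945748480.26' to bytes."""
    spec = spec.strip()
    if spec and spec[-1] in MULTIPLIERS:
        return float(spec[:-1]) * MULTIPLIERS[spec[-1]]
    return float(spec)


def to_gib(n_bytes: float) -> float:
    """Express a byte count in units of 1024**3, the 'G' that qstat -F prints."""
    return n_bytes / 1024**3


# Values copied from the messages log and from qstat -j 5983416.
capacity = to_bytes("95945748480.262146")    # "capacity is ..."
requested = to_bytes("268000000000.000000")  # "requests additional ..."
per_job = to_bytes("2000m")                  # hard resource_list

print(f"capacity : {to_gib(capacity):.3f}G")
print(f"requested: {to_gib(requested):.3f}G")
print(f"per job  : {to_gib(per_job):.3f}G")
print(f"requested / per job = {requested / per_job}")  # 134.0
```

Under that assumption the logged request is exactly 134 times the job's 2000m
request and well above the capacity reported for the host. This is only
arithmetic on the logged values; nothing in this thread establishes where the
factor comes from.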


-----Original Message-----
From: William Hay [mailto:[email protected]]
Sent: Wednesday, July 26, 2017 4:18
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] complex error

On Tue, Jul 25, 2017 at 12:57:47AM +0000, John_Tai wrote:
>    I have configured virtual_free as a requestable resource:
>
>
>
>    virtual_free        mem        MEMORY      <=    YES         JOB        0        0
>
>
>
>    And it's been working great for months.
>
>
>
>    However today all of a sudden I got this error in messages:
>
>
>
>    07/25/2017 08:45:41|worker|ibm068|E|host load value "virtual_free"
>    exceeded: capacity is 95945748480.262146, job 5983416 requests additional
>    268000000000.000000
>
>    07/25/2017 08:45:41|worker|ibm068|E|cannot start job 5983416.1, as
>    resources have changed during a scheduling run
>
>    07/25/2017 08:45:41|worker|ibm068|W|Skipping remaining 7 orders
>
>
>
>    After that, no job would get scheduled at all. Every job stayed in the
>    waiting state "qw", no matter how many resources it requested:
Are they all failing to start on the same host?  Might be worth disabling the 
queues on that host so the scheduler looks for another place to put it.  Have a 
look at the host to see if something is eating virtual memory there.

William

>
>
>
>    # qstat -j 5983416
>
>    ==============================================================
>
>    job_number:                 5983416
>
>    exec_file:                  job_scripts/5983416
>
>    submission_time:            Tue Jul 25 08:18:46 2017
>
>    owner:                      jumbo
>
>    uid:                        986
>
>    group:                      memory
>
>    gid:                        41
>
>    sge_o_home:                 /home/jumbo
>
>    sge_o_log_name:             jumbo
>
>    sge_o_path:                 /home/eda/cadence/IC616.500.3_20131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda/cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:/home/IPproj/IOproject/quan/Flatten
>
>    sge_o_shell:                /bin/csh
>
>    sge_o_workdir:
>    /home/memorytemp/jumbo/180G_RK/S018DP/design_review
>
>    sge_o_host:                 ibm041
>
>    account:                    sge
>
>    cwd:
>    /home/memorytemp/jumbo/180G_RK/S018DP/design_review
>
>    merge:                      y
>
>    hard resource_list:         virtual_free=2000m
>
>    mail_list:                  jumbo@ibm041
>
>    notify:                     FALSE
>
>    job_name:                   run.pl
>
>    jobshare:                   0
>
>    hard_queue_list:            256g.q
>
>    env_list:
>    
> REMOTEHOST=dsls11,MANPATH=/home/sge/sge6.2u6/man:/opt/SUNWspro/man:/usr/man:/usr/openwin/man:/usr/dt/man:/
>    
> usr/local/man:/usr/local/mysql/man:/usr/local/samba/man,VNCDESKTOP=ibm041:344
>    (jumbo),HOSTNAME=ibm041,HOST=ibm041,SHELL=/bin/csh,TERM=
>    
> xterm,GROUP=memory,USER=jumbo,LD_LIBRARY_PATH=/usr/lib:/usr/openwin/lib:/usr/dt/lib:/usr/ccs/lib:/usr/local/lib:/usr/local/mysql/lib,L
>    
> S_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.
>    
> exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip
>    
> =00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*
>    
> .xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,HOSTTYPE=x86_64-linux,MAIL=/var/spool/mail/jumbo,PATH=/home/eda/cadence/IC616.500.3_20
>    
> 131102/tools/bin:/home/eda/cadence/IC616.500.3_20131102/tools/dfII/bin:/home/eda/cadence/IC616.500.3_20131102/tools/plot/bin:/home/eda
>    
> /cadence/Spectre161ISR2/tools/bin:/home/sge/sge6.2u6/bin/lx24-amd64:/bin:/usr/bin:/usr/local/bin:.:/home/sge/bin:/home/DI/TOOLS/bin:.:
>    
> /home/IPproj/IOproject/quan/Flatten,INPUTRC=/etc/inputrc,PWD=/home/memorytemp/jumbo/180G_RK/S018DP/design_review,EDITOR=xterm
>    -e vi,LA
>    
> NG=en_US.UTF-8,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=6,HOME=/home/jumbo,OSTYPE=linux,VENDOR=unknown,MACHTYPE=x86_64
>    ,LOGNAME=jumbo,LESSOPEN=|/usr/bin/lesspipe.sh
>    
> %s,DISPLAY=:344.0,G_BROKEN_FILENAMES=1,_=/usr/bin/gnome-session,GTK_RC_FILES=/etc/gtk/gt
>    
> krc:/home/jumbo/.gtkrc-1.2-gnome2,SESSION_MANAGER=local/ibm041:/tmp/.ICE-unix/17118,GNOME_KEYRING_SOCKET=/tmp/keyring-FJMO4E/socket,GN
>    
> OME_DESKTOP_SESSION_ID=Default,DESKTOP_STARTUP_ID=NONE,COLORTERM=gnome-terminal,WINDOWID=38263354,SGE_ROOT=/home/sge/sge6.2u6,SGE_CELL
>    
> =cell1,SGE_CLUSTER_NAME=p5098,IC61=/home/eda/cadence/IC616.500.3_20131102,MMSIMHOME=/home/eda/cadence/Spectre161ISR2,LM_LICENSE_FILE=5
>    
> 280@ibm041:5280@ibm001:5280@ibm002:5280@ibm003:5260@cadlic:5280@cadlic:5280@dsw3:5280@dsw7:5280@ibm004:5280@ibm005:5280@ibm006:5280@10
>    .224.172.252
>
>    script_file:                ./run.pl
>
>    scheduling info:            queue instance "gui.q@dsbm05" dropped because
>    it is overloaded: mem_used=269814435839.737854 (no load adjustment) >=
>    200g
>
>                                queue instance "192g.q@dsbm10" dropped because
>    it is temporarily not available
>
>                                queue instance "gui.q@dsbm10" dropped because
>    it is temporarily not available
>
>                                queue instance "gui.q@dsbm10" dropped because
>    it is temporarily not available
>
>
>
>
>
>    And clearly there are available resources:
>
>
>
>
>
>
>
>    # qstat -F mem
>    queuename                      qtype resv/used/tot. load_avg arch          states
>    ---------------------------------------------------------------------------------
>    gmig.q@ibm044                  BIP   0/0/2          1.27     lx24-amd64
>            hc:virtual_free=24.000G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm04                   BIP   0/59/70        10.01    lx24-amd64
>            hc:virtual_free=256.000G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm05                   BIP   0/56/70        7.14     lx24-amd64    a
>            hc:virtual_free=90.705G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm08                   BIP   0/11/45        9.96     lx24-amd64
>            hc:virtual_free=192.000G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm09                   BIP   0/7/45         9.84     lx24-amd64
>            hc:virtual_free=192.000G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm10                   BIP   0/2/45         0.82     lx24-amd64    o
>            hc:virtual_free=192.000G
>    ---------------------------------------------------------------------------------
>    gui.q@dsbm11                   BIP   0/41/45        3.13     lx24-amd64
>            hc:virtual_free=192.000G
>    ---------------------------------------------------------------------------------
>    lc.q@ibm071                    BIP   0/0/50         0.21     lx24-amd64
>            hc:virtual_free=48.000G
>    ---------------------------------------------------------------------------------
>    lc.q@ibm072                    BIP   0/0/50         0.00     lx24-amd64
>            hc:virtual_free=48.000G
>    ---------------------------------------------------------------------------------
>    lc.q@ibm073                    BIP   0/0/50         24.09    lx24-amd64
>            hc:virtual_free=48.000G
>    ---------------------------------------------------------------------------------
>    lc.q@ibm074                    BIP   0/5/50         0.05     lx24-amd64
>            hc:virtual_free=48.000G
>    ---------------------------------------------------------------------------------
>    lc.q@ibm075                    BIP   0/0/50         24.43    lx24-amd64
>            hc:virtual_free=48.000G
>
>
>
>
>
>    Not sure what happened there. I had to disable this complex, so now jobs
>    are being scheduled again. I wonder if there was one job that was
>    submitted improperly that caused this?
>
>
>
>
>
>
> ----------------------------------------------------------------------
>
>       This email (including its attachments, if any) may be confidential and
>       proprietary information of SMIC, and intended only for the use of the
>       named recipient(s) above. Any unauthorized use or disclosure of this 
> email
>       is strictly prohibited. If you are not the intended recipient(s), please
>       notify the sender immediately and delete this email from your computer.

> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
