Regarding #2, we use https://slurm.schedmd.com/fair_tree.html and it has really 
exceeded my expectations.

Pete Ruprecht
CU-Boulder Research Computing

On 8/24/17, 11:03 AM, "Patrick Goetz" <pgo...@math.utexas.edu> wrote:

    
    I'm managing what has turned into a very busy cluster and the users are 
    complaining about resource allocation.
    
    There appear to be 2 main complaints:
    
    1. When users submit (say) 8 long running single core jobs, it doesn't 
    appear that Slurm attempts to consolidate them on a single node (each of 
    our nodes can accommodate 16 tasks).  This becomes problematic for users 
    submitting threaded applications which must run on a single node, as 
    even short jobs can be locked out of every node on the system for days 
    at a time.
    
    2. Users are interested in temporally influenced resource allocation. 
    Quoting one of the users: "heavy users in recent past will get fewer 
    resources assigned to them, while users that rarely use the cluster will 
    likely get more jobs run. We believe a good system will track each user 
    based on their usage."
    
    Basically they want to keep track of
    
    User_usage = Sum (Number of cores requested x Running time)
    
    accumulated over, say, a month's time before the counter is reset.  I'm 
    not really sure how to implement such a thing, or if it is even possible.
    
    
    It's quite possible that a solution to #1 would effectively solve #2 for 
    them, as #1 is the source of the request for #2.
    
    Thanks.
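
For illustration, the usage metric described in the quoted message (cores requested x running time, accumulated over a fixed period and then reset) could be sketched in Python as below. This is only a sketch under assumptions: the JobRecord type, the usage_by_user function, the user names, and the sample jobs are all hypothetical, and none of this is Slurm's own code or output.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRecord:
    """Hypothetical job record; not a Slurm data structure."""
    user: str
    cores: int        # number of cores requested
    start: datetime   # when the job started running
    end: datetime     # when the job finished


def usage_by_user(jobs, period_start, period_end):
    """Return core-hours per user, counting only run time inside the period.

    Implements User_usage = Sum(cores requested x running time), where the
    counter effectively resets at each period boundary.
    """
    totals = defaultdict(float)
    for job in jobs:
        # Clip the job's run interval to the accounting period.
        begin = max(job.start, period_start)
        finish = min(job.end, period_end)
        if finish > begin:
            hours = (finish - begin).total_seconds() / 3600.0
            totals[job.user] += job.cores * hours
    return dict(totals)


# Made-up sample data for one monthly accounting period.
period_start = datetime(2017, 8, 1)
period_end = datetime(2017, 9, 1)
jobs = [
    # 8 cores for 24 hours, fully inside the period
    JobRecord("alice", 8, datetime(2017, 8, 2), datetime(2017, 8, 3)),
    # 1 core for 24 hours, but only the last 12 hours fall inside the period
    JobRecord("alice", 1, datetime(2017, 7, 31, 12), datetime(2017, 8, 1, 12)),
    # 16 cores for 2 hours
    JobRecord("bob", 16, datetime(2017, 8, 10, 0), datetime(2017, 8, 10, 2)),
]

usage = usage_by_user(jobs, period_start, period_end)
for user, core_hours in sorted(usage.items()):
    print(f"{user}: {core_hours:.1f} core-hours")
```

A scheduler would then give lower priority to users with higher accumulated usage. In Slurm this kind of accounting is handled by the multifactor priority plugin rather than by hand; the slurm.conf parameters PriorityDecayHalfLife and PriorityUsageResetPeriod control how past usage decays or is reset.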
    
    
