Thanks for the detailed response! Lots to cover, so here we go...haha

> mysql is the uttermost/personal top 1 dislike/non-standard behaving backend 
> out there, but we'll manage ^_^


Interesting. Which backend do you prefer?



and if by "node" you mean a completely different server, you have 7*5 = 35 
> additional workers on top of the 11 on the "head". That's quite a number of 
> workers, I hope they are there because you need to process at least 46 
> tasks in parallel
>

 
So I am *certainly* using the scheduler in a way it wasn't intended, but 
that's part of the fun, right? I'm in an interesting situation where I have 
access to a "cluster" of 5 computers that each have 7 GPUs. There currently 
isn't a proper task scheduler (e.g. SGE, LSF, Slurm, etc.) installed 
yet...so it's not really much of a cluster beyond a shared filesystem...but 
I want to use the system now instead of waiting for everything to get set 
up. I don't have sudo access...so I thought: hey, in less than a day's work 
I can set up web2py + the built-in scheduler + the comfy scheduler monitor 
and be able to run distributed GPU processing with a shiny web2py frontend! 
That is why I need 7 "compQ" workers per machine (1 per GPU). It is also why 
I include a unique group name for *each* worker (hostname_compXX). This lets 
me issue terminate/disable commands to a group and stop specific workers. 
I need this to control which GPUs will pick up work, since I can't use all 
of them all the time.
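
For context, here's roughly how I steer a task to one of those 
hostname-based groups from the app side (just a sketch; the function name, 
group name and file name below are placeholders, not my actual code):

# 'scheduler' is the Scheduler instance defined in the model, i.e.
#   from gluon.scheduler import Scheduler
#   scheduler = Scheduler(db, heartbeat=3)
scheduler.queue_task(
    'process_on_gpu',             # placeholder task function defined in the model
    pvars={'input_file': 'chunk_01.dat'},
    group_name='node01_comp_3',   # only the worker serving this group can be assigned it
    timeout=4 * 3600,             # some of these run for hours
)
db.commit()                       # needed when queuing from a script/shell rather than a request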

From your description, it is still my understanding that after ~30 seconds 
a disabled worker will go back to work. As you can see from my odd use 
case above, I don't want this to happen. If I'm disabling a worker, it's 
because I don't want that worker to pick up any tasks until it is commanded 
to resume...so I have resorted to terminating and manually restarting. 
This works for now.
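
If a DISABLED worker really does stay out of the assignment until it is 
resumed (as you describe below), then the "toggle" I'm after is basically 
just flipping the worker rows. A sketch of what I mean, assuming the 
scheduler tables live in the app's db and using one of my hostname_compXX 
group names as an example:

# group_names is a list:string field, so contains() matches a single entry
def set_group_status(group, status):
    # status: 'DISABLED' to pause, 'ACTIVE' to resume, 'TERMINATE' to stop
    db(db.scheduler_worker.group_names.contains(group)).update(status=status)
    db.commit()

set_group_status('node01_comp_3', 'DISABLED')   # stop picking up new tasks
set_group_status('node01_comp_3', 'ACTIVE')     # resume later

I believe recent web2py versions also expose scheduler.disable(), 
scheduler.resume() and scheduler.terminate() helpers that take group_names 
and do essentially the same update.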
 


> Howdy....4 minutes to 4 hours!!!! ok, we are flexible but hey, 4 hours 
> isn't a task, it's a nightmare
>

Haha...again, obviously stretching things here, but it is pretty much 
working, which is cool. This more or less makes sense, and I'm definitely 
seeing the impact of a long-running task being the TICKER. When this 
happens nothing moves out of the queued state for a LONG time. Based on 
what you've said, forcing a new TICKER should make this go away, I think. 
So I may need a simple script I can run to clear the worker table when I 
see this happen (sketched below). This won't re-assign already assigned 
tasks though, correct? For example, I see stuff like this:

2 workers: A and B
4 tasks: 1,2,3,4 - tasks 1 and 2 take 5 minutes, tasks 3 and 4 take 1 hour.

Worker A gets assigned tasks 1 and 2, B gets 3 and 4. Tasks 1 and 2 finish 
in 10 minutes. Worker A then sits idle while worker B runs for 2 hours. 
Is this a correct understanding of how things work, or if I force the 
ticker to PICK will it actually reassign these tasks to the idle worker?
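
The "simple script" I mentioned above is basically just this, run from a 
web2py shell (python web2py.py -S myapp -M so db and the scheduler tables 
are defined); a sketch, nothing fancy:

# nudge the TICKER so an assignment round happens on the next heartbeat
db(db.scheduler_worker.is_ticker == True).update(status='PICK')
db.commit()

# or the heavier hammer: clear the worker table; workers re-insert their
# rows on the next heartbeat and elect a new TICKER
#db(db.scheduler_worker.id > 0).delete()
#db.commit()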


> I don't see why for tasks that take 4 minutes to 4 hours, you should use 
> "immediate". 
>

I totally agree. It kind of got copy/pasted over from other code for a web 
app where it did make sense to use immediate. I'm not doing it anymore. I 
did go back and check the output from the workers and I do see some errors. 
There are some application-specific things from my code, but also two 
others of this flavor:

2014-05-23 13:18:10,544 - web2py.scheduler.XXXXXX#16361 - ERROR - Error 
cleaning up 

Traceback (most recent call last):
  File "/home/xxxxx/anaconda/lib/python2.7/logging/handlers.py", line 76, in 
emit
    if self.shouldRollover(record):
  File "/home/xxxxx/anaconda/lib/python2.7/logging/handlers.py", line 157, 
in shouldRollover
    self.stream.seek(0, 2)  #due to non-posix-compliant Windows feature
IOError: [Errno 116] Stale file handle
Logged from file scheduler.py, line 822

Note: I do have web2py logging set up, but I'm not using it for anything 
anymore, so I could just delete the config file. It looks like all the 
output from the workers is getting put into the web2py log file. Maybe one 
worker is causing the log file to roll over while another is trying to 
write to it?
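
If the shared rotating log turns out to be the culprit, one thing I may try 
(just a sketch using standard Python logging, not anything web2py ships) is 
pointing the scheduler loggers in each worker process at their own file, so 
a rollover in one process can't leave another with a stale handle on the 
shared filesystem:

# could go in the model file that defines the scheduler
import logging, os, socket

logfile = '/tmp/scheduler_%s_%d.log' % (socket.gethostname(), os.getpid())
handler = logging.FileHandler(logfile)
handler.setFormatter(
    logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))

sched_logger = logging.getLogger('web2py.scheduler')
sched_logger.addHandler(handler)
sched_logger.propagate = False   # don't also write to the shared rotating handler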


Finally, looking at my notes, I've seen some other weird behavior. I'm not 
sure this is the place for it since this post is ridiculously dense to 
begin with, so let me know if you want me to repost it somewhere else:

   - If the ticker is a worker that is running a long task, nothing gets 
   assigned for a very long time (I think until the job completes). I think 
   we've covered this behavior above and it makes sense. Forcing a new 
   ticker should fix it.
   - Sometimes I see tasks that complete successfully but get re-run for 
   some reason (I've only seen it with my long-running 3-4 hr tasks). 
   Looking in the comfy monitor, the task has a completed run and I can see 
   the output, but it gets scheduled and run again. Since my code does 
   cleanup after the first run, the input data is missing, so the second 
   run fails (which is how I noticed this). Not sure why this is happening; 
   I may need to figure out how to reproduce it reliably for debugging.
   - I've seen situations where I know a task is running, but it is still 
   listed as ASSIGNED. I know this because I can see how many tasks are 
   physically running on the worker nodes and can compare that to what the 
   scheduler is reporting. I would assume tasks I know to be running should 
   equal tasks listed as RUNNING (see the sketch after this list).
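
For that last point, the db side of the comparison is essentially this 
(a sketch against the standard scheduler tables, run from a web2py shell):

# per-status breakdown of the task table; compare the RUNNING count
# against the number of task processes actually visible on the nodes
for status in ('QUEUED', 'ASSIGNED', 'RUNNING', 'COMPLETED', 'FAILED', 'TIMEOUT'):
    print status, db(db.scheduler_task.status == status).count()

# runs the scheduler itself has marked as in progress
print 'runs marked RUNNING:', db(db.scheduler_run.status == 'RUNNING').count()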



Thanks again for the help and making all this easy, awesome, and free.  

Dean



On Saturday, June 7, 2014 12:35:56 PM UTC-4, Niphlod wrote:
>
> ok, my responses are inline your post
>
> On Friday, June 6, 2014 9:34:00 PM UTC+2, DeanK wrote:
>>
>> I have a few things that need clarification and am also experiencing 
>> some odd behavior with the scheduler. I'm using my app's db instance 
>> (mysql) for the scheduler.
>>
>
> mysql is the uttermost/personal top 1 dislike/non-standard behaving 
> backend out there, but we'll manage ^_^
>  
>
>>
>> at the bottom of scheduler.py:
>>
>>
>> from gluon.scheduler import Scheduler
>>
>> scheduler = Scheduler(db,heartbeat=3)
>>
>>
>>
>> I start my workers like this:
>>
>> head node:
>>
>> python web2py.py -K myapp:upload,myapp:upload,myapp:upload,myapp:upload,
>> myapp:upload,myapp:download,myapp:download,myapp:download,myapp:download,
>> myapp:download,myapp:head_monitorQ 
>>
>>
> 5 upload, 5 download, 1 headmonitorQ. 11 workers
>  
>
>> 5 compute nodes:
>>
>> GROUP0="myapp:"$HOSTNAME"_comp_0:compQ"
>> GROUP1="myapp:"$HOSTNAME"_comp_1:compQ"
>> GROUP2="myapp:"$HOSTNAME"_comp_2:compQ"
>> GROUP3="myapp:"$HOSTNAME"_comp_3:compQ"
>> GROUP4="myapp:"$HOSTNAME"_comp_4:compQ"
>> GROUP5="myapp:"$HOSTNAME"_comp_5:compQ"
>> GROUP6="myapp:"$HOSTNAME"_comp_6:compQ"
>> MON="myapp:"$HOSTNAME"_monitorQ"
>>
>> python web2py.py -K 
>> $GROUP0,$GROUP1,$GROUP2,$GROUP3,$GROUP4,$GROUP5,$GROUP6,$MON
>>
>>
>> The head node has 4 "upload" and 4 "download" processes.  Each compute 
>> node has 7 "compQ" processes that do the actual work.  The hostname based 
>> groups are unique so I can remotely manage the workers.  The monitorQ's run 
>> a task every 30s to provide hw monitoring to my application.
>>
>
> and if by "node" you mean a completely different server, you have 7*5 = 35 
> additional workers on top of the 11 on the "head". That's quite a number of 
> workers, I hope they are there because you need to process at least 46 
> tasks in parallel, otherwise, it's just a waste of processes and groups. 
> Don't know about the sentence "hostname based groups are unique so I can 
> remotely manage the workers" because by default scheduler workers names 
> are  "hostname#pid" tagged, so unique by default. On top of that, the 
> default heartbeat of 3 seconds means that even when there are no tasks to 
> process, you have a potential of 46 concurrent processes hitting the 
> database every 3 seconds...is that necessary ?
>  
>
>>
>> 1) I have the need to dynamically enable/disable workers to match 
>> available hardware.  I was hoping to do this with the disable/resume 
>> commands but the behavior isn't what I had hoped (but I think what is 
>> intended).  I would like to send a command that will stop a worker from 
>> getting assigned/picking up jobs until a resume is issued.  From the docs 
>> and experimenting, it looks like all disable does is simply sleep the 
>> worker for a little bit and then it gets right back to work.  To get my 
>> current desired behavior I issue a terminate command, but then i need to 
>> ssh into each compute node and restart workers when i want to scale back 
>> up...which works but is less than ideal.
>>
>> *Is there any way to "toggle" a worker into a disabled state?*
>>
>> funny you say that, I'm actually working on an "autoscaling" management 
> that spawns additional workers (and kills them) when a certain criteria is 
> met to deal with spikes of queued tasks. Let's forget about that for a 
> second, and deal with the current version of the scheduler... there are a 
> few things in your statements that I'd like to "verify"...
> 1) if you set the status of a worker to "DISABLED", it won't die
> 2) once DISABLED, it sleeps progressively until 10 times the heartbeat. 
> This means that once set to DISABLED, it progressively waits more seconds 
> to check with the database for a "resume" command, stopping at ~30 seconds. 
> This means that a DISABLED worker, in addition to NOT being able to receive 
> tasks, will only "touch" the db every 30 seconds at most. It's basically 
> doing nothing, and I don't see a reason why you should kill a DISABLED 
> worker because it doesn't consume any resource. It is ready to resume 
> processing and you won't need to ssh into the server to restart the workers 
> processes.
>
>
>> 2) A previous post from Niphlod explains the worker assignment:
>>
>> A QUEUED task is not picked up by a worker, it is first ASSIGNED to a 
>>> worker that can pick up only the ones ASSIGNED to him. The "assignment" 
>>> phase is important because:
>>> - the group_name parameter is honored (task queued with the group_name 
>>> 'foo' gets assigned only to workers that process 'foo' tasks (the 
>>> group_names column in scheduler_workers))
>>> - DISABLED, KILL and TERMINATE workers are "removed" from the assignment 
>>> altogether 
>>> - in multiple workers situations the QUEUED tasks are split amongst 
>>> workers evenly, and workers "know in advance" what tasks they are allowed 
>>> to execute (the assignment allows the scheduler to set up n "independent" 
>>> queues for the n ACTIVE workers)
>>
>>
>> This is an issue for me, because my tasks do not have a uniform run time. 
>>  Some jobs can take 4 minutes while some can take 4 hours.  I keep getting 
>> into situations where a node is sitting there with plenty of idle workers 
>> available, but they apparently don't have tasks to pick up.  Another node 
>> is chugging along with a bunch of backlogged assigned tasks.  Also 
>> sometimes a single worker on a node is left with all the assigned tasks 
>> while the other workers are sitting idle.
>>
>> *Is there any built-in way to periodically force a reassignment of tasks 
>> to deal with this type of situation?*
>>
>>
> Howdy....4 minutes to 4 hours!!!! ok, we are flexible but hey, 4 hours 
> isn't a task, it's a nightmare. That being said, there's no way that on a 
> short period of time (e.g., 60 seconds) idle workers won't pick up tasks 
> ready to be processed. 
> Long story: only a TICKER process assigns tasks, to avoid concurrency 
> issues, and it assigns tasks roughly every 5 cycles (that is, 15 seconds), 
> unless "immediate" is used when a task gets queued. Consider that as a 
> "meta-task" that only the TICKER does. When a worker is processing a task 
> (i.e. one of the ones that last 4 hours), it's internally marked as 
> "RUNNING" ("instead" of being ACTIVE). When a TICKER is also RUNNING, this 
> means that there could be new tasks ready to be processed, but they won't 
> because the assignment is a "meta-task". There's a specific section of code 
> that deals with these situations and lets the TICKER relinquish its powers 
> to let ACTIVE (not RUNNING) workers pick up the assignment process (lines 
> #944 and following). 
> Finally, to answer your question...if needed you can either:
> - truncate the workers table (in this case, workers will simply re-insert 
> their record and elect a TICKER)
> - set the TICKER status to "PICK". This will only force a reassignment in 
> at most 3 seconds vs waiting the usual 15 seconds
>
>
> 3) I had been using "immediate=True" on all of my tasks.  I started to see 
>> db deadlock errors occasionally when scheduling jobs using queue_task(). 
>>  Removing "immediate=True" seemed to fix this problem.
>>
>> *Is there any reason why immediate could be causing deadlocks?*
>>
>
> I don't see why for tasks that take 4 minutes to 4 hours, you should use 
> "immediate". 
> Immediate just sets the TICKER status to "PICK" in order to assign tasks on 
> the next round, instead of waiting the usual 5 "loops". 
> This means that immediate can, and should, be used for very (very) fast 
> executing tasks that need a result within LESS than 15 seconds, which is 
> the WORST scenario that can happen, i.e. the task gets queued the instant 
> after an assignment round happened.
> Let's get the "general" picture here, because I see many users getting a 
> wrong idea... web2py's scheduler is fast, but it's not meant to process 
> millions of tasks distributed on hundreds of workers (there are far better 
> tools for that job). If you feel the need to use "immediate", it's because 
> you queued a task that needs to return a result fast. Here "fast" means 
> that there is a noticeable change between the time you queue a task and the 
> time you get the result back using "immediate" vs not using it. 
> Given that "immediate" allows to "gain", on average, 8 seconds, in my POV 
> it should only be used with tasks whose execution time is less than 20-30 
> seconds. For anything higher, you're basically gaining less than the 20%. 
> For less than 20 seconds, if other limitations are not around, you'd better 
> process the task within the webserver, e.g. via ajax, or look at celery 
> (good luck :D)
> To answer the "deadlock" question, if you see the code, all that 
> "immediate" does is an additional update on the status of the TICKER. 
>
> This rings an alarm bell because - even if "immediate" is not needed, in my 
> POV, as explained before - it points out that your backend can't sustain 
> the db pressure of 46 workers. Do you see any "ERROR" lines in the log of 
> the workers?
>
