[web2py] Re: scheduler task_id assigned to multiple workers

Jason Solack Tue, 30 Aug 2016 09:09:47 -0700

the machine is plenty big... another note, when we use mysql on a weaker 
machine the deadlocks go away, so i feel that this must be something 
related to MSSQL


On Tuesday, August 30, 2016 at 11:48:42 AM UTC-4, Niphlod wrote:
>
> when the backend has horrible performance :D
> 12 workers with the default heartbeat are easily taken care of by a dual-core 
> 4GB RAM backend (without anything beefy on top of that).
>
> On Tuesday, August 30, 2016 at 5:41:01 PM UTC+2, Jason Solack wrote:
>>
>> So after more investigation we are seeing that our load-balanced setup, 
>> with processes running on all three machines, is causing a lot of deadlocks 
>> in MSSQL. Have you seen that before?
>>
>> On Friday, August 19, 2016 at 2:40:35 AM UTC-4, Niphlod wrote:
>>>
>>> yep. your worker setup clearly can't stay stably connected to your backend.
>>>
>>> On Thursday, August 18, 2016 at 7:41:38 PM UTC+2, Jason Solack wrote:
>>>>
>>>> so after some digging what i'm seeing is the sw.insert(...) is not 
>>>> committing and the mybackedstatus is None, this happens 5 times and then 
>>>> the worker appears and almost instantly disappears.  There are no errors.  
>>>> i tried manually doing a db.executesql but i'm having trouble getting 
>>>> self.w_stats converted to something i can insert via sql.
>>>>
>>>> another thing i'm noticing is my "distribution" in w_stats is None...
>>>>
>>>> Any ideas as to why this is happening?
>>>>
>>>> On Thursday, August 18, 2016 at 12:21:26 PM UTC-4, Jason Solack wrote:
>>>>>
>>>>> doing that now, what i'm seeing is some problems here:
>>>>>
>>>>>            # record heartbeat
>>>>>            mybackedstatus = db(sw.worker_name == self.worker_name).select().first()
>>>>>            if not mybackedstatus:
>>>>>                sw.insert(status=ACTIVE, worker_name=self.worker_name,
>>>>>                          first_heartbeat=now, last_heartbeat=now,
>>>>>                          group_names=self.group_names,
>>>>>                          worker_stats=self.w_stats)
>>>>>                self.w_stats.status = ACTIVE
>>>>>                self.w_stats.sleep = self.heartbeat
>>>>>                mybackedstatus = ACTIVE
>>>>>
>>>>> mybackedstatus is consistently coming back as "None". i'm guessing 
>>>>> there is an error somewhere in that try block and the db commit is being 
>>>>> rolled back
>>>>>
>>>>> i'm using MSSQL and nginx... currently upgrading web2py to see if it 
>>>>> continues
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, August 18, 2016 at 10:44:28 AM UTC-4, Niphlod wrote:
>>>>>>
>>>>>> turn on workers debugging level and grep for errors.
>>>>>>
>>>>>> On Thursday, August 18, 2016 at 4:38:31 PM UTC+2, Jason Solack wrote:
>>>>>>>
>>>>>>> I think we have this scenario happening:
>>>>>>>
>>>>>>>
>>>>>>> https://groups.google.com/forum/#%21searchin/web2py/task_id%7csort:relevance/web2py/AYH5IzCIEMo/hY6aNplbGX8J
>>>>>>>
>>>>>>> our workers seem to be restarting quickly and we're trying to 
>>>>>>> figure out why
>>>>>>>
>>>>>>> On Thursday, August 18, 2016 at 3:55:55 AM UTC-4, Niphlod wrote:
>>>>>>>>
>>>>>>>> small recap... a single worker is tasked with assigning tasks 
>>>>>>>> (the one with is_ticker=True) and then that task is picked up only by 
>>>>>>>> the assigned worker (you can see it on the 
>>>>>>>> scheduler_task.assigned_worker_name column of the task). 
>>>>>>>> There's no way the same task (i.e. a scheduler_task "row") is 
>>>>>>>> executed while it is RUNNING (i.e. processed by some worker).
>>>>>>>> The process running the task is stored also in 
>>>>>>>> scheduler_run.worker_name.
>>>>>>>>
>>>>>>>> <tl;dr> you shouldn't EVER have scheduler_run records with the same 
>>>>>>>> task_id and 12 different worker_name all in the RUNNING status.
>>>>>>>>
>>>>>>>> For a single task to be processed by ALL 12 workers at the same 
>>>>>>>> time... is quite impossible, if everything is running smoothly. And 
>>>>>>>> frankly I can't fathom any scenario in which it is possible.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wednesday, August 17, 2016 at 6:25:41 PM UTC+2, Jason Solack 
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I only see the task_id in the scheduler_run table, it seems to be 
>>>>>>>>> added as many times as it can while the run is going... a short run 
>>>>>>>>> will add just 2 of the workers and stop adding them once the initial 
>>>>>>>>> run is completed
>>>>>>>>>
>>>>>>>>> On Wednesday, August 17, 2016 at 11:15:52 AM UTC-4, Niphlod wrote:
>>>>>>>>>>
>>>>>>>>>> task assignment is quite "beefy" (sadly, or fortunately in your 
>>>>>>>>>> case, it favours consistency over speed): I don't see any reason why 
>>>>>>>>>> a single task gets picked up by ALL of the 12 workers at the same 
>>>>>>>>>> time if the backend isn't lying (i.e. slaves not replicating master 
>>>>>>>>>> data)... if your mssql is "single", there absolutely shouldn't be 
>>>>>>>>>> that kind of problem...
>>>>>>>>>>
>>>>>>>>>> Are you sure all are crunching the same exact task (i.e. same 
>>>>>>>>>> task id and uuid) ?
>>>>>>>>>>
>>>>>>>>>> On Wednesday, August 17, 2016 at 2:47:11 PM UTC+2, Jason Solack 
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I'm using nginx and MSSQL for the db
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 17, 2016 at 3:11:11 AM UTC-4, Niphlod wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> nothing in particular. what backend are you using ?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, August 16, 2016 at 8:35:17 PM UTC+2, Jason Solack 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         task = scheduler.queue_task(tab_run,
>>>>>>>>>>>>>             pvars=dict(tab_file_name=tab_file_name,
>>>>>>>>>>>>>                        the_form_file=the_form_file),
>>>>>>>>>>>>>             timeout=60 * 60 * 24, sync_output=2, immediate=False,
>>>>>>>>>>>>>             group_name=scheduler_group_name)
>>>>>>>>>>>>>
>>>>>>>>>>>>> anything look amiss here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 2:14:38 PM UTC-4, Dave S wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 9:38:09 AM UTC-7, Jason Solack 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all, i am having a situation where my scheduled jobs 
>>>>>>>>>>>>>>> are being picked up by multiple workers.  My last task was 
>>>>>>>>>>>>>>> picked up by all 12 workers and is crushing the machines.  
>>>>>>>>>>>>>>> This is a load-balanced setup with 3 machines and 4 workers 
>>>>>>>>>>>>>>> on each machine.  Has anyone experienced something like this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help in advance!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> jason
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What does your queue_task() code look like?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /dps
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

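To check for the condition Niphlod describes above (scheduler_run records with the same task_id and different worker_name values, all in the RUNNING status), a minimal sketch follows. It is not taken from web2py itself: the function name and the sample rows are hypothetical, and it only assumes each row exposes task_id, status and worker_name by key.

```python
from collections import defaultdict


def tasks_with_multiple_running_workers(rows):
    """Map each task_id to its workers when more than one worker is RUNNING it."""
    workers_by_task = defaultdict(set)
    for row in rows:
        # Only runs that are still in progress count as a conflict.
        if row["status"] == "RUNNING":
            workers_by_task[row["task_id"]].add(row["worker_name"])
    return {
        task_id: sorted(workers)
        for task_id, workers in workers_by_task.items()
        if len(workers) > 1
    }


# Hypothetical sample rows standing in for scheduler_run records.
sample_rows = [
    {"task_id": 7, "status": "RUNNING", "worker_name": "host1#100"},
    {"task_id": 7, "status": "RUNNING", "worker_name": "host2#200"},
    {"task_id": 8, "status": "RUNNING", "worker_name": "host3#300"},
    {"task_id": 9, "status": "COMPLETED", "worker_name": "host1#100"},
    {"task_id": 9, "status": "RUNNING", "worker_name": "host2#200"},
]

duplicates = tasks_with_multiple_running_workers(sample_rows)
print(duplicates)
```

In a web2py shell the rows would presumably come from something like db(db.scheduler_run.status == 'RUNNING').select(), assuming the scheduler tables are defined on that db object. An empty result means no task is RUNNING on more than one worker.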