the machine is plenty big... another note, when we use mysql on a weaker machine the deadlocks go away, so i feel that this must be something related to MSSQL
On Tuesday, August 30, 2016 at 11:48:42 AM UTC-4, Niphlod wrote:
>
> when the backend has horrible performances :D
> 12 workers with the default heartbeat are easily taken care by a dual core
> 4GB RAM backend (without anything beefy on top of that).
>
> On Tuesday, August 30, 2016 at 5:41:01 PM UTC+2, Jason Solack wrote:
>>
>> So after more investigation we are seeing that our load balanced server
>> with processes running on all three machines are causing a lot of deadlocks
>> in MSSQL. Have you seen that before?
>>
>> On Friday, August 19, 2016 at 2:40:35 AM UTC-4, Niphlod wrote:
>>>
>>> yep. your worker setup clearly can't stably be connected to your backend.
>>>
>>> On Thursday, August 18, 2016 at 7:41:38 PM UTC+2, Jason Solack wrote:
>>>>
>>>> so after some digging what i'm seeing is the sw.insert(...) is not
>>>> committing and the mybackedstatus is None, this happens 5 times and then
>>>> the worker appears and almost instantly disappears. There are no errors. i
>>>> tried manually doing a db.executesql but i'm having trouble getting
>>>> self.w_stats converted to something i can insert via sql.
>>>>
>>>> another thing i'm noticing is my "distribution" in w_stats is None...
>>>>
>>>> Any ideas as to why this is happening?
>>>>
>>>> On Thursday, August 18, 2016 at 12:21:26 PM UTC-4, Jason Solack wrote:
>>>>>
>>>>> doing that now, what i'm seeing is some problems here:
>>>>>
>>>>> # record heartbeat
>>>>> mybackedstatus = db(sw.worker_name == self.worker_name).select().first()
>>>>> if not mybackedstatus:
>>>>>     sw.insert(status=ACTIVE, worker_name=self.worker_name,
>>>>>               first_heartbeat=now, last_heartbeat=now,
>>>>>               group_names=self.group_names,
>>>>>               worker_stats=self.w_stats)
>>>>>     self.w_stats.status = ACTIVE
>>>>>     self.w_stats.sleep = self.heartbeat
>>>>>     mybackedstatus = ACTIVE
>>>>>
>>>>> mybackedstatus is consistently coming back as "None" i'm guessing
>>>>> there is an error somewhere in that try block and the db commit is being
>>>>> rolled back
>>>>>
>>>>> i'm using MSSQL and nginx... currently upgrading web2py to see if it
>>>>> continues
>>>>>
>>>>> On Thursday, August 18, 2016 at 10:44:28 AM UTC-4, Niphlod wrote:
>>>>>>
>>>>>> turn on workers debugging level and grep for errors.
>>>>>>
>>>>>> On Thursday, August 18, 2016 at 4:38:31 PM UTC+2, Jason Solack wrote:
>>>>>>>
>>>>>>> I think we have this scenario happening:
>>>>>>>
>>>>>>> https://groups.google.com/forum/#%21searchin/web2py/task_id%7csort:relevance/web2py/AYH5IzCIEMo/hY6aNplbGX8J
>>>>>>>
>>>>>>> our workers seem to be restarting quickly and we're trying to
>>>>>>> figure out why
>>>>>>>
>>>>>>> On Thursday, August 18, 2016 at 3:55:55 AM UTC-4, Niphlod wrote:
>>>>>>>>
>>>>>>>> small recap.......a single worker is tasked with assigning tasks
>>>>>>>> (the one with is_ticker=True) and then that task is picked up only by the
>>>>>>>> assigned worker (you can see it on the scheduler_task.assigned_worker_name
>>>>>>>> column of the task).
>>>>>>>> There's no way the same task (i.e. a scheduler_task "row") is
>>>>>>>> executed while it is RUNNING (i.e. processed by some worker).
>>>>>>>> The process running the task is stored also in
>>>>>>>> scheduler_run.worker_name.
>>>>>>>>
>>>>>>>> <tl;dr> you shouldn't EVER have scheduler_run records with the same
>>>>>>>> task_id and 12 different worker_name all in the RUNNING status.
>>>>>>>>
>>>>>>>> For a single task to be processed by ALL 12 workers at the same
>>>>>>>> time... is quite impossible, if everything is running smoothly. And frankly
>>>>>>>> I can't fathom any scenario in which it is possible.
>>>>>>>>
>>>>>>>> On Wednesday, August 17, 2016 at 6:25:41 PM UTC+2, Jason Solack wrote:
>>>>>>>>>
>>>>>>>>> I only see the task_id in the scheduler_run table, it seems to be
>>>>>>>>> added as many times as it can while the run is going... a short run will
>>>>>>>>> add just 2 of the workers and stop adding them once the initial run is
>>>>>>>>> completed
>>>>>>>>>
>>>>>>>>> On Wednesday, August 17, 2016 at 11:15:52 AM UTC-4, Niphlod wrote:
>>>>>>>>>>
>>>>>>>>>> task assignment is quite "beefy" (sadly, or fortunately in your
>>>>>>>>>> case, it favours consistency vs speed) : I don't see any reason why a
>>>>>>>>>> single task gets picked up by ALL of the 12 workers at the same time if the
>>>>>>>>>> backend isn't lying (i.e. slaves not replicating master data),.... if your
>>>>>>>>>> mssql is "single", there shouldn't absolutely be those kind of
>>>>>>>>>> problems...
>>>>>>>>>>
>>>>>>>>>> Are you sure all are crunching the same exact task (i.e. same
>>>>>>>>>> task id and uuid) ?
>>>>>>>>>>
>>>>>>>>>> On Wednesday, August 17, 2016 at 2:47:11 PM UTC+2, Jason Solack wrote:
>>>>>>>>>>>
>>>>>>>>>>> I'm using nginx and MSSQL for the db
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 17, 2016 at 3:11:11 AM UTC-4, Niphlod wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> nothing in particular. what backend are you using ?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, August 16, 2016 at 8:35:17 PM UTC+2, Jason Solack wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> task = scheduler.queue_task(tab_run,
>>>>>>>>>>>>>     pvars=dict(tab_file_name=tab_file_name,
>>>>>>>>>>>>>                the_form_file=the_form_file),
>>>>>>>>>>>>>     timeout=60 * 60 * 24, sync_output=2, immediate=False,
>>>>>>>>>>>>>     group_name=scheduler_group_name)
>>>>>>>>>>>>>
>>>>>>>>>>>>> anything look amiss here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 2:14:38 PM UTC-4, Dave S wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 9:38:09 AM UTC-7, Jason Solack wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all, i am having a situation where my scheduled jobs
>>>>>>>>>>>>>>> are being picked up by multiple workers. My last task was picked up by all
>>>>>>>>>>>>>>> 12 workers and is crushing the machines. This is a load balanced machine
>>>>>>>>>>>>>>> with 3 machines and 4 workers on each machine. has anyone experienced
>>>>>>>>>>>>>>> something like this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help in advance!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> jason
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What does your queue_task() code look like?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /dps
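
For anyone hitting the same symptom: the check Niphlod describes in the thread (no scheduler_run rows sharing a task_id across different worker_name values while in RUNNING status) can be sketched as a small query. This is only a rough illustration, not code from web2py. It assumes an in-memory SQLite table as a stand-in for the real MSSQL backend, a scheduler_run table cut down to the three columns named in the thread, made-up sample rows, and hypothetical helper names (find_duplicate_runs, make_demo_db).

```python
import sqlite3


def find_duplicate_runs(conn):
    """Return (task_id, worker_count) for tasks RUNNING on more than one worker."""
    query = """
        SELECT task_id, COUNT(DISTINCT worker_name) AS workers
        FROM scheduler_run
        WHERE status = 'RUNNING'
        GROUP BY task_id
        HAVING COUNT(DISTINCT worker_name) > 1
        ORDER BY task_id
    """
    return conn.execute(query).fetchall()


def make_demo_db():
    """Build a stand-in scheduler_run table (assumption: only 3 of the real columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scheduler_run ("
        "id INTEGER PRIMARY KEY, task_id INTEGER, worker_name TEXT, status TEXT)"
    )
    # Invented sample rows: task 1 is RUNNING on three workers at once,
    # task 2 already completed, task 3 is RUNNING on a single worker.
    conn.executemany(
        "INSERT INTO scheduler_run (task_id, worker_name, status) VALUES (?, ?, ?)",
        [
            (1, "hostA#101", "RUNNING"),
            (1, "hostB#202", "RUNNING"),
            (1, "hostC#303", "RUNNING"),
            (2, "hostA#101", "COMPLETED"),
            (3, "hostB#202", "RUNNING"),
        ],
    )
    return conn


if __name__ == "__main__":
    for task_id, workers in find_duplicate_runs(make_demo_db()):
        print("task", task_id, "is RUNNING on", workers, "workers")
```

The same SELECT should translate to MSSQL more or less unchanged. An empty result would point away from duplicate assignment, while any returned row is the "impossible" state described above and is worth comparing against the worker debug logs.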