On Fri, Jan 7, 2011 at 5:01 PM, Laurent Guyon <laurent.gu...@adelux.fr> wrote:
> Hi,
>
> I don't know if it also happens for you, but I encountered a problem in
> the reactionner when it processes a Notification (with the latest git code).
>
> The reactionner's worker that handles the notification crashes with a
> traceback:
>
> [0][scheduler-central]Stats : Workers:1 (Queued:0 Processing:0 ReturnWait:0)
> [1][scheduler-CMP]Stats : Workers:1 (Queued:0 Processing:0 ReturnWait:0)
> Wait ratio: 1.0
> Notification instance has no attribute 'timeout'
> Ask actions to 1 got 1
> Process Process-2:
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
>     self.run()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
>     self._target(*self._args, **self._kwargs)
>   File "./shinken/worker.py", line 207, in work
>     self.manage_finished_checks()
>   File "./shinken/worker.py", line 146, in manage_finished_checks
>     action.check_finished(self.max_plugins_output_length)
>   File "./shinken/action.py", line 178, in check_finished
>     self.check_finished_unix(max_plugins_output_length)
>   File "./shinken/action.py", line 191, in check_finished_unix
>     if (now - self.check_time) > self.timeout:
> AttributeError: Notification instance has no attribute 'timeout'
> We ask us for a ping
> ========================
> [reactionner-central] Warning : the worker 0 goes down unexpectly!
> [0][scheduler-central]Stats : Workers:0 (Queued:0 Processing:1 ReturnWait:0)
> [1][scheduler-CMP]Stats : Workers:0 (Queued:0 Processing:1 ReturnWait:0)
> Wait ratio: 1.0
> [reactionner-central] Allocating new Worker : 1
>
> After debugging, I found that the Notification is correctly created and
> sent scheduler-side (in the get_checks method), but the reactionner
> receives this Notification without the 'timeout' attribute (after the
> Pyro remote call to get_checks)!
>
> Here is a small patch that worked for me (adding the 'timeout' attribute
> to the 'properties' list defined in the Notification class). I don't
> know if it's the correct way to fix the problem:
>
> notification.py
>
> 93c93
> <
> ---
> > 'timeout' : StringProp(default=5),
>

Ouch! Bad bug!

> And a little worker.py patch to add exception catching:
>
> 146,147c150,157
> < action.check_finished(self.max_plugins_output_length)
> < wait_time = min(wait_time, action.wait_time)
> ---
> > try:
> >     action.check_finished(self.max_plugins_output_length)
> >     wait_time = min(wait_time, action.wait_time)
> > except Exception, exp:
> >     print "[%d]Error!!! %s, exiting." % (self.id, exp)
> >     sys.exit(2)
>
> But after correcting this first problem, another bug occurred (in the
> reactionner again, when the worker returns its result to the
> reactionner). Traceback:
>
> Traceback (most recent call last):
>   File "/usr/local/shinken/bin/shinken-reactionner", line 5, in <module>
>     pkg_resources.run_script('Shinken==0.4', 'shinken-reactionner')
>   File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 467, in run_script
>     self.require(requires)[0].run_script(script_name, ns)
>   File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 1200, in run_script
>     execfile(script_filename, namespace, namespace)
>   File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/EGG-INFO/scripts/shinken-reactionner", line 158, in <module>
>     p.main()
>   File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py", line 708, in main
>     self.manage_action_return(self.returns_queue.pop())
>   File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py", line 309, in manage_action_return
>     sched_id = action.sched_id
> AttributeError: Notification instance has no attribute 'sched_id'
>
> I found where the problem is, but it's very strange and I didn't manage
> to solve it.
>
> When the reactionner gets a Notification from the scheduler, it adds the
> sched_id attribute to it and puts it in its 'self.s' Queue
> (multiprocessing.Queue), ok.
> But when the worker dequeues this Notification, the sched_id attribute
> has disappeared!
> I tried to dequeue the Notification just after it had been queued by
> the reactionner, and the attribute really had disappeared!
>

I think I found what causes this: sched_id is missing from the properties
dict of Notification. Add this:

'sched_id' : IntegerProp(default=0),

And it should be OK, I think. I'll patch it.

Jean

> Do you have any idea? Some race condition? I'm running Python 2.6.6
> (Debian Squeeze)
>
> Laurent
>
> _______________________________________________
> Shinken-devel mailing list
> Shinken-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/shinken-devel
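
For anyone wanting to reproduce the sched_id symptom outside Shinken, here is a minimal sketch. It assumes (this is not confirmed in the thread) that Notification keeps only the attributes named in its properties dict when it is pickled, which is what multiprocessing.Queue does on put()/get(). The class below is a hypothetical stand-in, not Shinken's real Notification, and it uses plain default values instead of IntegerProp/StringProp objects.

```python
import pickle


class Notification(object):
    # Hypothetical stand-in for Shinken's class: only the names listed
    # in `properties` are copied into the pickled state (an assumption).
    properties = {'timeout': 5}

    def __init__(self):
        for name, default in self.properties.items():
            setattr(self, name, default)

    def __getstate__(self):
        # Keep only the attributes declared in `properties`.
        return dict((name, getattr(self, name))
                    for name in self.properties if hasattr(self, name))

    def __setstate__(self, state):
        self.__dict__.update(state)


def round_trip(obj):
    # multiprocessing.Queue pickles on put() and unpickles on get();
    # a plain dumps/loads pair imitates that without extra processes.
    return pickle.loads(pickle.dumps(obj))


n = Notification()
n.sched_id = 0                      # set after creation, as the reactionner does
lost = round_trip(n)
missing_before_fix = not hasattr(lost, 'sched_id')

Notification.properties['sched_id'] = 0   # the proposed fix, in miniature
n2 = Notification()
n2.sched_id = 3
kept = round_trip(n2)
```

If that assumption holds, no race condition is involved: the attribute is dropped by the serialization step itself, and declaring it in properties is enough for it to survive the queue.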