On Fri, Jan 7, 2011 at 5:01 PM, Laurent Guyon <laurent.gu...@adelux.fr> wrote:

> Hi,
>
Hi,


>
> I don't know if it also happens for you, but I encountered a problem in
> the reactionner when it processes a Notification (with the latest git code).
>
>
> The reactionner's worker that handles the notification crashes with a
> traceback:
>
>    [0][scheduler-central]Stats : Workers:1 (Queued:0 Processing:0
> ReturnWait:0)
>    [1][scheduler-CMP]Stats : Workers:1 (Queued:0 Processing:0
> ReturnWait:0)
>    Wait ratio: 1.0
>    Notification instance has no attribute 'timeout'
>    Ask actions to 1 got 1
>    Process Process-2:
>    Traceback (most recent call last):
>      File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in
> _bootstrap
>        self.run()
>      File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in
> run
>        self._target(*self._args, **self._kwargs)
>      File "./shinken/worker.py", line 207, in work
>        self.manage_finished_checks()
>      File "./shinken/worker.py", line 146, in manage_finished_checks
>        action.check_finished(self.max_plugins_output_length)
>      File "./shinken/action.py", line 178, in check_finished
>        self.check_finished_unix(max_plugins_output_length)
>      File "./shinken/action.py", line 191, in check_finished_unix
>        if (now - self.check_time) > self.timeout:
>    AttributeError: Notification instance has no attribute 'timeout'
>    We ask us for a ping
>     ========================
>    [reactionner-central] Warning : the worker 0 goes down unexpectly!
>    [0][scheduler-central]Stats : Workers:0 (Queued:0 Processing:1
> ReturnWait:0)
>    [1][scheduler-CMP]Stats : Workers:0 (Queued:0 Processing:1
> ReturnWait:0)
>    Wait ratio: 1.0
>    [reactionner-central] Allocating new Worker : 1
>
>
> After debugging, I found that the Notification is correctly created and sent
> scheduler-side (in the get_checks method), but the reactionner receives this
> Notification without the 'timeout' attribute (after the Pyro remote call
> to get_checks)!
>
>
> Here is a small patch that worked for me (adding the 'timeout' attribute
> to the 'properties' list defined in the Notification class). I don't
> know if it's the right way to fix the problem:
>
>  notification.py
>
>    93c93
>    <
>    ---
>    >         'timeout' : StringProp(default=5),
>

Ouch! Bad bug!

>
>
> And a little worker.py patch to add exception catching:
>
>    146,147c150,157
>    <                 action.check_finished(self.max_plugins_output_length)
>    <                 wait_time = min(wait_time, action.wait_time)
>    ---
>    >                 try:
>    >                     action.check_finished(self.max_plugins_output_length)
>    >                     wait_time = min(wait_time, action.wait_time)
>    >                 except Exception, exp:
>    >                     print "[%d]Error!!! %s, exiting." % (self.id, exp)
>    >                     sys.exit(2)
>
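
As a rough sketch of the guard in this patch, here is the same idea in a form that also runs on Python 3 (`except Exception as exp` is accepted from Python 2.6 on). The names `BrokenAction` and `manage_finished` are hypothetical and only stand in for the worker code; this is not the actual worker.py.

```python
import sys


class BrokenAction(object):
    """Hypothetical action that lacks the 'timeout' attribute."""
    wait_time = 1.0
    check_time = 0

    def check_finished(self, max_plugins_output_length):
        # Mimics the failing comparison from the traceback:
        # self.timeout does not exist, so this raises AttributeError.
        return (10 - self.check_time) > self.timeout


def manage_finished(worker_id, actions, max_plugins_output_length=8192):
    """Check each action; report the error and exit with code 2 on failure."""
    wait_time = 1.0
    for action in actions:
        try:
            action.check_finished(max_plugins_output_length)
            wait_time = min(wait_time, action.wait_time)
        except Exception as exp:
            print("[%d]Error!!! %s, exiting." % (worker_id, exp))
            sys.exit(2)
    return wait_time
```

Note that the worker still goes down in this version; the guard only replaces the raw traceback with a short message and a fixed exit code.
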
>
> But after fixing this first problem, another bug occurred (in the
> reactionner again, when the worker returns its result to the reactionner).
> Traceback:
>
>    Traceback (most recent call last):
>      File "/usr/local/shinken/bin/shinken-reactionner", line 5, in
> <module>
>        pkg_resources.run_script('Shinken==0.4', 'shinken-reactionner')
>      File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line
> 467, in run_script
>        self.require(requires)[0].run_script(script_name, ns)
>      File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line
> 1200, in run_script
>        execfile(script_filename, namespace, namespace)
>      File
> "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/EGG-INFO/scripts/shinken-reactionner",
> line 158, in <module>
>        p.main()
>      File
> "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py",
> line 708, in main
>        self.manage_action_return(self.returns_queue.pop())
>      File
> "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py",
> line 309, in manage_action_return
>        sched_id = action.sched_id
>    AttributeError: Notification instance has no attribute 'sched_id'
>
>
> I found where the problem is, but it's very strange and I didn't manage to
> solve it.
>
> When the reactionner gets a Notification from the scheduler, it adds the
> sched_id attribute to it and puts it in its 'self.s' Queue
> (multiprocessing.Queue), ok.
> But when the worker dequeues this Notification, the sched_id attribute
> has disappeared!
> I tried to dequeue the Notification just after it had been queued by the
> reactionner, and this attribute had really disappeared!
>
I think I found what causes this: sched_id is missing from the properties
dict of Notification.

Add this:

'sched_id' : IntegerProp(default=0),

And it should be ok I think. I'll patch it.
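
For illustration, here is a minimal sketch of why an attribute that is not listed in `properties` can vanish. It assumes the class pickles itself through a `__getstate__` that copies only the names found in `properties`; the `FakeNotification` class and the `round_trip` helper are hypothetical and are not the Shinken code. `multiprocessing.Queue` serializes objects with pickle, so an attribute added after creation would be lost on the way to the worker under that assumption.

```python
import pickle


class FakeNotification(object):
    """Hypothetical stand-in for Notification, not the real Shinken class."""

    # Assumption: only the names listed here survive serialization.
    properties = {
        'command': '',
        'timeout': 5,
    }

    def __init__(self, command):
        self.command = command
        self.timeout = 5

    def __getstate__(self):
        # Copy only the attributes declared in `properties`.
        state = {}
        for name in self.__class__.properties:
            if hasattr(self, name):
                state[name] = getattr(self, name)
        return state

    def __setstate__(self, state):
        # Restore exactly what __getstate__ saved.
        for name, value in state.items():
            setattr(self, name, value)


def round_trip(obj):
    """Serialize and deserialize, as a queue or a remote call would do."""
    return pickle.loads(pickle.dumps(obj))


n = FakeNotification('notify-by-email')
n.sched_id = 0  # set after creation, but not declared in `properties`
copy = round_trip(n)

print(hasattr(copy, 'timeout'))   # declared in `properties`
print(hasattr(copy, 'sched_id'))  # not declared in `properties`
```

In this sketch, declaring 'sched_id' in `properties` is enough for the attribute to survive the round trip.
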


Jean


>
> Do you have any idea? Some race condition? I'm running Python 2.6.6
> (Debian Squeeze).
>
> Laurent
>
>
>
>
> ------------------------------------------------------------------------------
> Gaining the trust of online customers is vital for the success of any
> company
> that requires sensitive data to be transmitted over the Web.   Learn how to
> best implement a security strategy that keeps consumers' information secure
> and instills the confidence they need to proceed with transactions.
> http://p.sf.net/sfu/oracle-sfdevnl
> _______________________________________________
> Shinken-devel mailing list
> Shinken-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>
