Hi,
I don't know whether this also happens for you, but I hit a problem in
the reactionner when it processes a Notification (with the latest git code).
The reactionner worker that handles the notification crashes with this
traceback:
[0][scheduler-central]Stats : Workers:1 (Queued:0 Processing:0 ReturnWait:0)
[1][scheduler-CMP]Stats : Workers:1 (Queued:0 Processing:0 ReturnWait:0)
Wait ratio: 1.0
Notification instance has no attribute 'timeout'
Ask actions to 1 got 1
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "./shinken/worker.py", line 207, in work
    self.manage_finished_checks()
  File "./shinken/worker.py", line 146, in manage_finished_checks
    action.check_finished(self.max_plugins_output_length)
  File "./shinken/action.py", line 178, in check_finished
    self.check_finished_unix(max_plugins_output_length)
  File "./shinken/action.py", line 191, in check_finished_unix
    if (now - self.check_time) > self.timeout:
AttributeError: Notification instance has no attribute 'timeout'
We ask us for a ping
========================
[reactionner-central] Warning : the worker 0 goes down unexpectly!
[0][scheduler-central]Stats : Workers:0 (Queued:0 Processing:1 ReturnWait:0)
[1][scheduler-CMP]Stats : Workers:0 (Queued:0 Processing:1 ReturnWait:0)
Wait ratio: 1.0
[reactionner-central] Allocating new Worker : 1
After debugging, I found that the Notification is correctly created and
sent on the scheduler side (in the get_checks method), but the reactionner
receives it without the 'timeout' attribute (after the Pyro remote call
to get_checks)!
Here is a small patch that worked for me (adding the 'timeout' attribute
to the 'properties' list defined in the Notification class). I don't
know if it's the right way to fix the problem:
notification.py
93c93
<
---
> 'timeout' : StringProp(default=5),
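One possible explanation, not verified against the Shinken code: if the
class restricts its pickled state to the names listed in 'properties'
(for example through a custom __getstate__), any other attribute is
silently dropped whenever the object is serialized, which is what a remote
call does. A minimal sketch of that mechanism follows. FakeNotification
and its attributes are hypothetical stand-ins, not the real class:

```python
import pickle


class FakeNotification(object):
    # Hypothetical stand-in, NOT the real shinken Notification class.
    properties = {'command': None, 'status': None}

    def __init__(self):
        self.command = '/bin/true'
        self.status = 'scheduled'
        self.timeout = 5  # set on the instance but not listed in properties

    def __getstate__(self):
        # Assumption: only attributes named in `properties` are serialized.
        return dict((name, getattr(self, name))
                    for name in self.__class__.properties
                    if hasattr(self, name))

    def __setstate__(self, state):
        self.__dict__.update(state)


received = pickle.loads(pickle.dumps(FakeNotification()))
print(hasattr(received, 'command'))  # True
print(hasattr(received, 'timeout'))  # False

# Listing the attribute in `properties` makes it survive the round trip.
FakeNotification.properties['timeout'] = None
fixed = pickle.loads(pickle.dumps(FakeNotification()))
print(fixed.timeout)  # 5
```

If the real class behaves like this, adding 'timeout' to 'properties' would
be a reasonable fix rather than a workaround.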
And a little worker.py patch to add exception catching:
146,147c150,157
< action.check_finished(self.max_plugins_output_length)
< wait_time = min(wait_time, action.wait_time)
---
> try:
>     action.check_finished(self.max_plugins_output_length)
>     wait_time = min(wait_time, action.wait_time)
> except Exception, exp:
>     print "[%d]Error!!! %s, exiting." % (self.id, exp)
>     sys.exit(2)
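The same guard can be sketched in isolation, with the full traceback
printed so the failing line is not lost. This is only an illustration:
run_guarded, worker_id and step are hypothetical names, not part of
worker.py:

```python
import sys
import traceback


def run_guarded(worker_id, step):
    """Run one worker step; report any exception and return an exit code."""
    try:
        step()
    except Exception:
        # format_exc() returns the traceback of the exception being handled.
        sys.stderr.write("[%d] Error in worker:\n%s"
                         % (worker_id, traceback.format_exc()))
        return 2
    return 0


def failing_step():
    # Simulates the AttributeError seen in the traceback above.
    raise AttributeError("Notification instance has no attribute 'timeout'")


ok_code = run_guarded(0, lambda: None)
bad_code = run_guarded(1, failing_step)
```

A worker process could pass the returned code to sys.exit() so the parent
can tell a crash from a normal stop.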
But after fixing this first problem, another bug occurred (again in the
reactionner, when the worker returns its result to the reactionner).
Traceback:
Traceback (most recent call last):
  File "/usr/local/shinken/bin/shinken-reactionner", line 5, in <module>
    pkg_resources.run_script('Shinken==0.4', 'shinken-reactionner')
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 467, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 1200, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/EGG-INFO/scripts/shinken-reactionner", line 158, in <module>
    p.main()
  File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py", line 708, in main
    self.manage_action_return(self.returns_queue.pop())
  File "/usr/local/lib/python2.6/dist-packages/Shinken-0.4-py2.6.egg/shinken/satellite.py", line 309, in manage_action_return
    sched_id = action.sched_id
AttributeError: Notification instance has no attribute 'sched_id'
I found where the problem is, but it's very strange and I didn't manage
to solve it.
When the reactionner gets a Notification from the scheduler, it adds the
sched_id attribute to it and puts it in its 'self.s' Queue
(multiprocessing.Queue). So far so good.
But when the worker dequeues this Notification, the sched_id attribute
has disappeared!
I tried dequeuing the Notification right after the reactionner queued it,
and the attribute had indeed disappeared!
Do you have any idea? Some race condition? I'm running Python 2.6.6
(Debian Squeeze).
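For what it's worth, multiprocessing.Queue pickles each object on put()
and unpickles it on get(), so a race condition is not needed to lose an
attribute: anything the object's pickling leaves out is gone after the
round trip. A quick way to check, sketched with hypothetical names
(lost_attributes and Sample are illustrations, not Shinken code):

```python
import pickle


def lost_attributes(obj):
    """Return the names of attributes that do not survive a pickle round trip."""
    before = set(vars(obj))
    after = set(vars(pickle.loads(pickle.dumps(obj))))
    return sorted(before - after)


class Sample(object):
    # Hypothetical class that deliberately drops 'sched_id' when pickled.
    def __init__(self):
        self.command = '/bin/true'
        self.sched_id = 0

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['sched_id']
        return state


print(lost_attributes(Sample()))  # ['sched_id']
```

Calling such a helper on a real Notification just before it is queued
should show whether sched_id is dropped by pickling itself.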
Laurent
_______________________________________________
Shinken-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/shinken-devel