#209: daemon:  should open listening port BEFORE going to daemon mode (double
fork + setsid)
----------------------+-----------------------------------------------------
 Reporter:  leblutch  |       Owner:                          
     Type:  defect    |      Status:  new                     
 Priority:  major     |   Milestone:  0.5 (Eruptive Earthworm)
Component:  General   |     Version:  0.5 (Eruptive Earthworm)
 Keywords:            |  
----------------------+-----------------------------------------------------
 Currently all the daemons (arbiter, schedulers, brokers...), when started,
 go into daemon mode *before* opening their listening port (if any).

 As a result, init scripts cannot detect a failure to open the port: the
 launching process has already exited by the time the port is opened.

 The fix is quite simple:

 call create_daemon() (which should be renamed daemonize(), imho) AFTER the
 pyro.init_daemon(..) call.

 (Side effect: create_daemon() then has to skip closing the fd that
 corresponds to the pyro daemon listening socket.)
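
 The reordering described above can be sketched roughly as follows. This is
 not the Shinken code: open_listening_socket(), close_fds_except() and
 daemonize() are hypothetical names, and a plain TCP socket stands in for the
 pyro daemon. Redirecting stdin/stdout/stderr is left out to keep it short.

```python
import os
import socket


def open_listening_socket(host, port):
    """Bind and listen; raises OSError (socket.error) if the port is unavailable."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except Exception:
        sock.close()
        raise
    return sock


def close_fds_except(keep, candidates):
    """Close every fd in `candidates` that is not in `keep`."""
    for fd in candidates:
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # fd was not open


def daemonize(keep_fds=(), max_fd=1024):
    """Double fork + setsid, leaving the fds listed in `keep_fds` open."""
    if os.fork() > 0:
        os._exit(0)  # launching process exits only after the port was opened
    os.setsid()
    if os.fork() > 0:
        os._exit(0)  # session leader exits, the grandchild is the daemon
    # Assumption: fds 0-2 are handled elsewhere; everything else is closed
    # except the listening socket.
    close_fds_except(set(keep_fds), range(3, max_fd))
```

 With this ordering a daemon would first call
 sock = open_listening_socket(host, port), let any bind error propagate so
 the process exits non-zero in front of the init script, and only then call
 daemonize(keep_fds=[sock.fileno()]).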

 With this change, the arbiter, scheduler and broker handle this case
 correctly.


 But I'm now facing a very weird exception with the reactionner & poller.
 (I see that the reactionner & poller use the __init__ & "main" functions
 from Satellite, while arbiter+scheduler+broker override these methods in
 their respective "bin" scripts.)

 There are in fact 2 problems (I guess they are related):

 1) The init script no longer returns from the call to
 shinken-{reactionner,poller} (with -d (daemonize)).

 2) The reactionner & poller hit this exception almost immediately (the
 arbiter & schedulers must already be running):


 {{{
 Using working directory : /home/greg/Documents/Projets/shinken/var
 Opening port: 7771
 Waiting for initial configuration
 We ask us for a ping
 [poller-1] Sending us a configuration {'arbiters': {}, 'global':
 {'poller_name': 'poller-1', 'max_workers': 4, 'poller_tags': [],
 'modules': [], 'use_timezone': 'NOTSET', 'polling_interval': 1,
 'max_plugins_output_length': 8192, 'min_workers': 4,
 'processes_by_worker': 256}, 'schedulers': {0: {'instance_id': 0,
 'active': True, 'address': 'localhost', 'port': 7768, 'name':
 'scheduler-1'}}}
 DBG: scheduler UIR: PYROLOC://localhost:7768/Checks
 [poller-1] Init de connexion with scheduler-1 at
 PYROLOC://localhost:7768/Checks
 [poller-1] Connexion OK with scheduler scheduler-1
 Max output lenght 8192
 We have our schedulers : {0: {'wait_homerun': {}, 'name': 'scheduler-1',
 'uri': 'PYROLOC://localhost:7768/Checks', 'instance_id': 0, 'running_id':
 0.6091505580666029, 'address': 'localhost', 'active': True, 'port': 7768,
 'con': <DynamicProxy for PYROLOC://127.0.0.1:7768/Checks>}}
 Init main
 [poller-1] Init de connexion with scheduler-1 at
 PYROLOC://localhost:7768/Checks
 [poller-1] Connexion OK with scheduler scheduler-1
 [poller-1] Allocating new Worker : 0
 Traceback (most recent call last):
   File "/home/greg/Documents/Projets/shinken/bin/shinken-poller", line
 160, in <module>
     p.main()
   File "./shinken/satellite.py", line 657, in main
     self.create_and_launch_worker() #create mortal worker
   File "./shinken/satellite.py", line 467, in create_and_launch_worker
     self.workers[w.id].start()
   File "./shinken/worker.py", line 63, in start
     self._process.start()
   File "/usr/lib/python2.6/multiprocessing/process.py", line 99, in start
     _cleanup()
   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
 _cleanup
     if p._popen.poll() is not None:
   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
     pid, sts = os.waitpid(self.pid, flag)
 OSError: [Errno 10] No child processes

 Traceback (most recent call last):
   File "/usr/lib/python2.6/multiprocessing/util.py", line 235, in
 _run_finalizers
     finalizer()
   File "/usr/lib/python2.6/multiprocessing/util.py", line 174, in __call__
     res = self._callback(*self._args, **self._kwargs)
   File "/usr/lib/python2.6/multiprocessing/managers.py", line 576, in
 _finalize_manager
     if process.is_alive():
   File "/usr/lib/python2.6/multiprocessing/process.py", line 129, in
 is_alive
     assert self._parent_pid == os.getpid(), 'can only test a child
 process'
 AssertionError: can only test a child process

 Error in atexit._run_exitfuncs:
 Traceback (most recent call last):
   File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
     func(*targs, **kargs)
   File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in
 _exit_function
     for p in active_children():
   File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in
 active_children
     _cleanup()
   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
 _cleanup
     if p._popen.poll() is not None:
   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
     pid, sts = os.waitpid(self.pid, flag)
 OSError: [Errno 10] No child processes

 Error in sys.exitfunc:
 Traceback (most recent call last):
   File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
     func(*targs, **kargs)
   File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in
 _exit_function
     for p in active_children():
   File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in
 active_children
     _cleanup()
   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
 _cleanup
     if p._popen.poll() is not None:
   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
     pid, sts = os.waitpid(self.pid, flag)
 OSError: [Errno 10] No child processes
 }}}
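
 For reference, "[Errno 10] No child processes" (ECHILD) is what os.waitpid
 raises when a process waits on a pid that is not its own child, and the
 "can only test a child process" assertion guards the same condition. One
 plausible reading, which is only an assumption here, is that a
 multiprocessing object (the traceback mentions a manager) was started before
 the daemonizing fork and is then used from the forked process. The sketch
 below is not Shinken code; it reproduces the ECHILD case with plain os.fork,
 and errno_when_waiting_on_sibling() is a hypothetical name.

```python
import errno
import os
import signal


def errno_when_waiting_on_sibling():
    """Return the errno a process gets when it waits on a pid it did not fork."""
    result_r, result_w = os.pipe()

    # Stand-in for a helper process started before daemonizing.
    worker = os.fork()
    if worker == 0:
        signal.pause()  # idle until the original process kills it
        os._exit(0)

    # Stand-in for the daemonized process: it knows the worker's pid,
    # but the worker is its sibling, not its child.
    daemon = os.fork()
    if daemon == 0:
        try:
            os.waitpid(worker, os.WNOHANG)
            code = 0
        except OSError as exc:
            code = exc.errno
        os.write(result_w, str(code).encode())
        os._exit(0)

    # Original process: collect the result and reap both children.
    os.close(result_w)
    code = int(os.read(result_r, 16).decode())
    os.close(result_r)
    os.kill(worker, signal.SIGKILL)
    os.waitpid(worker, 0)
    os.waitpid(daemon, 0)
    return code
```

 On a POSIX system the returned value should equal errno.ECHILD.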



 I googled around and found quite a lot of references to these
 errors/tracebacks, but I couldn't find any real solution.


 I could work around 1) by changing the os._exit call to sys.exit in
 create_daemon() of daemon.py, but the crash in 2) remains.
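
 As background on that workaround: sys.exit goes through the normal
 interpreter shutdown, so atexit handlers run (the "Error in
 atexit._run_exitfuncs" sections above come from that path), while os._exit
 terminates the process immediately without running them. The sketch below
 only illustrates this difference; it does not by itself explain why the
 init script hangs. CHILD and run_child() are hypothetical names.

```python
import subprocess
import sys

# Source of a small child program; {exit_call} is filled in by run_child().
CHILD = """
import atexit, os, sys
atexit.register(lambda: sys.stdout.write("atexit ran\\n"))
{exit_call}
"""


def run_child(exit_call):
    """Run the child with the given exit statement and return its stdout."""
    source = CHILD.format(exit_call=exit_call)
    out = subprocess.check_output([sys.executable, "-c", source])
    return out.decode().strip()
```

 run_child("sys.exit(0)") returns "atexit ran", whereas
 run_child("os._exit(0)") returns an empty string because the handler never
 runs.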

-- 
Ticket URL: <http://sourceforge.net/apps/trac/shinken/ticket/209>
shinken <http://sourceforge.net/projects/shinken/>
Shinken is a Linux/Windows compatible Nagios reimplementation in Python. The
main goal of the program is to allow users to scale the load: it "cuts" the
user's configuration into independent parts and sends them to workers.
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel
