#209: daemon: should open listening port BEFORE going to daemon mode (double fork + setsid)
----------------------+-----------------------------------------------------
 Reporter:  leblutch  |       Owner:
     Type:  defect    |      Status:  new
 Priority:  major     |   Milestone:  0.5 (Eruptive Earthworm)
Component:  General   |     Version:  0.5 (Eruptive Earthworm)
 Keywords:            |
----------------------+-----------------------------------------------------
Currently all the daemons (arbiter, schedulers, brokers...) go into daemon mode *before* opening their listening port (if any).
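A minimal sketch of the ordering the title asks for, written for modern Python 3 rather than the Python 2.6 Shinken used at the time. It is not Shinken code: a plain TCP socket stands in for pyro.init_daemon(..), and daemonize() is a hypothetical replacement for create_daemon() that does the usual double fork + setsid. The port is opened while the process is still attached to the init script, so a bind failure can be reported with a non-zero exit status, and the listening socket's descriptor is the one fd spared when the daemon closes everything it inherited.

{{{
import os
import socket
import sys


def open_listening_socket(host, port):
    """Bind and listen; raises OSError (e.g. EADDRINUSE) if the port is unavailable.

    Stand-in for pyro.init_daemon(..): a plain TCP socket, for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


def daemonize(keep_fds=()):
    """Hypothetical daemonize(): double fork + setsid, keeping the fds in keep_fds open."""
    if os.fork() > 0:
        os._exit(0)          # original process exits: the init script gets control back
    os.setsid()              # new session, no controlling terminal
    if os.fork() > 0:
        os._exit(0)          # session leader exits; the grandchild cannot reacquire a tty
    os.chdir("/")
    os.umask(0)

    max_fd = os.sysconf("SC_OPEN_MAX")
    if max_fd <= 0:
        max_fd = 1024        # assumed fallback when the limit is not reported
    # os.closerange(lo, hi) closes [lo, hi), so close the gaps around the kept fds.
    start = 0
    for fd in sorted(set(keep_fds)):
        os.closerange(start, fd)
        start = fd + 1
    os.closerange(start, max_fd)

    # Point stdin/stdout/stderr (fds 0-2, now closed) at /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)   # lowest free fd, i.e. 0
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)


def main(host="0.0.0.0", port=7771):
    try:
        listener = open_listening_socket(host, port)
    except OSError as exc:
        # Still attached to the init script: report the failure and exit non-zero.
        print("Cannot open port %d: %s" % (port, exc), file=sys.stderr)
        return 1
    daemonize(keep_fds=[listener.fileno()])
    # ... serve requests on `listener` here (event loop omitted in this sketch) ...
    return 0


if __name__ == "__main__":
    sys.exit(main())
}}}

Because the bind happens before the first fork, an init script can rely on the exit status of the command it launched: 1 if the port could not be opened, 0 once the daemon has detached.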
Going into daemon mode first makes it impossible for init scripts to correctly detect a failure to open the port.

The fix is quite simple: call create_daemon() (which should be renamed daemonize(), imho) AFTER the pyro.init_daemon(..) call, i.e. pyro.init_daemon must be done BEFORE create_daemon. (Side effect: create_daemon must then skip closing the fd of the Pyro daemon's listening socket.)

With this change, arbiter, scheduler and broker now handle this case correctly. But I'm now getting a very weird exception with the reactionner and poller. I see that reactionner and poller use the __init__ and "main" functions from Satellite, while arbiter, scheduler and broker override these methods in their respective "bin" scripts.

There are in fact 2 problems (they are certainly related, I guess):

1) The init script no longer returns from the call to shinken-{reactionner,poller} (with -d (daemonize)).

2) The reactionner and poller get this exception almost immediately (arbiter and schedulers must already be running):

{{{
Using working directory : /home/greg/Documents/Projets/shinken/var
Opening port: 7771
Waiting for initial configuration
We ask us for a ping
[poller-1] Sending us a configuration {'arbiters': {}, 'global': {'poller_name': 'poller-1', 'max_workers': 4, 'poller_tags': [], 'modules': [], 'use_timezone': 'NOTSET', 'polling_interval': 1, 'max_plugins_output_length': 8192, 'min_workers': 4, 'processes_by_worker': 256}, 'schedulers': {0: {'instance_id': 0, 'active': True, 'address': 'localhost', 'port': 7768, 'name': 'scheduler-1'}}}
DBG: scheduler UIR: PYROLOC://localhost:7768/Checks
[poller-1] Init de connexion with scheduler-1 at PYROLOC://localhost:7768/Checks
[poller-1] Connexion OK with scheduler scheduler-1
Max output lenght 8192
We have our schedulers : {0: {'wait_homerun': {}, 'name': 'scheduler-1', 'uri': 'PYROLOC://localhost:7768/Checks', 'instance_id': 0, 'running_id': 0.6091505580666029, 'address': 'localhost', 'active': True, 'port': 7768, 'con': <DynamicProxy for PYROLOC://127.0.0.1:7768/Checks>}}
Init main
[poller-1] Init de connexion with scheduler-1 at PYROLOC://localhost:7768/Checks
[poller-1] Connexion OK with scheduler scheduler-1
[poller-1] Allocating new Worker : 0
Traceback (most recent call last):
  File "/home/greg/Documents/Projets/shinken/bin/shinken-poller", line 160, in <module>
    p.main()
  File "./shinken/satellite.py", line 657, in main
    self.create_and_launch_worker() #create mortal worker
  File "./shinken/satellite.py", line 467, in create_and_launch_worker
    self.workers[w.id].start()
  File "./shinken/worker.py", line 63, in start
    self._process.start()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 99, in start
    _cleanup()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
    if p._popen.poll() is not None:
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/util.py", line 235, in _run_finalizers
    finalizer()
  File "/usr/lib/python2.6/multiprocessing/util.py", line 174, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 576, in _finalize_manager
    if process.is_alive():
  File "/usr/lib/python2.6/multiprocessing/process.py", line 129, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in _exit_function
    for p in active_children():
  File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in active_children
    _cleanup()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
    if p._popen.poll() is not None:
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in _exit_function
    for p in active_children():
  File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in active_children
    _cleanup()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
    if p._popen.poll() is not None:
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
}}}

I googled around and found quite a lot of references to these errors/tracebacks, but I can't find any real solution. I could work around 1) by changing the os._exit call to a sys.exit call in create_daemon() in daemon.py, but the crash in 2) remains.

--
Ticket URL: <http://sourceforge.net/apps/trac/shinken/ticket/209>
shinken <http://sourceforge.net/projects/shinken/>
Shinken is a Linux/Windows compatible Nagios reimplementation in Python. The main goal of the program is to allow users to scale the load: it "cuts" the user's configuration into independent parts and sends them to workers.

_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel
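One plausible reading of the poller/reactionner traceback, not confirmed by the ticket: the multiprocessing objects (the traceback shows a manager being finalized) are created by the process that later daemonizes. After the double fork, the surviving process has a new PID, so multiprocessing's bookkeeping would still refer to processes started by its ancestor, which are not its children. os.waitpid() on such a process fails with ECHILD ([Errno 10] No child processes), and Process.is_alive() trips its 'can only test a child process' assertion. The sketch below is plain Python 3 using only os.fork, not Shinken or multiprocessing code, and waitpid_on_sibling() is a hypothetical helper; it only shows that waitpid() on a sibling process fails with ECHILD.

{{{
import errno
import os
import signal
import time


def waitpid_on_sibling():
    """Fork a long-lived process, then fork a second one that calls waitpid()
    on the first. The two are siblings, so the call is expected to fail with
    ECHILD; the probe passes the errno back through its exit status.
    """
    sibling = os.fork()
    if sibling == 0:
        # Stands in for a process started *before* daemonizing.
        try:
            time.sleep(30)
        finally:
            os._exit(0)

    probe = os.fork()
    if probe == 0:
        # Stands in for the process that survives the double fork.
        code = 0
        try:
            os.waitpid(sibling, os.WNOHANG)
        except OSError as exc:          # ChildProcessError on Python 3
            code = exc.errno
        finally:
            os._exit(code)

    _, status = os.waitpid(probe, 0)
    # The sibling *is* a child of this process, so it can be killed and reaped here.
    os.kill(sibling, signal.SIGKILL)
    os.waitpid(sibling, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    code = waitpid_on_sibling()
    print(code, errno.errorcode.get(code))
}}}

On Linux, running it prints `10 ECHILD`, the same errno as in the traceback.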