Hi,

It can be a good thing, but please avoid putting shinken-devel in CC for
Trac tickets.

The problem is the fd close. I'm wondering whether it's possible to avoid
the Pyro fds altogether, or whether closing them in the parent is really a
problem as long as the child still has them.
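
To make the idea concrete, here is a minimal sketch of the ordering discussed
in the ticket: bind the listening socket first, so a failure is still visible
to the init script, then do the double fork + setsid while skipping that one
fd in the close loop. This is not the Shinken or Pyro code: open_listener,
close_fds_except and daemonize are hypothetical names, a plain TCP socket
stands in for the Pyro daemon socket, the 1024 fd limit is an arbitrary
assumption, and it is written for current Python 3 rather than the 2.6 shown
in the traceback.

```python
import os
import socket


def open_listener(host, port):
    """Bind and listen now, so a busy port raises OSError before any fork."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


def close_fds_except(keep, max_fd=1024):
    """Close every descriptor below max_fd that is not listed in keep."""
    keep = set(keep)
    for fd in range(max_fd):  # max_fd=1024 is an assumed upper bound
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # this descriptor was not open


def daemonize(keep_fds=()):
    """Double fork + setsid; descriptors in keep_fds survive in the daemon."""
    if os.fork() > 0:
        os._exit(0)  # original process exits: the init script gets control back
    os.setsid()      # new session, no controlling terminal
    if os.fork() > 0:
        os._exit(0)  # session leader exits; the daemon cannot regain a tty
    close_fds_except(keep_fds)
    # Point stdin/stdout/stderr at /dev/null unless the caller kept them.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        if fd not in keep_fds and fd != devnull:
            os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

A daemon would call sock = open_listener(host, port) and then
daemonize(keep_fds=[sock.fileno()]). If the bind fails, the exception is
raised while the process is still in the foreground, so the init script sees
a non-zero exit status.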

But it's too late for such a change in 0.5; it will be in 0.6.


Jean

On Tue, Jan 18, 2011 at 2:23 PM, shinken <nore...@sourceforge.net> wrote:

> #209: daemon: should open listening port BEFORE going to daemon mode
> (double fork + setsid)
>
> ----------------------+-----------------------------------------------------
>  Reporter:  leblutch  |       Owner:
>     Type:  defect    |      Status:  new
>  Priority:  major     |   Milestone:  0.5 (Eruptive Earthworm)
> Component:  General   |     Version:  0.5 (Eruptive Earthworm)
>  Keywords:            |
>
> ----------------------+-----------------------------------------------------
>  Currently, all the daemons (arbiter, schedulers, brokers...), when
>  started, go into daemon mode *before* opening their listening port (if
>  any).
>
>  This makes the correct detection of failure to open the port impossible in
>  init scripts.
>
>  The fix is quite simple:
>
>  call create_daemon() (which should be renamed daemonize(), IMHO) AFTER
>  the pyro.init_daemon(..) call.
>  (in other words, pyro.init_daemon must be done BEFORE create_daemon)
>
>  (side effect: in create_daemon you have to skip the close of the fd
>  corresponding to the pyro daemon listening socket)
>
>  Now the arbiter, scheduler and broker correctly handle this case.
>
>
>  But I'm now facing a very weird exception with the reactionner & poller
>  (and I see that the reactionner & poller use the __init__ & "main"
>  functions from Satellite, while arbiter+scheduler+broker override these
>  methods in their respective "bin" scripts).
>
>  There are in fact 2 problems (probably related, I guess):
>
>  1) The init script no longer returns from the call to
>  shinken-{reactionner,poller} (with -d (daemonize)).
>
>  2) The reactionner & poller get this exception almost immediately (the
>  arbiter & schedulers must already be running):
>
>
>  {{{
>  Using working directory : /home/greg/Documents/Projets/shinken/var
>  Opening port: 7771
>  Waiting for initial configuration
>  We ask us for a ping
>  [poller-1] Sending us a configuration {'arbiters': {}, 'global':
>  {'poller_name': 'poller-1', 'max_workers': 4, 'poller_tags': [],
>  'modules': [], 'use_timezone': 'NOTSET', 'polling_interval': 1,
>  'max_plugins_output_length': 8192, 'min_workers': 4,
>  'processes_by_worker': 256}, 'schedulers': {0: {'instance_id': 0,
>  'active': True, 'address': 'localhost', 'port': 7768, 'name':
>  'scheduler-1'}}}
>  DBG: scheduler UIR: PYROLOC://localhost:7768/Checks
>  [poller-1] Init de connexion with scheduler-1 at
>  PYROLOC://localhost:7768/Checks
>  [poller-1] Connexion OK with scheduler scheduler-1
>  Max output lenght 8192
>  We have our schedulers : {0: {'wait_homerun': {}, 'name': 'scheduler-1',
>  'uri': 'PYROLOC://localhost:7768/Checks', 'instance_id': 0, 'running_id':
>  0.6091505580666029, 'address': 'localhost', 'active': True, 'port': 7768,
>  'con': <DynamicProxy for PYROLOC://127.0.0.1:7768/Checks>}}
>  Init main
>  [poller-1] Init de connexion with scheduler-1 at
>  PYROLOC://localhost:7768/Checks
>  [poller-1] Connexion OK with scheduler scheduler-1
>  [poller-1] Allocating new Worker : 0
>  Traceback (most recent call last):
>   File "/home/greg/Documents/Projets/shinken/bin/shinken-poller", line
>  160, in <module>
>     p.main()
>   File "./shinken/satellite.py", line 657, in main
>     self.create_and_launch_worker() #create mortal worker
>   File "./shinken/satellite.py", line 467, in create_and_launch_worker
>     self.workers[w.id].start()
>   File "./shinken/worker.py", line 63, in start
>     self._process.start()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 99, in start
>     _cleanup()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
>  _cleanup
>     if p._popen.poll() is not None:
>   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
>     pid, sts = os.waitpid(self.pid, flag)
>  OSError: [Errno 10] No child processes
>
>  Traceback (most recent call last):
>   File "/usr/lib/python2.6/multiprocessing/util.py", line 235, in
>  _run_finalizers
>     finalizer()
>   File "/usr/lib/python2.6/multiprocessing/util.py", line 174, in __call__
>     res = self._callback(*self._args, **self._kwargs)
>   File "/usr/lib/python2.6/multiprocessing/managers.py", line 576, in
>  _finalize_manager
>     if process.is_alive():
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 129, in
>  is_alive
>     assert self._parent_pid == os.getpid(), 'can only test a child
>  process'
>  AssertionError: can only test a child process
>
>  Error in atexit._run_exitfuncs:
>  Traceback (most recent call last):
>   File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in
>  _exit_function
>     for p in active_children():
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in
>  active_children
>     _cleanup()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
>  _cleanup
>     if p._popen.poll() is not None:
>   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
>     pid, sts = os.waitpid(self.pid, flag)
>  OSError: [Errno 10] No child processes
>
>  Error in sys.exitfunc:
>  Traceback (most recent call last):
>   File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib/python2.6/multiprocessing/util.py", line 262, in
>  _exit_function
>     for p in active_children():
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 43, in
>  active_children
>     _cleanup()
>   File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in
>  _cleanup
>     if p._popen.poll() is not None:
>   File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
>     pid, sts = os.waitpid(self.pid, flag)
>  OSError: [Errno 10] No child processes
>  }}}
>
>
>
>  I googled around and saw quite a lot of references to these
>  errors/tracebacks, but I couldn't find any real solution.
>
>
>  I could work around 1) by changing the os._exit call to sys.exit in
>  create_daemon() of daemon.py, but the crash in 2) remains.
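
One possible reading of the traceback above, offered as an assumption rather
than a confirmed diagnosis: both "No child processes" and "can only test a
child process" are what a process gets when it inherits, through fork,
bookkeeping for children that were started by its parent. That could happen
if any multiprocessing object (a Manager, a worker) is created before the
double fork. The sketch below reproduces only the underlying os.waitpid
behaviour with plain os.fork; the function name is hypothetical and no
Shinken code is involved.

```python
import errno
import os


def waitpid_errno_after_fork():
    """Start a worker, fork again, and let the second child try to reap it."""
    worker = os.fork()
    if worker == 0:
        os._exit(0)  # the "worker" does nothing and exits

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Second child (stands in for the daemonized process): it knows the
        # worker's pid, but the worker is its sibling, not its child.
        try:
            os.waitpid(worker, os.WNOHANG)
            code = 0
        except OSError as exc:
            code = exc.errno
        os.write(w, str(code).encode())
        os._exit(0)

    # Original process: collect the result and reap both real children.
    os.close(w)
    data = os.read(r, 16)
    os.close(r)
    os.waitpid(pid, 0)
    os.waitpid(worker, 0)
    return int(data)


if __name__ == "__main__":
    # The second child is expected to fail with ECHILD ("No child processes").
    print(waitpid_errno_after_fork() == errno.ECHILD)
```

If this is what happens in the poller and reactionner, creating the Manager
and the workers only after daemonizing would avoid it, but that remains to be
checked against satellite.py.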
>
> --
> Ticket URL: <http://sourceforge.net/apps/trac/shinken/ticket/209>
> shinken <http://sourceforge.net/projects/shinken/>
> Shinken is a Linux/Windows compatible Nagios reimplementation in Python.
> The main goal of the program is to allow users to scale the load: it
> "cuts" the user's configuration into independent parts and sends them to
> workers.
>
> ------------------------------------------------------------------------------
> Protect Your Site and Customers from Malware Attacks
> Learn about various malware tactics and how to avoid them. Understand
> malware threats, the impact they can have on your business, and how you
> can protect your company and customers by using code signing.
> http://p.sf.net/sfu/oracle-sfdevnl
> _______________________________________________
> Shinken-devel mailing list
> Shinken-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>