No, it kept going after that, I just didn't want to post too much :-)

The problem isn't that it never gets a config; it's that it gets one, and then a bit later the arbiter times out trying to give it another.  Then it recovers and one of the other daemons times out (broker, receiver, reactionner, scheduler, another poller), and it never seems to stop dispatching and re-dispatching configs.  It finally seems to have settled down after my latest round of restarts.  I have a new config to load (some server changes were made recently), so we'll see if it goes any more smoothly this time.

On 5/13/15 1:50 PM, Olivier Hanesse wrote:
And nothing after "[1431547694] INFO: [Shinken] Waiting for initial configuration"?
On the arbiter side, I guess there is an error log saying that it didn't succeed in pushing the configuration to that poller?


2015-05-13 22:10 GMT+02:00 David Good <dg...@willingminds.com>:
Here's everything up to when it starts waiting for its configuration from the arbiter:

[root@shinken1 shinken]# /usr/bin/shinken-poller -r -c /etc/shinken/daemons/pollerd.ini
[1431547694] INFO: [Shinken] Stale pidfile exists at invalid literal for int() with base 10: '' (/var/run/shinken/pollerd.pid). Reusing it.
[1431547694] INFO: [Shinken] Opening HTTP socket at http://0.0.0.0:7771
Bottle server starting up (using CherryPyServer(ssl_key='', ssl_cert='', daemon_thread_pool_size=50, ca_cert='', use_ssl=False))...
Listening on http://0.0.0.0:7771/
Use Ctrl-C to quit.

[1431547694] INFO: [Shinken] Initializing a CherryPy backend with 50 threads
Shutting down...
[1431547694] INFO: [Shinken] Shinken 2.2
[1431547694] INFO: [Shinken] Copyright (c) 2009-2014:
[1431547694] INFO: [Shinken] Gabes Jean (napar...@gmail.com)
[1431547694] INFO: [Shinken] Gerhard Lausser, gerhard.laus...@consol.de
[1431547694] INFO: [Shinken] Gregory Starck, g.sta...@gmail.com
[1431547694] INFO: [Shinken] Hartmut Goebel, h.goe...@goebel-consult.de
[1431547694] INFO: [Shinken] License: AGPL
[1431547694] INFO: [Shinken] Stale pidfile exists at invalid literal for int() with base 10: '' (/var/run/shinken/pollerd.pid). Reusing it.
[1431547694] INFO: [Shinken] Opening HTTP socket at http://0.0.0.0:7771
[1431547694] INFO: [Shinken] Initializing a CherryPy backend with 50 threads
[1431547694] INFO: [Shinken] Using the local log file '/var/log/shinken/pollerd.log'
[1431547694] INFO: [Shinken] Starting HTTP daemonList to register :[('__init__', <bound method IForArbiter.__init__ of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('api', <bound method IForArbiter.api of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('api_full', <bound method IForArbiter.api_full of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('get_external_commands', <bound method IForArbiter.get_external_commands of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('get_log_level', <bound method IForArbiter.get_log_level of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('get_running_id', <bound method IForArbiter.get_running_id of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('get_start_time', <bound method IForArbiter.get_start_time of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('got_conf', <bound method IForArbiter.got_conf of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('have_conf', <bound method IForArbiter.have_conf of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('ping', <bound method IForArbiter.ping of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('push_broks', <bound method IForArbiter.push_broks of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('push_host_names', <bound method IForArbiter.push_host_names of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('put_conf', <bound method IForArbiter.put_conf of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('remove_from_conf', <bound method IForArbiter.remove_from_conf of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('set_log_level', <bound method IForArbiter.set_log_level of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('wait_new_conf', <bound method IForArbiter.wait_new_conf of <shinken.satellite.IForArbiter object at 0x26ff450>>), ('what_i_managed', <bound method IForArbiter.what_i_managed of <shinken.satellite.IForArbiter object at 0x26ff450>>)]

Registering api [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering api_full [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering get_external_commands [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering get_log_level [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering get_running_id [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering get_start_time [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering got_conf [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering have_conf [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering ping [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering push_broks ['broks'] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering push_host_names ['sched_id', 'hnames'] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering put_conf ['conf'] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering remove_from_conf ['sched_id'] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering set_log_level ['loglevel'] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering wait_new_conf [] <shinken.satellite.IForArbiter object at 0x26ff450>
Registering what_i_managed [] <shinken.satellite.IForArbiter object at 0x26ff450>
picking only bound methods of class and not parents
List to register :[('get_broks', <bound method IBroks.get_broks of <shinken.satellite.IBroks object at 0x26ff590>>)]
Registering get_broks ['bname'] <shinken.satellite.IBroks object at 0x26ff590>
picking only bound methods of class and not parents
List to register :[('get_returns', <bound method ISchedulers.get_returns of <shinken.satellite.ISchedulers object at 0x26ff610>>), ('push_actions', <bound method ISchedulers.push_actions of <shinken.satellite.ISchedulers object at 0x26ff610>>)]
Registering get_returns ['sched_id'] <shinken.satellite.ISchedulers object at 0x26ff610>
Registering push_actions ['actions', 'sched_id'] <shinken.satellite.ISchedulers object at 0x26ff610>
picking only bound methods of class and not parents
List to register :[('get_raw_stats', <bound method IStats.get_raw_stats of <shinken.satellite.IStats object at 0x26ff4d0>>)]
Registering get_raw_stats [] <shinken.satellite.IStats object at 0x26ff4d0>
[1431547694] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
[1431547694] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
[1431547694] INFO: [Shinken] Waiting for initial configuration
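
A side note on the "Stale pidfile exists at invalid literal for int() with base 10: ''" line in the output above: that text is the error Python's int() raises for an empty string, which suggests the pidfile existed but was empty. The sketch below shows one way to read a pidfile that tolerates that case. It is only an illustration, not Shinken's own code; read_pid is a hypothetical helper and the file names are throwaway examples.

```python
import os
import tempfile


def read_pid(path):
    """Return the PID stored in `path`, or None if the file is missing, empty or malformed."""
    try:
        with open(path) as handle:
            text = handle.read().strip()
    except FileNotFoundError:
        return None
    try:
        return int(text)
    except ValueError:
        # int('') raises "invalid literal for int() with base 10: ''"
        return None


# Demonstration with a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    pidfile = os.path.join(tmp, "pollerd.pid")
    open(pidfile, "w").close()          # create an empty pidfile
    empty_result = read_pid(pidfile)    # empty file, so no PID can be read
    with open(pidfile, "w") as handle:
        handle.write("12345\n")
    valid_result = read_pid(pidfile)
    missing_result = read_pid(os.path.join(tmp, "absent.pid"))

print(empty_result, valid_result, missing_result)
```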


On 5/13/15 12:59 PM, Olivier Hanesse wrote:
A poller for example.

And yes, do a "ps -ef" and copy/paste the command line without the "-d" (daemon mode)


On May 13, 2015 9:17 PM, "David Good" <dg...@willingminds.com> wrote:
Which daemon?  Do you mean just run something like /usr/bin/shinken-poller from the commandline?

On 5/13/15 11:39 AM, Olivier Hanesse wrote:
Could you launch a daemon from the command line ("cli") and paste the output?



2015-05-13 20:32 GMT+02:00 David Good <dg...@willingminds.com>:
No, because it's not just the pollers having trouble.  I may try that today though just to eliminate the possibility.


On 5/13/15 11:22 AM, Olivier Hanesse wrote:
Did you try without the nrpe-booster module on the poller?

2015-05-13 20:11 GMT+02:00 David Good <dg...@willingminds.com>:
On 5/12/15 4:16 PM, David Good wrote:
On 5/12/15 2:46 PM, Felipe openglx wrote:
The devs will be able to give more specifics (maybe even confirm if 
2.4 performs better for your case?) but I faced similar issues with 
timeout because of the time it took to "slice and dice" the amount of 
objects.
If you can enable debug mode on all nodes and provide some captures it 
would be great.
OK -- I'll see about setting that up.


Here's one set of captures.  I'm choosing timeouts between daemons running on the same server as the arbiter so there's no issue of network interference or clock skew.

Here's the arbiter:

[1431539957] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
[1431539960] ERROR: [Shinken] Failed sending configuration for poller-1: Connexion error to http://shinken1.dc1.example.com:7771/ : Operation timed out after 3001 milliseconds with 0 bytes received
[1431539960] INFO: [Shinken] [All] Trying to send configuration to poller poller-4
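
One way to check whether the poller's HTTP interface is slow to answer, independently of the arbiter, is to time a request against it with the same 3-second budget that appears in the error above. This is a rough sketch only: timed_get is a hypothetical helper, and it makes no claim about how the arbiter itself performs the request.

```python
import http.client
import time
import urllib.request


def timed_get(url, timeout=3.0):
    """Issue one GET and return (elapsed_seconds, outcome).

    outcome is the HTTP status code on success, or a string describing the error.
    """
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # so the request goes straight to the daemon.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
            outcome = response.status
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, socket timeouts and refused connections
        # are all OSError subclasses.
        outcome = "error: %r" % (exc,)
    return time.monotonic() - start, outcome
```

It could be called in a loop as, for example, timed_get("http://shinken1.dc1.example.com:7771/ping"). The /ping path is an assumption based on the "ping" method the poller registers at startup; adjust it if your version exposes something different.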

Here are the corresponding entries from poller-1.  Note that we have a lot of servers we're checking via NRPE that don't yet have the NRPE daemon set up properly to allow access from the Shinken servers.  Looking at the code where these error messages are generated, that seems to be the cause of these errors:

[1431539922] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):
  File "/var/lib/shinken/modules/booster-nrpe/module.py", line 220, in close
    break
Error: []

[1431539957] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):
  File "/var/lib/shinken/modules/booster-nrpe/module.py", line 220, in close
    break
Error: []

[1431539957] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):
  File "/var/lib/shinken/modules/booster-nrpe/module.py", line 220, in close
    break
Error: []

[1431540097] DEBUG: [Shinken] socket.shutdown failed: [Errno 107] Transport endpoint is not connected
[1431540097] DEBUG: [Shinken] socket.shutdown failed: [Errno 9] Bad file descriptor
[1431540097] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):
  File "/var/lib/shinken/modules/booster-nrpe/module.py", line 220, in close
    break
Error: []

There doesn't seem to be anything going on with the poller at the time that the arbiter is complaining about it not responding.
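
For anyone comparing captures like these by hand, a small script can line up the epoch timestamps from the two logs. The following is a sketch under a few assumptions: the log format is the "[epoch] LEVEL: [Shinken] message" shape seen above, the sample lines are copied from this message (with wrapped lines rejoined), and the 5-second window and the helper names parse and entries_near are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timezone

# Shinken log lines in this thread look like "[<epoch>] <LEVEL>: [Shinken] <message>".
LOG_LINE = re.compile(r"^\[(\d+)\] (\w+): \[Shinken\] (.*)$")


def parse(lines):
    """Return (epoch, level, message) tuples for lines in the format above."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # continuation lines such as tracebacks are skipped
            entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def entries_near(entries, epoch, window):
    """Return the entries logged within `window` seconds of `epoch`."""
    return [entry for entry in entries if abs(entry[0] - epoch) <= window]


arbiter_log = [
    "[1431539957] INFO: [Shinken] [All] Trying to send configuration to poller poller-1",
    "[1431539960] ERROR: [Shinken] Failed sending configuration for poller-1: "
    "Connexion error to http://shinken1.dc1.example.com:7771/ : "
    "Operation timed out after 3001 milliseconds with 0 bytes received",
]
poller_log = [
    "[1431539922] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):",
    "[1431539957] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):",
    "[1431539957] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):",
    "[1431540097] DEBUG: [Shinken] socket.shutdown failed: [Errno 107] Transport endpoint is not connected",
    "[1431540097] DEBUG: [Shinken] socket.shutdown failed: [Errno 9] Bad file descriptor",
    "[1431540097] DEBUG: [Shinken] Error on SSL shutdown : library=missing reason=missing : [] ; Traceback (most recent call last):",
]

WINDOW = 5  # seconds; an arbitrary choice for this illustration
poller_entries = parse(poller_log)
report = []
for epoch, level, message in parse(arbiter_log):
    if level == "ERROR":
        when = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
        nearby = entries_near(poller_entries, epoch, WINDOW)
        report.append((when, len(nearby)))
        print("%s arbiter error, %d poller entries within %d s" % (when, len(nearby), WINDOW))
```

Feeding it the full log files instead of the excerpts would show whether poller activity clusters around each arbiter timeout or not.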



------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel




