[Shinken-devel] [shinken][poller] problem with defunct process

CAVAGNINI Damien Tue, 02 Aug 2011 06:35:56 -0700

Hello,

At my company, we are testing both gearmand and shinken for our next monitoring 
infrastructure.
We are facing some problems with the shinken pollers: lots of checks end up as 
defunct processes (run via nagios perl plugins, both official ones and our own).
Sometimes we have up to 800 zombies at a time.
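
For reference, here is a minimal sketch of how the defunct processes can be counted on the poller. It assumes a Linux host with /proc mounted; the function names (parse_state, count_zombies) are illustrative only and are not part of shinken.

```python
import glob


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_zombies(proc_root="/proc"):
    """Count processes whose state is 'Z' (zombie / defunct).

    Assumption: Linux-style /proc layout under proc_root.
    """
    count = 0
    for path in glob.glob(proc_root + "/[0-9]*/stat"):
        try:
            with open(path) as f:
                if parse_state(f.read()) == "Z":
                    count += 1
        except (OSError, IndexError):
            # The process exited between listing and reading; skip it.
            continue
    return count


if __name__ == "__main__":
    print(count_zombies())
```

Running this periodically on the poller while the checks are scheduled gives a rough view of how the number of zombies evolves over time.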


It seems that the zombies are reported as snmp_timeout in the log.
I tried different values for service_check_timeout and host_check_timeout, 
without success.
Currently, both values are set to 60.

The same plugins are used by nagios and gearmand, and show no problem.

We are checking 16000 services with one poller. The same poller is used for 
shinken or nagios / gearmand (not at the same time of course ;)
The 16000 checks show no problem with nagios / gearmand.

The poller is an 8-core / 16-thread machine with 12 GB of RAM.
We have another physical server as arbiter, broker (ndo and NPCD), receiver and 
reactionner, and some VMs (from 1 up to 3) for schedulers.

Any idea about the zombies?

Regards
