I see that, currently, it's not considered an error when a shinken module is
declared in the config of a shinken daemon (arbiter/broker/scheduler/...) but
can't be instantiated (for whatever reason).
I'm wondering whether this shouldn't always, by default, be considered an
"error": the shinken daemon would then either die or put itself into some
kind of temporary "blocked" mode until the bad module(s) can be loaded
correctly (retrying every X minutes), unless configured otherwise. For this
I'd propose new option(s) in the module definition, so the admin can handle
the different cases as desired:
- "warn_on_not_found": true|false, default=false (i.e. if the module can't
be found in the modules path -> (critical) error); if true, only a warning
is emitted when the module can't be found.
- "warn_on_bad_instantiate" (or is "warn_on_bad_init" a better name?):
true|false, default=false (i.e. if the module raises any error during
instantiation or init -> (critical) error); if true, only a warning is
emitted when the module raises an error during instantiation or init.
(and possibly: - "try_reinit": true|false, only applicable if
warn_on_bad_instantiate is true:
if true: the daemon would retry to re-instantiate & init the module each
time it can, but at most once every X minute(s) (or better: once per
daemon main loop turn?).
if false (default): the daemon would simply skip the module for good after
the first instantiation & init attempt.
)
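A rough sketch, in Python (the language shinken is written in), of how a
daemon could apply these flags when loading a module. All names here
(ModuleDef, try_load, ModuleLoadError, the find/instantiate callables,
retry_interval) are hypothetical, not existing shinken code; it uses the
"warn_on_bad_init" spelling and assumes 5 minutes for "X minutes":

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

log = logging.getLogger("daemon.modules")


class ModuleLoadError(Exception):
    """Hypothetical: a module failure that must be treated as an error."""


@dataclass
class ModuleDef:
    # Hypothetical in-memory form of a module definition carrying the
    # proposed options; the defaults give the strict behaviour.
    module_name: str
    warn_on_not_found: bool = False
    warn_on_bad_init: bool = False
    try_reinit: bool = False          # only used if warn_on_bad_init is True
    retry_interval: float = 300.0     # assumed value for "X minutes"
    last_attempt: Optional[float] = None
    skipped: bool = False


def try_load(mod: ModuleDef,
             find: Callable[[str], Optional[Any]],
             instantiate: Callable[[Any], Any],
             now: Optional[float] = None) -> Optional[Any]:
    """Return a module instance, None if the failure is tolerated (or it is
    too early to retry), or raise ModuleLoadError for a critical failure."""
    now = time.monotonic() if now is None else now
    if mod.skipped:
        return None
    # Rate limit: at most one attempt per retry_interval.
    if mod.last_attempt is not None and now - mod.last_attempt < mod.retry_interval:
        return None
    mod.last_attempt = now

    plugin = find(mod.module_name)
    if plugin is None:
        if not mod.warn_on_not_found:
            raise ModuleLoadError(f"module {mod.module_name!r} not found")
        log.warning("module %r not found, skipping it", mod.module_name)
        mod.skipped = True
        return None

    try:
        return instantiate(plugin)
    except Exception as exc:
        if not mod.warn_on_bad_init:
            raise ModuleLoadError(
                f"module {mod.module_name!r} failed to init: {exc}") from exc
        log.warning("module %r failed to init: %s", mod.module_name, exc)
        if not mod.try_reinit:
            mod.skipped = True  # give up after the first attempt
        return None
```

A daemon would call try_load() for each pending module once per main loop
turn; a ModuleLoadError is the point where it would either exit or enter
the "blocked" mode.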
wdyt?
Further, I see another possible option, for external modules only:
"try_restart_if_died": true|false; this would handle "dead" modules and,
when set to true, try to re-instantiate & restart them. It's a bit
different from the previous options, though. Personally I'd say that by
default we should always (try to) restart crashed shinken module instances,
but I think it depends on the case: what if an (external) module ran
correctly for some time, then suddenly crashed and from then on always
crashed directly on each init or restart (for some global/general reason)?
To me that's far worse than a module crashing only once every X hours or so
(for some very specific reason/bug).
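One way to tell those two situations apart, sketched under assumptions:
count crashes in a sliding time window, keep restarting while they stay
rare, and give up once a module crashes more than N times within the
window. RestartPolicy, max_crashes and window are hypothetical names and
the thresholds are arbitrary:

```python
from collections import deque
from typing import Deque


class RestartPolicy:
    """Hypothetical policy behind the proposed "try_restart_if_died" option.

    Occasional crashes (e.g. one every few hours) are restarted; a module
    that crashes more than `max_crashes` times within `window` seconds is
    assumed to fail for a general reason and is given up on.
    """

    def __init__(self, try_restart_if_died: bool = True,
                 max_crashes: int = 3, window: float = 600.0) -> None:
        self.enabled = try_restart_if_died
        self.max_crashes = max_crashes
        self.window = window
        self.crashes: Deque[float] = deque()
        self.given_up = False

    def should_restart(self, crash_time: float) -> bool:
        """Record a crash at `crash_time` and decide whether to restart."""
        if not self.enabled or self.given_up:
            return False
        self.crashes.append(crash_time)
        # Forget crashes older than the sliding window.
        while self.crashes and crash_time - self.crashes[0] > self.window:
            self.crashes.popleft()
        if len(self.crashes) > self.max_crashes:
            self.given_up = True
            return False
        return True
```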
There is also the case where, instead of "simply" crashing, a module goes
into an infinite loop or similar and stops responding altogether. Shouldn't
it be killed then?
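For the hang case, a minimal heartbeat watchdog, assuming each external
module can signal that it's alive (e.g. a message from its process on each
loop turn): a module that hasn't signalled within the timeout gets killed.
HangWatchdog, beat() and the kill callback are hypothetical names:

```python
import time
from typing import Callable, Dict, List


class HangWatchdog:
    """Hypothetical watchdog for external modules that stop responding.

    Each module is expected to report a heartbeat (relayed to `beat(name)`)
    at least once per `timeout` seconds; `check()` kills every module whose
    last heartbeat is older than that, via the supplied `kill` callback.
    """

    def __init__(self, timeout: float, kill: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.kill = kill
        self.clock = clock
        self.last_beat: Dict[str, float] = {}

    def beat(self, name: str) -> None:
        """Record that module `name` is still alive."""
        self.last_beat[name] = self.clock()

    def check(self) -> List[str]:
        """Kill and forget every module that missed its heartbeat."""
        now = self.clock()
        hung = [n for n, t in self.last_beat.items() if now - t > self.timeout]
        for name in hung:
            self.kill(name)
            del self.last_beat[name]
        return hung
```

Assuming the module runs in its own process, the kill callback could call
multiprocessing.Process.terminate() on it and then hand over to the restart
policy.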
Any comments welcome. :)
greg.
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel