Hi,
I think it's an interesting idea. Why not just add an "on_error" setting
(kill|restart|bypass) to the module configuration?
For internal modules, the default would be kill. External modules can take
this parameter and handle it as they wish.
restart means that if the module crashes (an exception raised in an internal
module, or the process dying for an external one), the daemon just calls the
module's quit() (so it closes its files and so on) and then re-inits it.
bypass should be available only for internal modules (it makes no sense for
external ones): it means the daemon just carries on as if the exception had
never happened (and prays :p ).
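A minimal Python sketch of how a daemon could apply the proposed on_error
policy, assuming a hypothetical Module class exposing init()/quit() and a
hypothetical handle_module_error() hook. None of these names are existing
Shinken APIs, and external modules would still be free to handle the
parameter themselves:

```python
import logging
from enum import Enum

log = logging.getLogger("module_errors")


class OnError(Enum):
    """Values of the proposed (hypothetical) on_error module property."""
    KILL = "kill"
    RESTART = "restart"
    BYPASS = "bypass"


class Module:
    """Hypothetical stand-in for a Shinken module: only init()/quit() matter."""

    def __init__(self, name, on_error="kill", external=False):
        self.name = name
        self.on_error = OnError(on_error)  # ValueError on unknown values
        self.external = external
        if external and self.on_error is OnError.BYPASS:
            # bypass makes no sense once an external module's process has died
            raise ValueError("on_error=bypass is only valid for internal modules")
        self.inits = 0
        self.alive = False

    def init(self):
        self.inits += 1
        self.alive = True

    def quit(self):
        # a real module would close its files, sockets, etc. here
        self.alive = False


def handle_module_error(module, exc):
    """Apply module.on_error after `exc`; return True if the module is still usable."""
    log.warning("module %s raised %r, on_error=%s",
                module.name, exc, module.on_error.value)
    if module.on_error is OnError.BYPASS:
        return True  # carry on as if nothing happened (and pray)
    module.quit()  # both kill and restart start with a clean shutdown
    if module.on_error is OnError.RESTART:
        module.init()
        return True
    return False  # kill: the module stays down
```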
We could add another property too:
* optional = 0|1 (0 by default?)
If it's not optional, the module being killed or failing to instantiate would
put the whole daemon in error. Take livestatus for example: if it's down, you
don't care whether the broker is still alive or not, it's useless to you
(except maybe for logs, so it could wait a little before putting itself in a
"not accepting configuration" state).
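A small Python sketch of the optional property's effect, assuming the daemon
keeps a per-module status list; ModuleStatus, failed_required_modules() and
daemon_in_error() are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class ModuleStatus:
    """Hypothetical per-module status as seen by a daemon."""
    name: str
    optional: bool = False  # the proposed "optional" property, 0 by default
    running: bool = True    # False once killed or if instantiation failed


def failed_required_modules(modules):
    """Names of non-optional modules that are down."""
    return [m.name for m in modules if not m.running and not m.optional]


def daemon_in_error(modules):
    """The daemon goes in error as soon as one required module is down."""
    return bool(failed_required_modules(modules))
```

With livestatus left non-optional, a dead livestatus puts the broker in error,
while a dead optional module only gets reported.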
I think these 2 parameters cover all the cases, and since both are set on the
modules, they are easier to manage in the global configuration part :)
Jean
2011/2/8 Grégory Starck <g.sta...@gmail.com>
>
> I see that currently it's not considered an error when a shinken module is
> declared in the config of a shinken daemon (arbiter/broker/scheduler/..) but
> can't be instantiated (for whatever reason).
>
> I'm wondering whether it shouldn't always/by default be considered an
> "error" (the shinken daemon should then either die or put itself in a kind
> of temporary "blocked" mode until the bad module(s) can be loaded correctly,
> retrying every X minutes) unless configured otherwise; for this I'd propose
> new option(s) in a module definition, so the admin can handle the different
> cases as desired:
>
> - "warn_on_not_found": true|false, default=false (== if the module can't be
> found in the modules path -> (critical) error); if true -> only a warning
> would be emitted if the given module can't be found.
>
> - "warn_on_bad_instantiate" (or is "warn_on_bad_init" a better name?):
> true|false, default=false (== if the module raises any error during
> instantiation or init -> (critical) error); if true -> only a warning would
> be emitted if the module raises any error during instantiation or init.
>
> (and possibly: - "try_reinit": true|false, only applicable if
> warn_on_bad_instantiate is true:
> if true: would retry to re-instantiate & init the given module each
> time it's possible (but say at most once per X minute(s), or better: once
> per daemon main loop turn?).
> if false (default): would simply skip the module for good after
> the first attempt at instantiation & init.
> )
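A rough Python sketch of a loader honoring these options, assuming each module
is an importable Python package exposing a hypothetical get_instance() factory
whose result has an init() method. ModuleSpec, try_load(), ModuleLoadError and
reinit_interval (the "X minutes") are made-up names for illustration, not
existing Shinken code:

```python
import importlib
import logging
import time

log = logging.getLogger("module_loader")


class ModuleLoadError(Exception):
    """Critical error: the daemon should die or go into 'blocked' mode."""


class ModuleSpec:
    """Hypothetical module definition carrying the proposed options."""

    def __init__(self, name, python_name, warn_on_not_found=False,
                 warn_on_bad_instantiate=False, try_reinit=False,
                 reinit_interval=60.0):
        self.name = name
        self.python_name = python_name
        self.warn_on_not_found = warn_on_not_found
        self.warn_on_bad_instantiate = warn_on_bad_instantiate
        # try_reinit only applies when warn_on_bad_instantiate is true
        self.try_reinit = try_reinit and warn_on_bad_instantiate
        self.reinit_interval = reinit_interval  # seconds between attempts
        self.last_attempt = None
        self.skipped = False


def try_load(spec, factory_name="get_instance", now=None):
    """Return an initialized instance, or None if the failure is only a warning.

    Raises ModuleLoadError when the options make the failure critical; a
    'blocked' daemon can simply call try_load() again on later loop turns."""
    now = time.monotonic() if now is None else now
    if spec.skipped:
        return None
    if (spec.last_attempt is not None
            and now - spec.last_attempt < spec.reinit_interval):
        return None  # throttle: at most one attempt per reinit_interval
    spec.last_attempt = now
    try:
        pymod = importlib.import_module(spec.python_name)
    except ImportError:
        if not spec.warn_on_not_found:
            raise ModuleLoadError(f"module {spec.name} not found")
        log.warning("module %s not found, ignoring it", spec.name)
        spec.skipped = True
        return None
    try:
        instance = getattr(pymod, factory_name)()
        instance.init()
        return instance
    except Exception as exc:
        if not spec.warn_on_bad_instantiate:
            raise ModuleLoadError(
                f"module {spec.name} failed to init: {exc!r}") from exc
        log.warning("module %s failed to init: %r", spec.name, exc)
        if not spec.try_reinit:
            spec.skipped = True  # try_reinit false: skip the module for good
        return None
```

Passing `now` explicitly is only there to make the throttling easy to test; a
daemon would call try_load(spec) once per main loop turn.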
>
> wdyt ?
>
> further, I see another possible option for external modules only:
> "try_restart_if_died": true|false; this would handle "died" modules and,
> when set to true, try to re-instantiate & restart them.. but it's a bit
> different from the previous options.. personally I would say that by default
> we should always (try to) restart crashed shinken module instances.. but I
> think it depends on the case: what if an (external) module ran correctly
> for some time, then suddenly crashed, and from then on always crashed right
> away on each init or restart (say, for some global/general reason)? to me
> that's definitely far worse than a module that only crashes once every X
> hours or so (say, because of a very specific reason/bug)..
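One way try_restart_if_died could tell a crash loop from an occasional crash
is a sliding window of crash timestamps. A minimal Python sketch; RestartPolicy,
max_crashes and window are hypothetical names, not existing options:

```python
from collections import deque


class RestartPolicy:
    """Hypothetical try_restart_if_died helper for external modules.

    Restart a dead module unless it has died max_crashes times within the
    last `window` seconds, which suggests a crash loop (a global/general
    cause) rather than an occasional, very specific bug."""

    def __init__(self, max_crashes=3, window=600.0):
        self.max_crashes = max_crashes
        self.window = window
        self.crashes = deque()

    def should_restart(self, now):
        """Record a crash at time `now` and say whether to restart."""
        self.crashes.append(now)
        # forget crashes that fell out of the sliding window
        while self.crashes and now - self.crashes[0] > self.window:
            self.crashes.popleft()
        return len(self.crashes) < self.max_crashes
```

A daemon would call should_restart(time.monotonic()) whenever it notices the
module process is no longer alive, and give up (or go "blocked") on False.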
> there is also the case where, instead of "simply" crashing, a module could
> go into some infinite loop or the like and not react at all anymore.. then
> shouldn't it be killed?
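Hung modules could be detected with heartbeats. A minimal Python sketch,
assuming each external module periodically sends a heartbeat to the daemon
over some IPC channel (not shown); Watchdog and its methods are hypothetical:

```python
import time


class Watchdog:
    """Hypothetical hang detector for external module processes.

    A module whose last heartbeat is older than `timeout` seconds is
    considered stuck (e.g. in an infinite loop) and should be killed."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable for tests
        self.last_beat = {}

    def beat(self, name):
        """Record a heartbeat received from module `name`."""
        self.last_beat[name] = self.clock()

    def forget(self, name):
        """Stop tracking a module, e.g. after it has been killed."""
        self.last_beat.pop(name, None)

    def stuck_modules(self):
        """Sorted names of modules that missed their heartbeat deadline."""
        now = self.clock()
        return sorted(name for name, t in self.last_beat.items()
                      if now - t > self.timeout)
```

Assuming the module runs in a multiprocessing.Process, the daemon could call
terminate() on each stuck one, fall back to kill() if it is still alive after
a grace period, then forget() it and let the restart policy decide what comes
next.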
>
> any comment welcome.. :)
>
> greg.
>
>
>
> ------------------------------------------------------------------------------
> The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
> Pinpoint memory and threading errors before they happen.
> Find and fix more than 250 security defects in the development cycle.
> Locate bottlenecks in serial and parallel code that limit performance.
> http://p.sf.net/sfu/intel-dev2devfeb
> _______________________________________________
> Shinken-devel mailing list
> Shinken-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>
>