Hi,
Over the last few weeks, we hacked on several parts of the code:
- The WebUI got a HUGE refactoring, thanks to Andreas. Here are screenshots
of last week's version:
http://www.shinken-monitoring.org/news/preview-of-the-webui-next-version/
The current one is even better, with more widgets for the new dashboard
and more filtering options :)
- Lots of Graphite UI tweaks, thanks to Claneys Skyne and H4wkmoon
- We got a (working) draft of the triggers!
- New external command: PROCESS_SERVICE_OUTPUT.
It is closely tied to the triggers: it updates the output/perfdata of a
service and keeps the same state.
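To give an idea of how such a command could be submitted, here is a minimal sketch that builds one line in the classic Nagios external-command format ("[timestamp] COMMAND;arguments"). The argument layout used for PROCESS_SERVICE_OUTPUT (host, service, then the new output with its perfdata) is an assumption for illustration, and format_external_command is a hypothetical helper, not part of Shinken.

```python
import time


def format_external_command(name, args, timestamp=None):
    """Build one line in the classic Nagios external-command format."""
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] %s;%s\n" % (timestamp, name, ";".join(args))


# ASSUMPTION: arguments are host;service;new output (with perfdata after '|').
line = format_external_command(
    "PROCESS_SERVICE_OUTPUT",
    ["srv-web-1", "Http", "Http OK | time=0.0001s"],
    timestamp=1330000000,
)
print(line, end="")
```

Such a line would then be written to whatever command channel the arbiter listens on; the state of the service is left untouched.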
The triggers can be defined directly in the object definitions, but it's
quite ugly (you need to add \n and \t everywhere...). I'm trying something
else: a triggers_dir option that loads some paths and puts them in a sort
of PATH. In these directories (and their sub-directories) you put .trig
files, which are in fact Python code. Here is an example:
We have 3 services: srv-web-1/Http, srv-web-2/Http and srv-web-3/Http, with
the output:
Http OK | time=0.0001s
We want a service whose perfdata is the average of the response times of
the other Http services.
We create it like this:
define service{
    use                  generic-service
    host_name            srv-web-avg
    service_description  HttpAverage
    check_command        check_dummy!0
    check_interval       1
    trigger_name         avg_http   ; <---- Here is what you are looking for
}
It will run check_dummy, which returns an OK state, and then launch the
avg_http trigger.
But what is this thing? It's a file named avg_http.trig in one of the
directories we gave in the configuration (remember the PATH-like thing?)
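The lookup described above could be sketched like this: walk every configured directory (and its sub-directories) and map each .trig file to a trigger name. find_triggers is a hypothetical helper, and the "first directory wins" rule is an assumption borrowed from how a shell PATH behaves, not something the current draft guarantees.

```python
import os


def find_triggers(trigger_dirs):
    """Map trigger name -> path of its .trig file.

    ASSUMPTION: like a shell PATH, the first directory that provides
    a given name wins.
    """
    found = {}
    for top in trigger_dirs:
        for dirpath, _dirnames, filenames in os.walk(top):
            for filename in sorted(filenames):
                base, ext = os.path.splitext(filename)
                if ext == ".trig" and base not in found:
                    found[base] = os.path.join(dirpath, filename)
    return found
```

With this, trigger_name avg_http would resolve to the avg_http.trig file found under one of the directories.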
etc/sample/triggers.d/avg_http.trig :
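# get_object, perf and self are provided by the trigger environment
names = ['srv-web-%d/Http' % i for i in range(1, 4)]
srvs = [get_object(name) for name in names]
perfs = [perf(srv, 'time') for srv in srvs]
# Average response time; the 0.0 start value forces float arithmetic
value = sum(perfs, 0.0) / len(perfs)
# self is the service that owns the trigger (srv-web-avg/HttpAverage)
self.output = 'Trigger launch OK'
self.perf_data = 'HttpAverage=%.3f' % value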
For those who don't speak Python: it builds the list of the services
srv-web-1/Http, srv-web-2/Http and srv-web-3/Http, gets the objects from
the core, and takes their 'time' perfdata value. Then it computes the
average and puts it in the element that called the trigger, here
srv-web-avg/HttpAverage.
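To play with the logic outside of Shinken, here is a standalone sketch of the same trigger. FakeService, SERVICES and the get_object/perf stand-ins are hypothetical stubs written for this example (the real ones come from the core), and the timings are made-up values.

```python
import re


class FakeService(object):
    """Stand-in for a core service object; only the fields used here."""

    def __init__(self, perf_data=''):
        self.output = ''
        self.perf_data = perf_data


# Fake "core" with made-up response times for the three services.
SERVICES = {
    'srv-web-1/Http': FakeService('time=0.100s'),
    'srv-web-2/Http': FakeService('time=0.200s'),
    'srv-web-3/Http': FakeService('time=0.300s'),
}


def get_object(name):
    return SERVICES[name]


def perf(srv, metric):
    """Return the numeric value of `metric` in srv.perf_data, unit dropped."""
    pattern = r'(?:^|\s)%s=([-+]?\d*\.?\d+)' % re.escape(metric)
    m = re.search(pattern, srv.perf_data)
    if m is None:
        raise KeyError(metric)
    return float(m.group(1))


def run_avg_http(self):
    # Same body as avg_http.trig, with the calling service passed as self.
    names = ['srv-web-%d/Http' % i for i in range(1, 4)]
    srvs = [get_object(name) for name in names]
    perfs = [perf(srv, 'time') for srv in srvs]
    value = sum(perfs, 0.0) / len(perfs)
    self.output = 'Trigger launch OK'
    self.perf_data = 'HttpAverage=%.3f' % value


avg = FakeService()
run_avg_http(avg)
print(avg.perf_data)
```

One thing this makes visible: with the '%.3f' format, a response time as small as the 0.0001s from the sample output would be written as 0.000, so a finer format may be needed for very fast checks.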
We need more functions than get_object and perf, to get perfdata more
easily, like perfs('srv-web-*/Http', 'time') instead of the first 3 lines.
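Such a perfs() helper does not exist yet; a rough sketch of what it might look like, using shell-style globbing on "host/service" names, is below. The PERFDATA registry is a hypothetical stand-in for the objects held by the core, with made-up values.

```python
import fnmatch
import re

# Hypothetical registry: "host/service" -> raw perfdata string.
PERFDATA = {
    'srv-web-1/Http': 'time=0.10s',
    'srv-web-2/Http': 'time=0.20s',
    'srv-web-3/Http': 'time=0.30s',
    'srv-db-1/Mysql': 'time=0.90s',
}


def perfs(pattern, metric):
    """Values of `metric` for every object whose name matches the glob."""
    regex = r'(?:^|\s)%s=([-+]?\d*\.?\d+)' % re.escape(metric)
    values = []
    for name in sorted(PERFDATA):
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        m = re.search(regex, PERFDATA[name])
        if m:
            values.append(float(m.group(1)))
    return values


times = perfs('srv-web-*/Http', 'time')
print(sum(times, 0.0) / len(times))
```

The trigger body would then shrink to the average computation and the two self.* assignments.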
The computed service is a standard one, so its data will be sent as broks
as usual, and you will get the graph in PNP/Graphite, or it may be used
from another trigger! :)
For now it's a draft, and it prints a lot of junk on the debug output, but
it can be a good thing for KPI computing, or for complex states based on
rules that the standard bp_rule can't manage :)
Let me know what you think about it, and if you can think of a use case
that would be interesting :)
One last point: I just pushed a collectd module for the arbiter. The idea
is to listen for collectd data and update perfdata with it (now you know
why I added PROCESS_SERVICE_OUTPUT :p ). I don't know how it will scale, so
for now it's a mere work in progress. If you have experience with Collectd,
feel free to test it :)
Next things : finish the triggers, and get skonf working :)
Jean
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel