Hi, Folks,
Many of you probably already use the 'health_check.pl' script to monitor your
VCL system. If you are not familiar with it, it resides in
$VCL_HOME/bin/health_check.pl
It can be run directly from the command line or scheduled as a cron job. It
verifies that each computer (for a given management node) is operating
properly. It can also be used to power down compute nodes (though I have never
used it for this purpose).
The script is really solid, but it can take a long time to complete, especially
if your management nodes each control a large number of machines. In my
experience, it typically takes about 10 minutes for every 50 computers. This
isn't necessarily a problem, but if I want to just get a quick snapshot of the
overall system health, it is sometimes too long to wait.
So, I wrote a node.js module [1] that runs *significantly* faster -- that is,
it checks an entire system in only a few seconds. It also puts a much lighter
load on the management node (no externally spawned processes, only a single
database query, etc.).
The module allows you to write a complete monitoring script like so:
==============================
var health = require('vcl-utils').Health;

health.on('error', function(err) {
    console.log('ERROR :: ' + err);
});

health.on('info', function(msg) {
    console.log('INFO :: ' + msg);
});

health.check();
==============================
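If you would rather see a short summary than the full stream of messages, a
minimal sketch along these lines might work. It relies only on the 'error' and
'info' events shown above and assumes nothing else about the module. The
`tally` helper, the stand-in emitter, and the host names are hypothetical; they
are there so the sketch runs without a VCL database.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: attach counting listeners to any emitter that fires
// 'error' and 'info' events, as the Health object in the example above does.
function tally(emitter) {
    var counts = { errors: 0, infos: 0, messages: [] };
    emitter.on('error', function(err) {
        counts.errors += 1;
        counts.messages.push('ERROR :: ' + err);
    });
    emitter.on('info', function(msg) {
        counts.infos += 1;
    });
    return counts;
}

// Stand-in for require('vcl-utils').Health (an assumption made for this
// sketch only). The messages and host names below are made up.
var fakeHealth = new EventEmitter();
var counts = tally(fakeHealth);

fakeHealth.emit('info', 'node01 is reachable');
fakeHealth.emit('error', 'node02 did not respond');
fakeHealth.emit('info', 'node03 is reachable');

// Print the summary first, then only the error lines.
console.log(counts.errors + ' error(s), ' + counts.infos + ' info message(s)');
counts.messages.forEach(function(line) {
    console.log(line);
});
```

With the real module, you would presumably pass the Health object to `tally`
before calling `health.check()`, and print the summary once the check has
finished.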
Let me know if you have any questions or if you have ideas for improving this.
Best regards,
Aaron
[1] https://github.com/acoburn/vcl-utils
--
Aaron Coburn
Systems Administrator and Programmer
Academic Technology Services, Amherst College
[email protected]