Hi, Folks,

Many of you probably already use the 'health_check.pl' script to monitor your 
VCL system. If you are not familiar with it, it resides in

    $VCL_HOME/bin/health_check.pl

It can be run directly from the command line or scheduled as a cron job. It 
verifies that each computer (for a given management node) is operating 
properly. It can also be used to power down compute nodes (though I have never 
used it for this purpose).

The script is really solid, but it can take a long time to complete, especially 
if your management nodes each control a large number of machines. In my 
experience, it typically takes about 10 minutes for every 50 computers. This 
isn't necessarily a problem, but when I just want a quick snapshot of overall 
system health, that is sometimes too long to wait.

So, I wrote a Node.js module [1] that runs *significantly* faster: it checks 
an entire system in only a few seconds. It also puts a much lighter load on 
the management node (no externally spawned processes, only a single database 
query, etc.).

The module allows you to write a complete monitoring script like so:

==============================
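// Load the health checker from the vcl-utils module
var health = require('vcl-utils').Health;

// Log anything the module reports as an error
health.on('error', function(err) {
  console.log('ERROR :: ' + err);
});

// Log informational messages
health.on('info', function(msg) {
  console.log('INFO :: ' + msg);
});

// Start the check
health.check();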
==============================
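
When the script runs from cron, a one-line summary may be more useful than a
stream of messages. The following is a minimal sketch of one way to get that.
It assumes only the three calls shown above (on('error'), on('info') and
check()). The names `summarize` and `fakeHealth` are hypothetical, and
`fakeHealth` is a stand-in that emits two made-up events so the sketch runs
without a VCL database.

```javascript
// Sketch only: count the events emitted during one check.
// `summarize` and `fakeHealth` are hypothetical names, not part of vcl-utils.
var EventEmitter = require('events').EventEmitter;

// Attach counting listeners to any object that emits 'error' and 'info'.
function summarize(health) {
  var counts = { errors: 0, infos: 0 };
  health.on('error', function (err) {
    counts.errors += 1;
    console.log('ERROR :: ' + err);
  });
  health.on('info', function (msg) {
    counts.infos += 1;
    console.log('INFO :: ' + msg);
  });
  return counts;
}

// Stand-in for require('vcl-utils').Health; the messages are made up.
var fakeHealth = new EventEmitter();
fakeHealth.check = function () {
  this.emit('info', 'node-01 is OK');
  this.emit('error', 'node-02 is unreachable');
};

var counts = summarize(fakeHealth);
fakeHealth.check();
console.log(counts.errors + ' error(s), ' + counts.infos + ' info message(s)');
```

With the real module, pass require('vcl-utils').Health to summarize() instead
of the stand-in. The stand-in emits its events synchronously; if the real check
runs asynchronously, the summary line has to be printed after the check has
finished, and how the module signals completion is not shown here.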

Let me know if you have any questions or if you have ideas for improving this.

Best regards,

Aaron


[1] https://github.com/acoburn/vcl-utils



--
Aaron Coburn
Systems Administrator and Programmer
Academic Technology Services, Amherst College
[email protected]


