Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted 
automatically or not. Worst case, track the restarts - if a service is 
restarted more than X times in Y seconds, something's obviously wrong so stop 
tail-chasing[1]. Personally I'm much more interested in knowing there's a 
problem and then taking whatever happens to be the appropriate actions for our 
situation.
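
A minimal sketch of the "more than X times in Y seconds" idea, assuming a sliding time window. The names RestartThrottle, allow_restart and decide are made up for illustration; none of them exist in ACS. The auto_restart argument stands in for the proposed config flag.

```python
import time
from collections import deque


class RestartThrottle:
    """Allow at most max_restarts (X) restarts within window_seconds (Y)."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self._stamps = deque()

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False  # already restarted X times recently: stop tail-chasing
        self._stamps.append(now)
        return True


def decide(auto_restart, throttle):
    """Hypothetical policy: restart only if the flag is on and the throttle allows it."""
    if auto_restart and throttle.allow_restart():
        return "restart"
    return "alert"
```

With auto_restart off, decide() always returns "alert", which matches the "just tell me there's a problem" preference.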

Regarding communicating with a monitoring system - what makes more sense to me 
is setting up a solid framework that provides folks flexibility to use various 
monitoring tools, from sending an email to contacting pager duty or whatever.

So, to me there's 3 parts to that:
1) At VR creation, ACS calls a defined hook script which knows how to contact 
the monitoring system and tell it about the new system to monitor
2) At boot, the VR sends an API query, to which the mgmt server responds with a 
URL for an install script - the VR runs that to download/set up the 
appropriate monitoring agent
3) The VR has standardized scripts for the agent to call to find out what 
should be running, and then the agent can go check for itself.
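
Part 3 could look roughly like the sketch below. Everything in it is an assumption for illustration: the JSON shape of the listing, the function names, and the example service names are not an existing ACS interface.

```python
import json


def expected_services(listing_json):
    """Parse the output of a (hypothetical) VR script that lists what should run."""
    return json.loads(listing_json)["services"]


def find_missing(expected, is_running):
    """Return the expected services for which is_running(name) is false."""
    return [name for name in expected if not is_running(name)]


# Example: pretend the VR script printed this listing and only dnsmasq is up.
listing = '{"services": ["dnsmasq", "haproxy"]}'
missing = find_missing(expected_services(listing), lambda name: name == "dnsmasq")
print(missing)
```

The agent supplies its own is_running check, so the VR only has to answer "what should be running" and stays independent of the monitoring tool.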

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, 
Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module 
(I'm thinking module is hosted outside ACS, but I guess it could be a plugin - 
see earlier licensing points).

Thoughts?

Just my 2c. Happy to tweak wiki if folks lean towards this.

John
1: Aside - this applies to SSVM creation currently - that hamster[2] keeps 
trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <jayapalreddy.ur...@citrix.com> 
wrote:

> Please find the updated FS below:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <santhosh.eduku...@citrix.com> 
> wrote:
> 
>> A shell script can be used. A few thoughts below:
>> 
>> 1. Collect the process IDs of all the daemons you want to monitor using the 
>> "pidof" command, then use the "kill" command to check whether each PID is 
>> valid: send signal 0 with kill and check the exit status using echo $? . For 
>> sending a notification, use the Linux syslog call (man 3 syslog) or the 
>> "logger" command to send to syslog. If you want to send email, you may also 
>> have to check that the firewall is not blocking outbound SMTP traffic. The 
>> same holds for SNMP (I mean, if firewall rules block it).  
>> Using syslog may be good, as by default it exposes various log levels 
>> through its API call.
>> 
>> Now, to keep the monitor script always up and running: run the monitor 
>> script through cron or at, at regular scheduled intervals. 
>> This way, even if the monitor script goes down, it is up again at the next 
>> interval. 
>> 
>> There is a catch with this, though: we may get multiple PIDs for a given 
>> daemon if multiple instances are spawned by the same or by multiple 
>> applications. If this scenario is not common then it's OK; otherwise we may 
>> have to track it differently, maintaining the state of each spawned daemon 
>> and checking whether it exists. If multiple applications launch the same 
>> daemon, you may also want to report which application got killed. E.g.: A 
>> launched httpd, and during its exit logic it kills all the daemons it 
>> launched; then you may want to report that A is not available, rather than 
>> just that httpd is not available. 
>> 
>> 
>> 2. Using the netstat command: check for available, listening and active 
>> ports on the local host, provided all the daemons you want to monitor are 
>> running on "standard" ports or we know the listening ports of the daemons to 
>> be monitored. Again, this script can be scheduled through cron or at to run 
>> every x units; if it gets killed, the monitor script is up again after the 
>> next x units. 
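
The two checks described above can be sketched as follows, assuming a POSIX system. This is only an illustration, not the script proposed in the thread: the function names are made up, and the port check uses a plain TCP connect to localhost in place of parsing netstat output.

```python
import os
import socket
import subprocess
import syslog


def pids_of(name):
    """Return the PIDs reported by `pidof name` (empty list if none)."""
    result = subprocess.run(["pidof", name], capture_output=True, text=True)
    return [int(p) for p in result.stdout.split()]


def pid_alive(pid):
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_listening(port, host="127.0.0.1", timeout=1.0):
    """TCP connect probe, used here instead of parsing netstat output."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report_down(service):
    """Send a notification to syslog, much like the `logger` command would."""
    syslog.syslog(syslog.LOG_ERR, "monitor: %s is not running" % service)
```

A cron job could call pids_of() for each daemon, probe every PID with pid_alive(), and call report_down() when no live PID is left.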
>> 
>> Also, there could be many other approaches as well.
>> 
>> 
>> Thanks!
>> Santhosh 
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.ur...@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <d...@cloudstack.apache.org>
>> Cc: <users@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Hi,
>> 
>> +users list
>> If anyone is already using any tools for monitoring, then please share your 
>> ideas.
>> Also share the cases where you experienced service crashes.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <chiradeep.vit...@citrix.com> 
>> wrote:
>> 
>>> Well just make sure that your script is resilient to its own crashes as
>>> well.
>>> 
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <jayapalreddy.ur...@citrix.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am planning to write a script utility to monitor processes and restart
>>>> them in the event of failure. It will also log the events.
>>>> 
>>>> Thanks,
>>>> Jayapal
>>>> 
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <swel...@ena.com> wrote:
>>>> 
>>>>> supervisord maybe?
>>>>> 
>>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Chiradeep Vittal" <chiradeep.vit...@citrix.com>
>>>>> To: d...@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>> 
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>> 
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>> 
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>> <chiradeep.vit...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>>>>>> is simply too generic for the requirements outlined here. The proposal
>>>>>>> does not talk about modifying monit, just using it. That wouldn't
>>>>>>> trigger the AGPL.
>>>>>> 
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>>>> license is anathema in many corporate environments, and by forcing it
>>>>>> on folks in the default System VM I fear it will hurt adoption of
>>>>>> CloudStack.
>>>>>> 
>>>>>> --David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
