On Tue, Jul 19, 2011 at 1:18 PM, Lonnie Olson <[email protected]> wrote:

> On Tue, Jul 19, 2011 at 11:52 AM, Steve Dibb <[email protected]> wrote:
> > I've got two questions -- how do you guys usually go about monitoring
> > this stuff?  Monit can check the system general usage, but how do I know
> > which applications are doing that?
>
> You already got great suggestions for this question.
>
> Monit, Munin, Cacti for general performance graphing/monitoring
> Nagios, etc for host/service availability monitoring and notifications
> Splunk or basic centralized syslog for log monitoring and analysis
>
> >
> > My second question is, where in the world do you start to diagnose
> > something like this?  Looking at the system and apache logs, it looks
> > like everything just STOPPED.  There's no red flags that I can see, so
> > I'm having a hard time diagnosing it.
>
> Nobody touched on this question, likely because it's a pain to
> identify sudden massive memory spikes like this.
> * Consider the services that the machine provides.
>  Are any of them likely or possible to eat tons of memory in a very short
> time?
> * Check out your existing logs for all services.
>  Are there any other indications in the logs of increased activity?
> * Consider timing and frequency of these failures.
>  Does it happen more than once?  At the same frequency?  Predictable?
>  * Look through your scheduled tasks (Cron) for any processes that
> may coincide with this timing.
> * Consider a more frequent system checker
>  Run a loop to gather process data.
>    $ while true; do ps auxww > ps.$(date +%s); sleep 10; done
>    or something similar
>  Increase the frequency of your existing monitoring, if possible.
>
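
To make sense of the files that loop leaves behind, something like the
following might help. It is only a sketch: biggest_per_snapshot is a
made-up name, and it assumes the snapshots were written by "ps auxww" on
Linux, where RSS (in KiB) is the sixth column and the command starts at
the eleventh.

```shell
# Hypothetical helper: for each snapshot file written by the loop above
# (named ps.<epoch seconds>), print the epoch, the largest RSS in KiB,
# and the PID and command name of the process that owned it.
# Assumes "ps auxww" column order: RSS is field 6, command is field 11.
biggest_per_snapshot() {
    for f in "$@"; do
        # ${f##*.} keeps only the text after the last dot, i.e. the epoch.
        awk -v stamp="${f##*.}" '
            NR > 1 && $6 + 0 > max { max = $6 + 0; pid = $2; cmd = $11 }
            END { if (max > 0) print stamp, max, pid, cmd }
        ' "$f"
    done
}

# Example: show the five snapshots with the largest single process.
#   biggest_per_snapshot ps.* | sort -k2,2n | tail -n 5
```

Sorting on the second column makes the spike stand out, and the epoch in
the first column can be matched against the time the box fell over.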

Splunk does this; it shows all processes, memory usage, what's using the
memory, etc. in nice colorful graphs.

Splunk is free if you index less than 500 MB of data per day.  Just narrow
down the logs you monitor and you're golden, or install a standalone
instance on each machine you want to keep an eye on.

Nagios is great for monitoring; agreed.






>
> Good Luck
> --lonnie
>
> _______________________________________________
>
> UPHPU mailing list
> [email protected]
> http://uphpu.org/mailman/listinfo/uphpu
> IRC: #uphpu on irc.freenode.net
>



-- 
Take care,
William Attwood
Idea Extraordinaire
[email protected]
