On Tue, Jul 19, 2011 at 1:18 PM, Lonnie Olson <[email protected]> wrote:
> On Tue, Jul 19, 2011 at 11:52 AM, Steve Dibb <[email protected]> wrote:
> > I've got two questions -- how do you guys usually go about monitoring
> > this stuff? Monit can check the system general usage, but how do I know
> > which applications are doing that?
>
> You already got great suggestions for this question.
>
> Monit, Munin, Cacti for general performance graphing/monitoring
> Nagios, etc. for host/service availability monitoring and notifications
> Splunk or basic centralized syslog for log monitoring and analysis
>
> > My second question is, where in the world do you start to diagnose
> > something like this? Looking at the system and apache logs, it looks
> > like everything just STOPPED. There's no red flags that I can see, so
> > I'm having a hard time diagnosing it.
>
> Nobody touched on this question, likely because it's a pain to
> identify sudden massive memory spikes like this.
> * Consider the services that the machine provides.
>   Are any of them likely or possible to eat tons of memory in a very
>   short time?
> * Check out your existing logs for all services.
>   Are there any other indications in any of the logs of increased
>   activity?
> * Consider timing and frequency of these failures.
>   Does it happen more than once? At the same frequency? Predictable?
> * Look through your scheduled tasks (Cron) for any processes that
>   may coincide with this timing.
> * Consider a more frequent system checker.
>   Run a loop to gather process data:
>   $ while true; do ps auxww > ps.$(date +%s); sleep 10; done
>   or something similar.
>   Increase the frequency of your existing monitoring, if possible.

Splunk does this; it shows all processes, memory usage, what's using it,
etc. on nice colorful graphs. Splunk is free if less than 500 MB/day of
ASCII is consumed. Just narrow down the logs to monitor and you're golden,
or install a standalone version on each machine you want to keep an eye
on. Nagios/Nessus is great for monitoring; agreed.
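If you go the snapshot-loop route instead, something like the following
might help when digging through the ps.* files afterwards. It is only a
rough sketch: top_rss and total_rss are hypothetical helper names, and it
assumes the Linux procps `ps auxww` column layout (PID in column 2, RSS in
KiB in column 6, command starting at column 11).

```shell
#!/bin/sh
# Helpers for reading snapshots written by: ps auxww > ps.$(date +%s)
# Assumption: procps "ps aux" columns
#   USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

# top_rss FILE [N]: print "RSS_KiB PID COMMAND" for the N largest
# processes by resident memory in one snapshot (default N=5).
top_rss() {
    file=$1
    n=${2:-5}
    awk 'NR > 1 {
             # Rejoin the command, which may contain spaces.
             cmd = $11
             for (i = 12; i <= NF; i++) cmd = cmd " " $i
             print $6, $2, cmd
         }' "$file" | sort -k1,1nr | head -n "$n"
}

# total_rss FILE: print the sum of the RSS column (KiB) for one snapshot.
# Shared pages are counted once per process, so treat this as a rough
# indicator of a spike rather than exact memory use.
total_rss() {
    awk 'NR > 1 { sum += $6 } END { printf "%.0f\n", sum }' "$1"
}
```

For example,
`for f in ps.*; do echo "$(total_rss "$f") $f"; done | sort -n | tail -n 1`
should point at the snapshot with the largest total, and running top_rss on
that file shows which processes were holding the memory at that moment.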
>
> Good Luck
> --lonnie

--
Take care,
William Attwood
Idea Extraordinaire
[email protected]
_______________________________________________
UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
