On Wed, Dec 9, 2009 at 2:24 AM, Rainer Heilke <rhei...@dragonhearth.com> wrote:
> Peter Tribble wrote:
>>
>> I was only thinking of kstat -p output as prototype/illustration. Although
>> being able to munge the output with the normal awk/sed/grep/perl and
>> chuck it straight into one's plotting package of choice does have some
>> appeal!
>
> And if the data corrupts, there's a possibility you may be able to get
> through/around the corruption. When sar data corrupts, everything for the
> rest of the day is gone. Realistically speaking, is 5MB of text data that
> big of a deal today?
>
>> I was thinking more of using cron for some of that.
>
> Since cron is more than likely already running (does _anyone_ turn it off?),
> I'd go with it, myself.
>
>> Keeping the storage under control is clearly going to be something that's
>> going to need quite a bit of thought. One of my aims is to actually have
>> very much more data to chew on, so that storage would naturally be
>> expected to increase.
>
> Yes, but see above. I'd rather have 10MB of text data than 0.5MB of binary I
> can't do anything with when everything starts circling the drain.
>
>>  - Only save the processed data that sar uses. Compatible, but useless.
>
> Actually, that's better than sar. But now I'm just repeating myself. :-)

And to support my previous claims that sar is useless because it has a
propensity to corrupt data:

$ du -h /var/adm/sa/*
 2.3M   /var/adm/sa/sa16
 1.3M   /var/adm/sa/sa17
 336K   /var/adm/sa/sa20

$ sar -u -f /var/adm/sa/sa16

SunOS xxxx 5.10 Generic_137137-09 sun4u    10/16/2009

08:24:09    %usr    %sys    %wio   %idle
08:24:09        unix restarts
11:44:38        unix restarts
15:32:16        unix restarts
19:50:38        unix restarts
21:15:02        unix restarts
21:44:33        unix restarts
22:08:38        unix restarts

$ sar -A -f /var/adm/sa/sa16 | grep -v 'unix restarts' | grep .
SunOS xxxx 5.10 Generic_137137-09 sun4u    10/16/2009
08:24:09    %usr    %sys    %wio   %idle
08:24:09   device        %busy   avque   r+w/s  blks/s  avwait  avserv
08:24:09 runq-sz %runocc swpq-sz %swpocc
08:24:09 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
08:24:09 swpin/s bswin/s swpot/s bswot/s pswch/s
08:24:09 scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
08:24:09  iget/s namei/s dirbk/s
08:24:09 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
08:24:09  proc-sz    ov  inod-sz    ov  file-sz    ov   lock-sz
08:24:09   msg/s  sema/s
08:24:09  atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
08:24:09  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
08:24:09 freemem freeswap
08:24:09 sml_mem   alloc  fail  lg_mem   alloc  fail  ovsz_alloc  fail

2.3 MB of corrupt sar data on an S10u6 system.  The other days are the same.

>
>>  - Compress in the time domain, so you don't keep saving the kstats
>> that don't change. (A quick test on my desktop - only 10% or so of
>> the statistics actually change over an hour.)
>
> I like this--only keep the data that changes. You need to be careful about
> how you parse (sed/grep/awk/Perl) it, though.

If any sort of compression is used (such as not repeating values that
are observed to be the same as last time), it is essential that there
is a robust decompression mechanism (API and command).  The output of
such a command can then be piped to traditional tools.
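To make the idea concrete, here is a minimal sketch of that kind of
time-domain compression and its matching decompression.  The record
format and function names are hypothetical, not any existing tool's:

```python
# Hypothetical time-domain compression for kstat -p style samples:
# the first record is a full snapshot, later records hold only the
# statistics whose values changed since the previous sample.

def compress(samples):
    """samples: iterable of (timestamp, dict) pairs.
    Yields (timestamp, delta_dict) records."""
    prev = {}
    for ts, snap in samples:
        delta = {k: v for k, v in snap.items() if prev.get(k) != v}
        yield ts, delta
        prev = snap

def decompress(records):
    """Reconstruct full snapshots from delta records.
    Note: this sketch assumes statistics are never removed,
    only added or updated."""
    state = {}
    for ts, delta in records:
        state.update(delta)
        yield ts, dict(state)
```

Since decompression only needs the deltas in order, a command wrapping
`decompress` could emit full kstat-style lines again for awk/sed/grep.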

>> However, I think that if this turns out to be useful, then it will be seen
>> to be much less of an issue. And I can see users wanting to increase
>> the sampling rate to get at more detail.
>
> As I've had to do on several occasions with the data I was collecting. One
> system was at every 5 minutes and that wasn't enough.
>
> We really need to make sure Sun (I'm thinking of the support wings) buys
> into anything we do, though. As long as they insist on sar data, we're
> hooped.

Sampling at a higher rate with automatic roll-up and pruning is very
helpful. This is the key to the success of rrdtool.  It is also the
approach used by the analytics found in Sun's commercial webstack
offerings.  Having seen rrdtool behind the scenes in the webstack, I
wonder whether it is also used in the Amber Road analytics.
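The roll-up idea can be sketched briefly.  This is not rrdtool's actual
consolidation code, just an illustration of the principle: keep recent
samples at full resolution and average older ones into fixed windows
(all names here are made up):

```python
# rrdtool-style consolidation sketch: older samples are averaged in
# fixed-size windows, the newest keep_recent samples stay raw.

def rollup(samples, keep_recent, window):
    """samples: list of (timestamp, value) pairs, oldest first."""
    cut = len(samples) - keep_recent
    old, recent = samples[:cut], samples[cut:]
    rolled = []
    for i in range(0, len(old), window):
        chunk = old[i:i + window]
        avg = sum(v for _, v in chunk) / len(chunk)
        rolled.append((chunk[0][0], avg))  # stamp with the window start
    return rolled + recent
```

Run periodically from cron, this keeps storage bounded while retaining
fine-grained data where it matters most: the recent past.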

The interface to sar is a handful of commands.  It should not be hard
to provide the same commands that get their data from whatever format
is chosen as a successor.  When support asks for sar data, you can
give it to them.  When that does not show the full story, the modern
statistics that are important can be pulled.  Somewhat importantly,
they will tell a more detailed story (more measurements at the same
intervals and sample times) rather than a slightly different story
(different sample times, different intervals, maybe
confusing/conflicting terminology).
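As a rough illustration of such a compatibility front end, a sar-style
report could be generated from a plain-text store.  The input line
format here is an assumption, and `sar_u` is a hypothetical name, not
an existing command:

```python
# Hypothetical sar -u style report over a plain-text store whose lines
# look like "HH:MM:SS usr sys wio idle" (format is an assumption).

def sar_u(lines):
    """Format stored CPU samples in sar -u column layout."""
    rows = ["%-8s %7s %7s %7s %7s" % ("", "%usr", "%sys", "%wio", "%idle")]
    for line in lines:
        ts, usr, sy, wio, idle = line.split()
        rows.append("%-8s %7s %7s %7s %7s" % (ts, usr, sy, wio, idle))
    return "\n".join(rows)
```

Support gets output shaped like sar, while the underlying store keeps
the richer modern statistics.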

And on the wishlist:

- A web service or similar that makes it possible to collect this data
or do analysis that spans systems.
- APIs and tools that can talk to these web services and aid in data
visualization.

FWIW, I have repeatedly asked Sun to sell me Amber Road style
analytics for use on Solaris boxes.  I can't say that I have heard any
indication that they will ever deliver such a thing.  Perhaps this
effort can be a step in that direction.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
sysadmin-discuss mailing list
sysadmin-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/sysadmin-discuss
