On Tue, Dec 1, 2009 at 4:03 PM, Peter Tribble <peter.trib...@gmail.com> wrote:
> On Tue, Dec 1, 2009 at 12:16 AM, Mike Gerdts <mger...@gmail.com> wrote:
> ...
>>> What if we just scrapped the current sar collector (sadc) and just saved
>>> kstat -p output (or something like it) instead?
>>>
>
>>
>> I like the general idea, but I wonder if there are enough
>> kstat-as-text consumers out there to make that an important target.
>
> Note that kstat-as-text is just an example; a real implementation may well
> be some (yet to be defined) dedicated format. The important point is that
> you just dump kstats, and leave the processing/analysis until later.
>
> (Although the ability to slurp the text through the standard grep/awk/whatever
> set of tools does have its attractions. But you could easily generate
> the textual form from any stable representation.)
>
>> How about instead have a target of an SQLite database? The really
>> nifty things about this approach could be:
>>
>> - sqlite is already used in many places in OpenSolaris (SMF, firefox, ...)
>> - accessible to real languages through JDBC or ODBC
>> - accessible to shell scripts via the sqlite command line
>>   (/lib/svc/bin/sqlite today)
>> - easy to prune old data by deleting old rows
>> - easy to aggregate old data (prior to deletion) through SQL query
>> - could open the door to analysis of data leveraging the power of
>>   SQL rather than complex application code
>> - data already organized in a nice way for importing into an
>>   enterprise performance analysis system
>> - extensible beyond kstat data (dtrace aggregation written to DB?)
>
> I'm investigating that too (of course), but I'm not sure it's necessarily
> an obvious win. For one thing, you really want to squeeze the data
> size, whereas a db tends to enlarge it. I also want to be able to stage
> this off easily - although I guess one database per day would work,
> but then you lose some of the advantages of aggregating the data into
> a database in the first place.
I didn't think that this was necessarily true, but I wanted some data to back
up my thoughts. I've been collecting sar data on a machine for a few days and
then wrote a script to load that data into a sqlite database. My findings are
very promising. The sqlite database is in the file /var/tmp/sar.

$ du -h /var/tmp/sar /var/adm/sa
 1.7M   /var/tmp/sar
 5.1M   /var/adm/sa

There are several things at play here that possibly help the sqlite case:

- sqlite stores strings (e.g. disk device names) as variable-length values,
  compared to the fixed-length fields written by sadc.
- sqlite stores integers in a variable number of bytes, compared to (I think)
  the fixed-size 64-bit integers used by sadc.
- When I ran across numbers that looked like floating point, I multiplied them
  by 10 or 100 (as appropriate for the column) and stored them as integers, so
  most values fit in a small integer (usually a char or short) instead of a
  64-bit float.

And then when I put it onto a compressed zfs file system...

# mkfile /tmp/zp
# zpool create zp /tmp/zp
# zfs set compression=on zp
# cp /var/tmp/sar /zp
# du -h /var/tmp/sar /zp/sar
 1.7M   /var/tmp/sar
 625K   /zp/sar
# zfs get compressratio zp
NAME  PROPERTY       VALUE  SOURCE
zp    compressratio  2.79x  -

The quick and dirty perl script that I used for the conversion is attached so
that others can compare their data and make sure I don't have stupid errors
that cause it to skip data that it shouldn't.

I expect that I am seeing best-case results because the system was not very
active during the days of data collection, so there are a lot of very small
numbers (which fit in a char) and a lot of 0's (which help compression). I'd
love to see how data from others' more active machines stacks up.

$ ~/scripts/sar2sql /var/adm/sa/sa* | /lib/svc/bin/sqlite /tmp/mynewsardb
drop table sar_a;
SQL error: no such table: sar_a
drop table sar_b;
SQL error: no such table: sar_b
...
$ du -h /tmp/mynewsardb
 1.7M   /tmp/mynewsardb
$ /lib/svc/bin/sqlite /tmp/mynewsardb
SQLite version 2.8.15-repcached-Generic Patch
Enter ".help" for instructions
sqlite> select count(*) from sar_d;
20127
sqlite>

Note that the errors happen because the script drops each table before
re-creating it, and this is a brand-new database, so there is nothing to drop.

And yes, I do agree with others about rrdtool: for longer-term storage and
viewing it is hard to beat. A tool I wrote that collects performance data for
lots of hosts and provides three years of trend data is all RRD-based and has
been extremely helpful. For ad-hoc reporting, though, SQL is quite handy.
There is a balance to be struck somewhere in between.
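To give a flavor of the ad-hoc reporting, and of the "prune old rows" and
"aggregate before deleting" points from the list above, here is roughly what
that could look like using the sqlite command line tool. The sar_d table is
real, but the column names (sample_time, sample_day, device, pct_busy) and the
sar_d_daily summary table are invented for illustration; they are not what my
script creates, so treat this as a sketch rather than something to paste in.

SQLITE=/lib/svc/bin/sqlite
DB=/tmp/mynewsardb
CUTOFF=1259625600     # retention boundary as a Unix timestamp (1 Dec 2009 UTC)

# Ad-hoc report: peak busy percentage per device since the cutoff.
$SQLITE $DB "SELECT device, MAX(pct_busy) FROM sar_d
             WHERE sample_time > $CUTOFF GROUP BY device;"

# Roll old detail rows up into a daily summary table (the CREATE only needs
# to run once), then prune the detail rows and compact the file.
$SQLITE $DB "CREATE TABLE sar_d_daily
             (sample_day TEXT, device TEXT, avg_busy INTEGER, max_busy INTEGER);"
$SQLITE $DB "INSERT INTO sar_d_daily
             SELECT sample_day, device, AVG(pct_busy), MAX(pct_busy)
             FROM sar_d WHERE sample_time < $CUTOFF
             GROUP BY sample_day, device;"
$SQLITE $DB "DELETE FROM sar_d WHERE sample_time < $CUTOFF;"
$SQLITE $DB "VACUUM;"

The nice part is that the retention policy turns into a couple of SQL
statements run from cron rather than logic baked into the collector.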
--
Mike Gerdts
http://mgerdts.blogspot.com/

sar2sql
Description: Binary data
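On the kstat side: Peter's original suggestion was about dumping raw kstats
rather than sar data, and the same sqlite approach would work there too. Here
is a rough sketch of a collector that just turns kstat -p output into INSERT
statements. To be clear, this is not what the attached sar2sql does; the
database path and table layout are made up for illustration, and a real
collector would at least want to split the module:instance:name:statistic key
into separate columns and cope with raw kstats whose values contain tabs.

#!/bin/sh
# Sketch of a kstat-to-sqlite collector; run it from cron at whatever
# interval suits you.  Paths and the table layout are invented.
DB=/var/tmp/kstats.db
SQLITE=/lib/svc/bin/sqlite

# Solaris /usr/bin/date may not understand %s, so get the timestamp from perl.
TS=`perl -e 'print time'`

# Create the table on first use (this sqlite is 2.8, which has no
# "CREATE TABLE IF NOT EXISTS").
if [ ! -f "$DB" ]; then
        echo 'CREATE TABLE kstats (ts INTEGER, stat TEXT, value TEXT);' | $SQLITE "$DB"
fi

# kstat -p prints "module:instance:name:statistic<TAB>value"; turn each line
# into an INSERT and load the whole sample in a single transaction.
{
        echo 'BEGIN TRANSACTION;'
        kstat -p | sed "s/'/''/g" | nawk -F'\t' -v ts="$TS" '
                { printf("INSERT INTO kstats VALUES (%s, '\''%s'\'', '\''%s'\'');\n", ts, $1, $2) }'
        echo 'COMMIT;'
} | $SQLITE "$DB"

With the data in that form, pruning and rollups are the same kind of one-line
SQL shown earlier.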