On Tue, Dec 1, 2009 at 4:03 PM, Peter Tribble <peter.trib...@gmail.com> wrote:
> On Tue, Dec 1, 2009 at 12:16 AM, Mike Gerdts <mger...@gmail.com> wrote:
> ...
>>> What if we just scrapped the current sar collector (sadc) and just saved
>>> kstat -p output (or something like it) instead?
>>>
>
>>
>> I like the general idea, but I wonder if there are enough
>> kstat-as-text consumers out there to make that an important target.
>
> Note that kstat-as-text is just an example; a real implementation may well
> be some (yet to be defined) dedicated format. The important point is that
> you just dump kstats, and leave the processing/analysis until later.
>
> (Although the ability to slurp the text through the standard grep/awk/whatever
> set of tools does have its attractions. But you could easily generate
> the textual form from any stable representation.)
>
>> How about instead have a target of an SQLite database? The really
>> nifty things about this approach could be:
>>
>> - sqlite is already used in many places in OpenSolaris (SMF, firefox, ...)
>> - accessible to real languages through JDBC or ODBC
>> - accessible to shell scripts via the sqlite command line
>>   (/lib/svc/bin/sqlite today)
>> - easy to prune old data by deleting old rows
>> - easy to aggregate old data (prior to deletion) through SQL query
>> - could open the door to analysis of data leveraging the power of
>>   SQL rather than complex application code
>> - data already organized in a nice way for importing into an
>>   enterprise performance analysis system
>> - extensible beyond kstat data (dtrace aggregation written to DB?)
>
> I'm investigating that too (of course), but I'm not sure it's necessarily
> an obvious win. For one thing, you really want to squeeze the data
> size, whereas a db tends to enlarge it. I also want to be able to stage
> this off easily - although I guess one database per day would work,
> but then you lose some of the advantages of aggregating the data into
> a database in the first place.
I didn't think that this was necessarily true, but I wanted some data to back
up my thoughts. I've been collecting sar data on a machine for a few days and
then wrote a script to load that data into a sqlite database. My findings are
very promising. The sqlite database is in the file /var/tmp/sar.

$ du -h /var/tmp/sar /var/adm/sa
 1.7M   /var/tmp/sar
 5.1M   /var/adm/sa

There are several things at play here that possibly help the sqlite case:

- sqlite stores strings (e.g. disk device names) as variable-length values,
  compared to the fixed-length fields written by sadc.
- sqlite stores integers in a variable number of bytes, compared to (I think)
  the fixed-size 64-bit integers used by sadc.
- When I ran across numbers that looked like floating point, I multiplied them
  by 10 or 100 (as appropriate for the column) and stored them as integers, so
  most values fit in a small integer (usually a char or short) instead of a
  64-bit float.

And then when I put it onto a compressed zfs file system...

# mkfile /tmp/zp
# zpool create zp /tmp/zp
# zfs set compression=on zp
# cp /var/tmp/sar /zp
# du -h /var/tmp/sar /zp/sar
 1.7M   /var/tmp/sar
 625K   /zp/sar
# zfs get compressratio zp
NAME  PROPERTY       VALUE  SOURCE
zp    compressratio  2.79x  -

The quick and dirty perl script that I used for the conversion is attached so
that others can compare their data and make sure I don't have stupid errors
that cause it to skip data that it shouldn't.

I expect that I am seeing best-case results because the system was not very
active during the days of data collection, so there are a lot of very small
numbers (which fit in a char) and a lot of 0's (which help compression). I'd
love to see how data from others' more active machines stacks up.

$ ~/scripts/sar2sql /var/adm/sa/sa* | /lib/svc/bin/sqlite /tmp/mynewsardb
drop table sar_a;
SQL error: no such table: sar_a
drop table sar_b;
SQL error: no such table: sar_b
...
$ du -h /tmp/mynewsardb
 1.7M   /tmp/mynewsardb
$ /lib/svc/bin/sqlite /tmp/mynewsardb
SQLite version 2.8.15-repcached-Generic Patch
Enter ".help" for instructions
sqlite> select count(*) from sar_d;
20127
sqlite>

Note that the errors happen because the script drops each table before
re-creating it, and this is a brand-new database, so there is nothing to drop.

And yes, I do agree with others about rrdtool: for longer-term storage and
viewing it is hard to beat. A tool I wrote that collects performance data for
lots of hosts and provides three years of trend data is all RRD-based and has
been extremely helpful. For ad-hoc reporting, though, SQL is quite handy.
There is a balance to be struck somewhere in between.
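To give a flavor of the ad-hoc reporting, and of the "prune old rows" and
"aggregate before deleting" points from the list above, here is roughly what
that could look like using the sqlite command line tool. The sar_d table is
real, but the column names (sample_time, sample_day, device, pct_busy) and the
sar_d_daily summary table are invented for illustration; they are not what my
script creates, so treat this as a sketch rather than something to paste in.

SQLITE=/lib/svc/bin/sqlite
DB=/tmp/mynewsardb
CUTOFF=1259625600     # retention boundary as a Unix timestamp (1 Dec 2009 UTC)

# Ad-hoc report: peak busy percentage per device since the cutoff.
$SQLITE $DB "SELECT device, MAX(pct_busy) FROM sar_d
             WHERE sample_time > $CUTOFF GROUP BY device;"

# Roll old detail rows up into a daily summary table (the CREATE only needs
# to run once), then prune the detail rows and compact the file.
$SQLITE $DB "CREATE TABLE sar_d_daily
             (sample_day TEXT, device TEXT, avg_busy INTEGER, max_busy INTEGER);"
$SQLITE $DB "INSERT INTO sar_d_daily
             SELECT sample_day, device, AVG(pct_busy), MAX(pct_busy)
             FROM sar_d WHERE sample_time < $CUTOFF
             GROUP BY sample_day, device;"
$SQLITE $DB "DELETE FROM sar_d WHERE sample_time < $CUTOFF;"
$SQLITE $DB "VACUUM;"

The nice part is that the retention policy turns into a couple of SQL
statements run from cron rather than logic baked into the collector.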
--
Mike Gerdts
http://mgerdts.blogspot.com/

sar2sql
Description: Binary data
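On the kstat side: Peter's original suggestion was about dumping raw kstats
rather than sar data, and the same sqlite approach would work there too. Here
is a rough sketch of a collector that just turns kstat -p output into INSERT
statements. To be clear, this is not what the attached sar2sql does; the
database path and table layout are made up for illustration, and a real
collector would at least want to split the module:instance:name:statistic key
into separate columns and cope with raw kstats whose values contain tabs.

#!/bin/sh
# Sketch of a kstat-to-sqlite collector; run it from cron at whatever
# interval suits you.  Paths and the table layout are invented.
DB=/var/tmp/kstats.db
SQLITE=/lib/svc/bin/sqlite

# Solaris /usr/bin/date may not understand %s, so get the timestamp from perl.
TS=`perl -e 'print time'`

# Create the table on first use (this sqlite is 2.8, which has no
# "CREATE TABLE IF NOT EXISTS").
if [ ! -f "$DB" ]; then
        echo 'CREATE TABLE kstats (ts INTEGER, stat TEXT, value TEXT);' | $SQLITE "$DB"
fi

# kstat -p prints "module:instance:name:statistic<TAB>value"; turn each line
# into an INSERT and load the whole sample in a single transaction.
{
        echo 'BEGIN TRANSACTION;'
        kstat -p | sed "s/'/''/g" | nawk -F'\t' -v ts="$TS" '
                { printf("INSERT INTO kstats VALUES (%s, '\''%s'\'', '\''%s'\'');\n", ts, $1, $2) }'
        echo 'COMMIT;'
} | $SQLITE "$DB"

With the data in that form, pruning and rollups are the same kind of one-line
SQL shown earlier.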