On Wed, Dec 9, 2009 at 9:41 PM, Rainer Heilke <rhei...@dragonhearth.com> wrote:
> Resubmitted after subscribing to Observability-discuss to keep threads
> complete. RHH
>
> Mike Gerdts wrote:
>>>
>>>> - Only save the processed data that sar uses. Compatible, but useless.
>>> Actually, that's better than sar. But now I'm just repeating myself. :-)
>>
>> And to support my previous claims that sar is useless because it has a
>> propensity to corrupt data:
>
> "Propensity"? That's a little mild, isn't it? I would tend to lean toward
> the hypothesis that it is _designed_, however unintentionally, to corrupt
> its data.

Well, I'm not so sure about that...

>
>>>> - Compress in the time domain, so you don't keep saving the kstats
>>>> that don't change. (A quick test on my desktop - only 10% or so of
>>>> the statistics actually change over an hour.)
>>> I like this--only keep the data that changes. You need to be careful
>>> about how you parse (sed/grep/awk/Perl) it, though.
>>
>> If any sort of compression is used (such as not repeating values that
>> are observed to be the same as the last time) it is essential that
>> there is a robust decompression mechanism (API and command). The
>> output of such a command can be piped to traditional tools.
>
> I don't think of this as compression, just as not storing data that hasn't
> changed. The data, IMO, must remain as text. While an API would be very
> useful, I'm strongly opposed to converting the data into a format that
> sed/awk/grep/Perl cannot consume directly. The API would merely be a
> front-end to simplify the filling in of blank fields in my perspective of
> what is stored.

The key thing here is that just omitting values isn't the same as
marking them as not changed. That is, it is difficult to tell whether
the data was not measured (program error, kstat not available, system
too busy to complete monitoring) or whether it was the same as in the
last interval.

Having data accessible as text is often useful, but it does tend to
bloat the data and make it extremely slow to analyze. I wrote a tool to
store performance data in rrd files because the tool that we paid lots
of money for stored its data as plain text. Whenever you needed any
information, it would parse a huge text file. It would page me at night
saying that something was busy or full. It would take 10+ minutes for
the monitoring system to give me a graph of the performance over the
past hour and/or last week at this same time.
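
To make the omitted-versus-unchanged point concrete, here is a minimal
sketch in Python of a text encoding that keeps the two cases apart. The
marker characters `=` and `?` and the function names `encode` and
`decode` are invented for illustration; they are assumptions, not part
of any existing kstat or sar format.

```python
# Hypothetical markers, chosen only for this illustration.
UNCHANGED = "="  # value was measured and equals the last measured value
MISSING = "?"    # value could not be measured in this interval


def encode(samples):
    """Turn a list of readings (int, or None if not measured) into tokens."""
    tokens = []
    last = None  # last value that was actually measured
    for value in samples:
        if value is None:
            tokens.append(MISSING)
        elif value == last:
            tokens.append(UNCHANGED)
        else:
            tokens.append(str(value))
            last = value
    return tokens


def decode(tokens):
    """Rebuild the readings; unmeasured intervals come back as None."""
    samples = []
    last = None
    for token in tokens:
        if token == MISSING:
            samples.append(None)
        elif token == UNCHANGED:
            samples.append(last)
        else:
            last = int(token)
            samples.append(last)
    return samples


readings = [100, 100, None, 100, 250]
tokens = encode(readings)
print(" ".join(tokens))
```

The output stays plain text that sed/awk/grep can consume, and the
decoder plays the role of the "decompression" command discussed above:
it fills in repeated values without confusing them with gaps.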

By putting the data into a format that could be directly accessed (rrd)
rather than parsed (text), operations that previously took 10+ minutes
completed in less than a second. As a means to be sure that the rrdtool
graphs represented reality (and to allow other analysis) I also store a
few weeks of text files that represent the raw data collected. After
gaining confidence that the data was being fed to rrdtool correctly and
that it was doing the right thing with it, I found that I hardly ever
referred back to the text files.

>>> We really need to get Sun (I'm thinking of the support wings) to buy
>>> into anything we do, though. As long as they insist on sar data, we're
>>> hooped.
>>
>> Sampling at a higher rate with automatic roll-up and pruning is very
>> helpful. This is the key to the success of rrdtool. This is also the
>> approach used by the analytics found in Sun's commercial webstack
>> offerings. After seeing rrdtool behind the scenes in webstack, it
>> makes me wonder if it is also used in the analytics with Amber Road.
>
> Agreed, but I would prefer rrdtool consumes the data one or two days later.
> On any given day, I want to be sure I can easily access that day's and the
> day before's data. Perhaps I'm being overly paranoid due to my sar
> experience, and my argument is meaningless. rrdtool shouldn't trash the data
> like sar.

Or rrdtool consumes the data instantly, but the raw data is kept around
for a bit.

> I think you understand my paranoia, though, based upon your sar stats above.
>
>> The interface to sar is a handful of commands. It should not be hard
>> to provide the same commands that get their data from whatever format
>> is chosen as a successor. When support asks for sar data, you can
>> give it to them.
>
> Having dealt with Sun many times, I am all too conscious of this not being
> good enough for some engineers. They'll want the sar data, not data that's
> the same as sar's.
> Yes, very stupid, but all too probable from my
> experience. The time intervals have nothing to do with this.

I was trying to suggest that in a future release of OpenSolaris, sar
would simply be a presentation layer for the data collected by this new
collection tool. If run in interactive mode, presumably that would
trigger the appropriate resolution of data to be collected for the
duration of interactive mode. 10 people running sar at the same time in
interactive mode would not cause the system to sample once for each of
them - it would cause the system to sample once (per interval) for all
of them. Assuming they picked the same interval, they would all get the
same results. This is not unlike how dtrace's tick probe synchronizes
across all users of the same tick probe.

> Don't get me wrong; I've dealt with some amazing engineers. But I've also
> dealt with a few that seemed to have been hired right after they got their
> MCSE. :-(
>
> And then there was the one from Texas who couldn't understand how I could
> both live in the Mountain time zone and in Canada at the same time.

Canada's just a little island off the coast of Mexico, right? It's
certainly somewhere where they speak Spanish with a name like that...

>
>> And on the wishlist
>>
>> - A web service or similar that makes it possible to collect this data
>> or do analysis that spans systems.
>> - API's and tools that can talk to these web services and aid in data
>> visualization.
>
> Excellent! +1

It's things like this that make me shy away from having plain text as
the format. A web service causing frequent reparsing of a text file
would be likely to start adding a fair amount of load to the system
being monitored.

>
>> FWIW, I have repeatedly asked Sun to sell me Amber Road style
>> analytics for use on Solaris boxes. I can't say that I have heard any
>> indication that they will ever deliver such a thing. Perhaps this
>> effort can be a step in that direction.
>
> I should take a look at Amber Road.
> Third-party software like this was never
> an option at my old job.

And right now it is only on OpenStorage. Quite a shame, really.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
sysadmin-discuss mailing list
sysadmin-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/sysadmin-discuss