On Tue, May 26, 2009 at 10:07 PM, elversatile <elversat...@gmail.com> wrote:
>
> Makes sense. I was assuming the same. Thanks people! John from Twitter
> said that spritzer is 1/3 of the gardenhose, which makes it 15%. So I
> guess statistical insignificance of spritzer is due to its low
> percentage.
I'm also curious what "statistical insignificance" means in this context, since the Streaming API docs are pretty assiduous about saying which are "significant" vs. "insignificant".

Sample sizes far lower than 4% are of course fine for certain purposes, as long as they're drawn uniformly. And even if they're not all that uniform, they might still be good enough :)

There are so many different things to do with *hose/spritzer that I'm not sure what statistical significance means in the abstract. I'm seeing hundreds of thousands of messages per day on /spritzer. If you're interested in computing a statistic that holds across all tweets -- say, average tweet length -- that's *plenty*. (Now, if you wanted to compute the statistic per 1-minute time window and cared about minute-to-minute differences, the story might be different...)

I'm curious to know what the docs author meant by "statistically (in)significant" here.

Brendan
[ http://anyall.org ]
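To make the whole-day vs. per-minute point concrete, here is a rough sketch of the standard error of a sample mean (sample std / sqrt(n)) for a uniformly drawn sample. Everything in it is an assumption for illustration: the figure of 300,000 messages/day just stands in for "hundreds of thousands", and the tweet lengths are simulated as uniform on 1..140 characters rather than taken from real /spritzer data. Splitting one day into 1440 one-minute windows cuts n by a factor of 1440, so the error of each per-minute estimate grows by roughly sqrt(1440), or about 38x.

```python
import math
import random

def mean_and_stderr(values):
    """Sample mean and its standard error (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

def simulated_lengths(n, seed=0):
    # Hypothetical tweet lengths: uniform on 1..140 chars, purely illustrative.
    rng = random.Random(seed)
    return [rng.randint(1, 140) for _ in range(n)]

# Assumption: ~300,000 sampled messages/day ("hundreds of thousands").
PER_DAY = 300_000
PER_MINUTE = PER_DAY // (24 * 60)  # about 208 messages per one-minute window

day_mean, day_se = mean_and_stderr(simulated_lengths(PER_DAY))
min_mean, min_se = mean_and_stderr(simulated_lengths(PER_MINUTE, seed=1))

print(f"whole day : mean={day_mean:.2f} chars, std. error={day_se:.3f}")
print(f"one minute: mean={min_mean:.2f} chars, std. error={min_se:.3f}")
```

With a full day of sample the average-length estimate is pinned down to well under a character, while a single minute's estimate wobbles by a few characters, which is exactly when minute-to-minute comparisons get shaky.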