[twitter-dev] Re: Streaming API: Spritzer-stream coverage

Brendan O'Connor Wed, 27 May 2009 06:09:07 -0700

On Tue, May 26, 2009 at 10:07 PM, elversatile <elversat...@gmail.com> wrote:
>
> Makes sense. I was assuming the same. Thanks people! John from Twitter
> said that spritzer is 1/3 of the gardenhose, which makes it 15%. So I
> guess statistical insignificance of spritzer is due to its low
> percentage.


I'm also curious what "statistical insignificance" means in this
context, since the Streaming API docs are pretty assiduous about
saying which streams are "significant" vs. "insignificant".  Sample
sizes far lower than 4% are of course fine for certain purposes, as
long as they're drawn uniformly.  And even if the sampling isn't all
that uniform, it might still be good enough :)

There are so many different things to do with *hose/spritzer that I'm
not sure what statistical significance means in the abstract.  I'm
seeing hundreds of thousands of messages per day on /spritzer.  If
you're interested in computing a statistic that holds across all
tweets (say, average tweet length), that's *plenty*.  (Now, if you
wanted to compute the statistic per 1-minute time window and cared
about minute-to-minute differences, the story might be different...)
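To make the "that's plenty" point a bit more concrete, here's a rough
sketch (my illustration, not anything from Twitter's docs).  The
standard error of a sample mean is s / sqrt(n), so precision improves
with the square root of the sample size.  The tweet lengths below are
synthetic, drawn uniformly from 1..140 as a stand-in for real data,
and 300,000 messages/day is just an illustrative figure in the
"hundreds of thousands" range.  It also assumes the sample is a
uniform draw.

```python
import math
import random
import statistics

def standard_error(values):
    """Standard error of the sample mean: sample std dev / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# ASSUMPTION: synthetic tweet lengths, uniform over 1..140 chars.
# These are made-up numbers, not real /spritzer data.
random.seed(0)
day = [random.randint(1, 140) for _ in range(300_000)]  # illustrative daily volume
per_minute = len(day) // 1440                          # messages in one minute's worth

day_se = standard_error(day)                  # whole-day estimate
minute_se = standard_error(day[:per_minute])  # a single 1-minute window

print(f"whole day:  n={len(day)}, SE of mean length = {day_se:.3f} chars")
print(f"one minute: n={per_minute}, SE of mean length = {minute_se:.3f} chars")
```

With these made-up numbers the whole-day SE comes out well under a
tenth of a character, while one minute's worth of messages (about 208)
gives an SE of a few characters, which is why per-minute comparisons
are a different story.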

I'm curious to know what the docs author meant by "statistically
(in)significant" here.

Brendan
[ http://anyall.org ]
