Doesn't the Streaming API have a sampling method, statuses/sample, for
statuses, from which you can derive users? And don't the docs describe
it as random, while specifying that gardenhose access is required for
statistically significant samples?
-Andy Badera
The Streaming API sample method would provide a random sampling of
public users weighted by update rate, not a random sampling of all
users. The default 'spritzer' should be sufficient for most uses.
-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.
On Oct 12, 8:01 am, Andrew wrote:
I am doing some research using the Twitter API and I would like to get
a random sample of Twitter users. Any ideas of how this can be
accomplished?
Here's a start:
http://en.wikipedia.org/wiki/Sampling_(statistics)
At this point you are asking for a sampling method without providing an
That sample will be biased towards more active posters and may include
some demographic biases due to seasonal activities during the limited
time frame of the sample.
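
To make the activity bias concrete, here is a minimal Python sketch. It is a toy simulation, not a call to the Streaming API. It assumes each status is kept independently with the same probability, and the population (1,000 quiet users, 10 busy users), the 1% rate, and the name sample_statuses are all made up for illustration.

```python
import random


def sample_statuses(statuses_per_user, rate, seed=0):
    """Keep each status independently with probability `rate`.

    Returns the author of every kept status (authors may repeat).
    This models a uniform sample of statuses, not of users.
    """
    rng = random.Random(seed)
    kept_authors = []
    for user, count in statuses_per_user.items():
        for _ in range(count):
            if rng.random() < rate:
                kept_authors.append(user)
    return kept_authors


# Hypothetical population: many quiet users, a few very active ones.
population = {f"quiet_{i}": 1 for i in range(1000)}
population.update({f"busy_{i}": 500 for i in range(10)})

authors = sample_statuses(population, rate=0.01)

# Derive the distinct users that showed up in the sampled statuses.
seen = set(authors)
busy_seen = sum(1 for user in seen if user.startswith("busy_"))
quiet_seen = len(seen) - busy_seen

print(f"busy users seen:  {busy_seen} of 10")
print(f"quiet users seen: {quiet_seen} of 1000")
```

Under these assumptions nearly every busy user turns up while only a small fraction of quiet users do, so the set of users derived from the stream is far from a uniform sample of accounts.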
That answers my question, and that is what I was afraid of. I think
for my purposes (language detection), a random sample of
To clarify, does this mean that each (non-protected) user has an equal
probability of showing up in the stream regardless of how often they
tweet?
Nope. The stream is a sample of statuses as they are posted. Each
status has an equal probability of being selected. This isn't a user
sampling.
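
A rough way to see why equal probability per status does not mean equal probability per user: suppose, as a simplifying assumption, that each status is selected independently with probability p, and that user u posts n_u statuses during the sampling window. The symbols p and n_u are introduced here for illustration and do not come from the Twitter docs.

```latex
\[
  \Pr(\text{user } u \text{ appears in the sample})
  = 1 - (1 - p)^{n_u}
  \approx p \, n_u
  \quad \text{when } p \, n_u \ll 1 .
\]
```

The probability grows with n_u, so two users are equally likely to appear only if they post the same number of statuses in the window.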