So my project is a sort of tweetmeme or twitturly type thing where I'm
looking to collect a sample of the links being shared through
Twitter.  Unlike those projects, I don't have a firehose, so I have to
rely on search.  Fortunately, I don't really need to see every link for
my project, just a representative sample.

The actual query I'm using is "http OR www filter:links", where the
"filter:links" constraint helps make sure I exclude tweets like "can't
get http GET to work".  I don't really care about those.
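
In case it helps, here is roughly how that query looks once it's
URL-encoded.  This is only a sketch: it assumes the query goes in a "q"
parameter, the function name is made up for illustration, and it builds
just the query string, not a full request URL.

```python
from urllib.parse import urlencode

# The search query exactly as quoted above.
LINK_QUERY = "http OR www filter:links"


def build_query_string(query: str) -> str:
    """Return the URL-encoded query string for a search request.

    Assumption: the search terms are passed in a parameter named "q".
    """
    return urlencode({"q": query})


if __name__ == "__main__":
    print(build_query_string(LINK_QUERY))  # q=http+OR+www+filter%3Alinks
```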

Agreed that this is a high-volume query, so maybe it'll never be in
sync, but that's OK.  Right now I'm just ignoring the dupes.  And to
be clear, I have no intention of trying to keep up and use search as a
poor man's firehose.  Whatever rate you guys are comfortable with me
hitting you at is what I'll do.  If that's one request/minute, so be
it.  I just wanted to get the pagination working so that I could better
control things, and that's when I noticed the dupes.
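
To make the "ignore the dupes, stay under an agreed rate" part concrete,
here is a minimal sketch of what I mean.  It is not my actual code: the
function names are hypothetical, fetch_page stands in for whatever call
returns one page of results, and I'm assuming each result is a dict
with a unique "id" field.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set

Tweet = Dict[str, object]


def throttled_pages(
    fetch_page: Callable[[int], List[Tweet]],  # hypothetical page fetcher
    min_interval_s: float = 60.0,              # e.g. one request/minute
    max_pages: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[Tweet]]:
    """Yield result pages, pausing min_interval_s before every request
    after the first.  Stops early when a page comes back empty."""
    for page_number in range(1, max_pages + 1):
        if page_number > 1:
            sleep(min_interval_s)
        page = fetch_page(page_number)
        if not page:
            break
        yield page


def unique_tweets(
    pages: Iterable[List[Tweet]],
    seen: Optional[Set[object]] = None,
) -> Iterator[Tweet]:
    """Yield each tweet once, skipping any whose "id" was already seen.

    Pass the same `seen` set across polling runs to drop repeats
    between runs as well as between pages.
    """
    seen = set() if seen is None else seen
    for page in pages:
        for tweet in page:
            tweet_id = tweet["id"]  # assumption: every result has an "id"
            if tweet_id in seen:
                continue
            seen.add(tweet_id)
            yield tweet
```

Keeping the pause and the duplicate check separate means the rate can
be changed to whatever number is acceptable without touching the dedupe
logic.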

-steve
(Microsoft Research)
