So my project is a sort of tweetmeme or twitturly type thing where I'm looking to collect a sample of the links being shared through Twitter. Unlike those projects I don't have a firehose, so I have to rely on search. Fortunately, I don't really need to see every link for my project, just a representative sample.
The actual query I'm using is "http OR www filter:links", where the "filter:links" constraint helps make sure I exclude tweets like "can't get http GET to work" — I don't really care about those. Agreed that this is a high-volume query, so maybe it'll never be in sync, but that's OK; for now I'm just ignoring the dupes. And to be clear, I have no intention of trying to keep up and use search as a poor man's firehose. Whatever rate you guys are comfortable with me hitting you at is what I'll do. If that's one request per minute, so be it. I just wanted to get the pagination working so that I could better control things, and that's when I noticed the dupes. -steve (Microsoft Research)
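For what it's worth, here's roughly how I'm ignoring the dupes on my end — a minimal sketch where `fetch_page` is a hypothetical stand-in for the real Search API call (its canned, overlapping pages just simulate the duplicates I'm seeing across page boundaries):

```python
def fetch_page(page):
    """Hypothetical stand-in for the Search API; the overlap between
    page 1 and page 2 mimics the duplicate results across pages."""
    pages = {
        1: [{"id": 103, "text": "http://a"}, {"id": 102, "text": "http://b"}],
        2: [{"id": 102, "text": "http://b"}, {"id": 101, "text": "http://c"}],
    }
    return pages.get(page, [])

def collect_links(max_pages=2):
    seen = set()      # tweet IDs we've already processed
    results = []
    for page in range(1, max_pages + 1):
        for tweet in fetch_page(page):
            if tweet["id"] in seen:
                continue  # skip the dupe, keep going
            seen.add(tweet["id"])
            results.append(tweet)
    return results

print([t["id"] for t in collect_links()])  # → [103, 102, 101]
```

Since tweet IDs are unique, a simple seen-set like this is enough to drop the repeats, regardless of which page they come back on.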