I've been trying to write a script that uses the max_id parameter to loop through all 15 pages of results (with 100 results per page) without running into trouble with grabbing the same tweet multiple times.
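To illustrate the approach, here's a minimal Python sketch of the kind of loop I mean (simplified for this post, not my exact script; field names like "results", "id" and "from_user" are just what the JSON endpoint returns, as far as I can tell):

    import json
    import urllib.request
    from urllib.parse import urlencode

    BASE = "http://search.twitter.com/search.json"
    GEOCODE = "-40.900557,174.885971,1000km"
    MAX_ID = 5379894247  # highest tweet ID seen on the very first request, held fixed

    def fetch_page(page):
        # Request one page of (up to) 100 results for the fixed max_id.
        params = urlencode({
            "q": "",
            "rpp": 100,
            "geocode": GEOCODE,
            "page": page,
            "max_id": MAX_ID,
        })
        with urllib.request.urlopen(BASE + "?" + params) as resp:
            return json.load(resp)["results"]

    seen_ids = set()
    for page in range(1, 16):  # pages 1 through 15
        for tweet in fetch_page(page):
            if tweet["id"] in seen_ids:
                # This is where the duplicates between pages show up.
                print("duplicate:", tweet["id"], tweet["from_user"])
            seen_ids.add(tweet["id"])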
Every time I do so, I find not only a couple of duplicates across pages 1 and 2, but also that the last tweet on page 1 was posted well after, yet has a lower ID than, a bunch of tweets on page 2.

For example, consider these two requests, both with the same max_id but with page=1 and page=2 respectively:

http://search.twitter.com/search?rpp=100&page=1&geocode=-40.900557,174.885971,1000km&max_id=5379894247
http://search.twitter.com/search?rpp=100&page=2&geocode=-40.900557,174.885971,1000km&max_id=5379894247

(Or, if you prefer, the JSON links, which are what I am actually using; I see the same thing on the ones above, which are easier to describe:

http://search.twitter.com/search.json?q=&rpp=100&geocode=-40.900557,174.885971,1000km&page=1&max_id=5379894247
http://search.twitter.com/search.json?q=&rpp=100&geocode=-40.900557,174.885971,1000km&page=2&max_id=5379894247)

The first result on page 2 above was posted about 4 hours before the last tweet on page 1. There are also duplicates, e.g. AshleyGray00: Fireworks!

I've been trying to figure this bug out for a while, as I'm sure I'm missing something obvious, but I'm completely stumped. Does anyone have any clue what is going on here? The only other threads I have found are about people trying to combine since_id and max_id, which I know is not allowed, so I can't find anyone else having a similar problem.