I wanted to make a simple "cursor" which would allow me to remember a position on a timeline, and then pull messages and crawl forward without missing any. I thought the way to do that would be to use "since_id" and "count". However, this approach is unreliable because of the way the two parameters interact. It seems a lot of people are reporting issues having to do with since_id (perhaps this is related, not sure).

since_id limits the returned results, as does count. However, it appears that count gives preference to the most recent tweets. This makes sense if all you want is a snapshot of the most recent stuff. However, if you are trying to simply iterate over the list in order, then count can't be used.

Here's an example to show why you can't build a reliable, simple, forward cursor using the twitter API:

1. Assume we are at since_id = 1000. This was the last (highest) message id we had previously seen, which we have saved.
2. There is a sudden spike and 2000 tweets come in.
3. We now try to query with since_id=1000, count=200 (the max).

Unfortunately, we have missed 1800 tweets, because we only get the most recent 200.

The problem is actually worse than this simple example. Since the ids are non-sequential for a particular stream, we have no idea how many tweets were actually for us; the ids are too far apart to tell how many fall in the gap. We could set an id to the _lowest_ seen value and try iterating backward (fairly complicated for such a simple thing). Thus we are entirely dependent on the rate of incoming tweets and our sampling rate. At a certain rate it will appear to work, and at some higher rate it will start failing miserably.

How to solve this: have some way of returning the earliest "count" tweets rather than the most recent. Let's call this query arg "early_count". This would make it easy to create a cursor and iterate forward over the stream of messages. Moving forward is simple: just remember the highest seen id, and pass it in as since_id, along with early_count set to however many tweets you want to move forward by.

I'm somewhat surprised no one has commented on this design flaw (which makes me suspicious, perhaps I missed something obvious). If so, apologies.

Thanks,
Zero
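
P.S. A rough simulation of the example above, in case it helps. This is only a sketch under assumptions: the timeline is an in-memory list of ids, no real API calls are made, and both fetch functions are hypothetical stand-ins. fetch_most_recent mimics the behavior as I understand it (since_id plus count returns the newest matches), and fetch_earliest mimics the proposed early_count argument, which does not exist in the API.

```python
def fetch_most_recent(timeline, since_id, count):
    """Hypothetical stand-in for since_id + count:
    of the ids newer than since_id, keep only the `count` newest."""
    newer = sorted(t for t in timeline if t > since_id)
    return newer[-count:]


def fetch_earliest(timeline, since_id, early_count):
    """Hypothetical stand-in for the proposed early_count:
    of the ids newer than since_id, keep only the `early_count` oldest."""
    newer = sorted(t for t in timeline if t > since_id)
    return newer[:early_count]


def crawl(fetch, timeline, since_id, page_size):
    """Forward cursor: keep asking for ids above the highest id seen so far."""
    seen = []
    while True:
        page = fetch(timeline, since_id, page_size)
        if not page:
            return seen
        seen.extend(page)
        since_id = max(page)  # remember the highest id as the new cursor


# 2000 new tweets arrive after id 1000; ids are deliberately non-sequential.
timeline = [1000 + 7 * i for i in range(1, 2001)]

recent = crawl(fetch_most_recent, timeline, since_id=1000, page_size=200)
early = crawl(fetch_earliest, timeline, since_id=1000, page_size=200)

print(len(recent), len(early))  # prints: 200 2000
```

With the "most recent" behavior the cursor jumps straight to the newest id after one page and the other 1800 tweets are never seen. With the "earliest" behavior the same loop walks all 2000 in ten pages.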