I could really go for "jittery" right now... instead, I'm getting "totally broken"!
I'm getting two pages of results using ?page=x, then empty. To me, it
looks as if all my accounts max out at 10K followers. I'd love some
kind of official response from Twitter on the status of paging (John?).

Example: user @starbucks has nearly 300K followers, yet
http://twitter.com/followers/ids.xml?id=30973&page=3 returns an empty
result.

- Waldron

On Sep 7, 10:24 pm, John Kalucki <jkalu...@gmail.com> wrote:
> This describes what I'd call row-based pagination. Cursor-based
> pagination does not suffer from the same jitter issues. A cursor-based
> approach returns an opaque value that is unique and ordered within the
> total set, and that is indexed for constant-time access. Removals in
> pages before or after do not affect the stability of next-page
> retrieval.
>
> For a user with a large following, you'll never have a point-in-time
> snapshot of their followings with any approach, but you can retrieve a
> complete, unique set of users who were followers throughout the
> duration of the query. Additions made while the query is running may
> or may not be returned, as chance allows.
>
> A row-based approach with OFFSET and LIMIT is doomed for reasons
> beyond correctness. The latency and CPU consumption, in MySQL at
> least, tend toward O(N^2). The first few blocks aren't bad. The last
> few blocks of a 10M, or even a 1M, set are miserable.
>
> The jitter demonstrated by the current API is due to a minor and
> correctable design flaw in the allocation of the opaque cursor values.
> A fix is scheduled.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
>
> On Sep 6, 7:27 pm, Dewald Pretorius <dpr...@gmail.com> wrote:
> >
> > There is no way that paging through a large and volatile data set
> > can ever return results that are 100% accurate.
> >
> > Let's say one wants to page through @aplusk's followers list. That's
> > going to take between 3 and 5 minutes just to collect the follower
> > ids with &page (or the new cursors).
> >
> > It is likely that some of the follower ids you have already gone
> > past and collected have unfollowed @aplusk while you are still
> > collecting the rest. I assume that the Twitter system does paging
> > with a standard SQL LIMIT clause. If you do LIMIT 1000000, 20 and
> > some of the ids you have already paged past have been deleted, the
> > result set is going to "shift to the left," and you are going to
> > miss the ones that were above 1000000 but have subsequently moved
> > left to below 1000000.
> >
> > There really are only two solutions to this problem:
> >
> > a) we need the capability to reliably retrieve the entire result set
> > in one API call, or
> >
> > b) everyone has to accept that the result set cannot be guaranteed
> > to be 100% accurate.
> >
> > Dewald
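
For anyone wiring this up in the meantime, the client side of John's
cursor scheme is straightforward. A minimal sketch in Python (the
followers/ids.json endpoint and the cursor / next_cursor semantics --
start at -1, follow next_cursor, stop at 0 -- are from the public API
docs; the function name and the error-free happy path are mine):

    import json
    import urllib.request

    def all_follower_ids(screen_name):
        """Walk the followers/ids cursor chain; stop when next_cursor is 0."""
        ids = []
        cursor = -1                       # -1 requests the first page
        while cursor != 0:
            url = ("http://twitter.com/followers/ids.json"
                   "?screen_name=%s&cursor=%d" % (screen_name, cursor))
            with urllib.request.urlopen(url) as resp:
                page = json.load(resp)
            ids.extend(page["ids"])       # up to 5,000 ids per page
            cursor = page["next_cursor"]  # opaque position; 0 = end of set
        return ids

Because next_cursor is an opaque position in the set rather than a row
offset, an unfollow elsewhere in the list doesn't shift the page you are
about to fetch -- which is exactly the stability John describes.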
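On the row-based side, John's O(N^2) remark follows from the fact that
MySQL's LIMIT offset, k still scans and discards the first offset rows,
so walking N rows in pages of k touches roughly k + 2k + ... + N, about
N^2 / 2k rows in total. And Dewald's "shift to the left" is easy to
reproduce without a database at all. A toy simulation (plain Python,
nothing Twitter-specific; list slicing stands in for LIMIT):

    # Toy model of OFFSET/LIMIT paging over a set that shrinks mid-walk.
    rows = list(range(10))       # stand-in follower ids, in storage order
    page_size = 3

    page1 = rows[0:page_size]                # like LIMIT 0, 3  -> [0, 1, 2]
    rows.remove(1)                           # a user from page 1 unfollows
    page2 = rows[page_size:2 * page_size]    # like LIMIT 3, 3  -> [4, 5, 6]

    print(page1, page2)  # id 3 slid into page 1's range and is never returned

Every id past the deletion point moves one slot left, so whichever id
sits just below the next offset boundary is silently skipped.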