I could really go for "jittery" right now... instead, I'm getting "totally broken"!
I'm getting two pages of results using ?page=x, then empty. To me, it
looks as if all my accounts max out at 10K followers. I'd love some
kind of official response from Twitter on the status of paging (John?).

Example: user @starbucks has nearly 300K followers, yet
http://twitter.com/followers/ids.xml?id=30973&page=3 returns an empty
result.

- Waldron

On Sep 7, 10:24 pm, John Kalucki <jkalu...@gmail.com> wrote:
> This describes what I'd call row-based pagination. Cursor-based
> pagination does not suffer from the same jitter issues. A cursor-based
> approach returns an opaque value that is unique and ordered within the
> total set, and that is indexed for constant-time access. Removals in
> pages before or after do not affect the stability of next-page
> retrieval.
>
> For a user with a large following, you'll never have a point-in-time
> snapshot of their followings with any approach, but you can retrieve a
> complete, unique set of users who were followers throughout the
> duration of the query. Additions made while the query is running may
> or may not be returned, as chance allows.
>
> A row-based approach with OFFSET and LIMIT is doomed for reasons
> beyond correctness. The latency and CPU consumption, in MySQL at
> least, tend toward O(N^2). The first few blocks aren't bad. The last
> few blocks of a 10M, or even a 1M, set are miserable.
>
> The jitter demonstrated by the current API is due to a minor and
> correctable design flaw in the allocation of the opaque cursor values.
> A fix is scheduled.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
>
> On Sep 6, 7:27 pm, Dewald Pretorius <dpr...@gmail.com> wrote:
> >
> > There is no way that paging through a large and volatile data set
> > can ever return results that are 100% accurate.
> >
> > Let's say one wants to page through @aplusk's followers list. That's
> > going to take between 3 and 5 minutes just to collect the follower
> > ids with &page (or the new cursors).
> >
> > It is likely that some of the follower ids you have already gone
> > past and collected have unfollowed @aplusk while you are still
> > collecting the rest. I assume that the Twitter system does paging
> > with a standard SQL LIMIT clause. If you do LIMIT 1000000, 20 and
> > some of the ids you have already paged past have been deleted, the
> > result set is going to "shift to the left," and you are going to
> > miss the ones that were above 1000000 but have subsequently moved
> > left to below 1000000.
> >
> > There really are only two solutions to this problem:
> >
> > a) we need the capability to reliably retrieve the entire result set
> > in one API call, or
> >
> > b) everyone has to accept that the result set cannot be guaranteed
> > to be 100% accurate.
> >
> > Dewald
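
For anyone wiring this up in the meantime, the client side of John's
cursor scheme is straightforward. A minimal sketch in Python (the
followers/ids.json endpoint and the cursor / next_cursor semantics --
start at -1, follow next_cursor, stop at 0 -- are from the public API
docs; the function name and the error-free happy path are mine):

    import json
    import urllib.request

    def all_follower_ids(screen_name):
        """Walk the followers/ids cursor chain; stop when next_cursor is 0."""
        ids = []
        cursor = -1                       # -1 requests the first page
        while cursor != 0:
            url = ("http://twitter.com/followers/ids.json"
                   "?screen_name=%s&cursor=%d" % (screen_name, cursor))
            with urllib.request.urlopen(url) as resp:
                page = json.load(resp)
            ids.extend(page["ids"])       # up to 5,000 ids per page
            cursor = page["next_cursor"]  # opaque position; 0 = end of set
        return ids

Because next_cursor is an opaque position in the set rather than a row
offset, an unfollow elsewhere in the list doesn't shift the page you are
about to fetch -- which is exactly the stability John describes.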
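On the row-based side, John's O(N^2) remark follows from the fact that
MySQL's LIMIT offset, k still scans and discards the first offset rows,
so walking N rows in pages of k touches roughly k + 2k + ... + N, about
N^2 / 2k rows in total. And Dewald's "shift to the left" is easy to
reproduce without a database at all. A toy simulation (plain Python,
nothing Twitter-specific; list slicing stands in for LIMIT):

    # Toy model of OFFSET/LIMIT paging over a set that shrinks mid-walk.
    rows = list(range(10))       # stand-in follower ids, in storage order
    page_size = 3

    page1 = rows[0:page_size]                # like LIMIT 0, 3  -> [0, 1, 2]
    rows.remove(1)                           # a user from page 1 unfollows
    page2 = rows[page_size:2 * page_size]    # like LIMIT 3, 3  -> [4, 5, 6]

    print(page1, page2)  # id 3 slid into page 1's range and is never returned

Every id past the deletion point moves one slot left, so whichever id
sits just below the next offset boundary is silently skipped.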