Hi Alan,

I originally thought this was a show-stopper too, but it can be worked
around by having those threads process multiple accounts rather than
multiple pages of a single account.

Something like this:

    Have a producer that emits the account IDs requiring update onto a
queue, which is consumed by your thread pool. Each thread writes its
'page' to an intermediate scratch area associated with that account,
then emits another work item onto the queue carrying the next cursor
ID. If the next cursor is null, all pages for that account have been
gathered, so the thread instead kicks off a third process that
completes the task on a per-account basis. Repeat until the queue is
empty.
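A minimal sketch of that loop in Python, assuming a hypothetical
fetch_page(account_id, cursor) that returns (ids, next_cursor) with
next_cursor None on the last page (stubbed out here with canned data,
since the real Twitter call isn't shown):

```python
import queue
import threading
from collections import defaultdict

# Hypothetical stand-in for a cursored followers call: returns
# (follower_ids, next_cursor), where next_cursor is None on the last page.
PAGES = {
    ("acct-1", 0): ([1, 2], 10),
    ("acct-1", 10): ([3], None),
    ("acct-2", 0): ([7, 8], None),
}

def fetch_page(account_id, cursor):
    return PAGES[(account_id, cursor)]

work = queue.Queue()          # work items are (account_id, cursor)
scratch = defaultdict(list)   # per-account scratch area for gathered pages
lock = threading.Lock()
completed = []                # accounts whose pages are all gathered

def worker():
    while True:
        item = work.get()
        if item is None:      # poison pill: shut this worker down
            work.task_done()
            return
        account_id, cursor = item
        ids, next_cursor = fetch_page(account_id, cursor)
        with lock:
            scratch[account_id].extend(ids)
        if next_cursor is not None:
            # Re-enqueue the same account with the next cursor.
            work.put((account_id, next_cursor))
        else:
            # All pages gathered: hand off to the per-account finisher.
            with lock:
                completed.append(account_id)
        work.task_done()

# Producer: seed the queue with each account's first cursor.
for acct in ("acct-1", "acct-2"):
    work.put((acct, 0))

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
work.join()                   # blocks until every enqueued page is processed
for _ in threads:
    work.put(None)
for t in threads:
    t.join()

print(sorted(scratch["acct-1"]))  # [1, 2, 3]
```

Note each worker re-enqueues the follow-up item before calling
task_done(), so work.join() can't return while any account still has
pages outstanding.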

If you don't have multiple accounts to process, then I guess that
doesn't work. Note that in the old scheme, your concurrent threads
would have been causing localized load spikes for Twitter anyway.


David

On Sep 18, 9:09 am, alan_b <ala...@gmail.com> wrote:
> when dealing with retrieving a large followers list from API, what i
> did was estimate the no. of pages i need (total / 5000) from the
> follower count of user's profile, and then send concurrent API
> requests to improve the speed.
>
> now with the new cursor-based pagination, this become impossible(it
> stills work, but i guess page-based pagination will be obsoleted
> someday?), because I don't know the next_cursor until I finish
> downloading a whole page. so i guess the page-based should be preserved
> (and improve)? rather than making it obsolete?
