Some quick benchmarks...

I grabbed the entire social graph for ~250 users, each with between 0
and 80,000 friends/followers.  For each fetch I randomly picked either
the cursor or the cursorless API method.
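
For anyone who hasn't written the cursor loop, here is a minimal sketch
of what the cursor method involves. It makes assumptions: fetch_page is
a hypothetical stand-in for the HTTP call, each page is assumed to be a
dict with "ids" and "next_cursor", and -1 / 0 are assumed to be the
start and end sentinels. The fake fetcher only simulates pages of 5000
ids so the loop can run without the network.

```python
from typing import Callable, List, Tuple


def fetch_all_ids(fetch_page: Callable[[int], dict]) -> Tuple[List[int], int]:
    """Follow cursors until exhausted; return (ids, number_of_calls)."""
    ids: List[int] = []
    calls = 0
    cursor = -1  # assumed "start of list" sentinel
    while cursor != 0:  # assumed "no more pages" sentinel
        page = fetch_page(cursor)  # hypothetical call, one round trip each
        calls += 1
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids, calls


def make_fake_fetcher(all_ids: List[int], page_size: int = 5000):
    """Stand-in for the real API: serves slices of an in-memory list."""
    def fetch_page(cursor: int) -> dict:
        start = 0 if cursor == -1 else cursor
        end = start + page_size
        # Use the next offset as the cursor, or 0 when nothing is left.
        next_cursor = end if end < len(all_ids) else 0
        return {"ids": all_ids[start:end], "next_cursor": next_cursor}
    return fetch_page


if __name__ == "__main__":
    ids, calls = fetch_all_ids(make_fake_fetcher(list(range(80000))))
    print(len(ids), "ids in", calls, "sequential calls")
```

The point of the sketch is that the calls are sequential: each page's
cursor comes from the previous response, so the round trips cannot be
issued in parallel.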

< 5,000 ids
cursor: 0.72 avg seconds
cursorless: 0.51 avg seconds

5,000 to 10,000 ids
cursor: 1.42 avg seconds
cursorless: 0.94 avg seconds

1 to 80,000 ids
cursor: 2.82 avg seconds
cursorless: 1.21 avg seconds

5,000 to 80,000 ids
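cursor: 4.28 avg seconds
cursorless: 1.59 avg seconds

10,000 to 80,000 ids
cursor: 5.23 avg seconds
cursorless: 1.82 avg seconds

20,000 to 80,000 ids
cursor: 6.82 avg seconds
cursorless: 2 avg seconds

40,000 to 80,000 ids
cursor: 9.5 avg seconds
cursorless: 3 avg seconds

60,000 to 80,000 ids
cursor: 12.25 avg seconds
cursorless: 3.12 avg seconds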
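
To make the gap easier to read, here is a small sketch that turns the
averages above into cursor/cursorless ratios. The numbers are copied
from the list above; the names (BENCHMARKS, slowdown) are just
illustrative.

```python
# (bucket label, cursor avg seconds, cursorless avg seconds)
BENCHMARKS = [
    ("< 5,000", 0.72, 0.51),
    ("5,000 to 10,000", 1.42, 0.94),
    ("1 to 80,000", 2.82, 1.21),
    ("5,000 to 80,000", 4.28, 1.59),
    ("10,000 to 80,000", 5.23, 1.82),
    ("20,000 to 80,000", 6.82, 2.0),
    ("40,000 to 80,000", 9.5, 3.0),
    ("60,000 to 80,000", 12.25, 3.12),
]


def slowdown(cursor_s: float, cursorless_s: float) -> float:
    """How many times slower the cursor method is for one bucket."""
    return cursor_s / cursorless_s


if __name__ == "__main__":
    for label, cursor_s, cursorless_s in BENCHMARKS:
        print(f"{label:>17} ids: {slowdown(cursor_s, cursorless_s):.2f}x")
```

On these figures the ratio goes from roughly 1.4x for the smallest
bucket to roughly 3.9x for the 60,000 to 80,000 bucket.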

On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
> Ditto PJB :-)
>
> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>
> > I think that's like asking someone: why do you eat food? But don't say
> > because it tastes good or nourishes you, because we already know
> > that! ;)
>
> > You guys presumably set the 5000 ids per cursor limit by analyzing
> > your user base and noting that one could still obtain the social graph
> > for the vast majority of users with a single call.
>
> > But this is a bit misleading.  For analytics-based apps, who aim to do
> > near real-time analysis of relationships, the focus is typically on
> > consumer brands who have a far larger than average number of
> > relationships (e.g., 50k - 200k).
>
> > This means that those apps are neck-deep in cursor-based stuff, and
> > quickly realize the existing drawbacks, including, in order of
> > significance:
>
> > - Latency.  Fetching ids for a user with 3000 friends is comparable
> > between the two calls.  But as you increment past 5000, the speed
> > quickly peaks at a 5+x difference (I will include more benchmarks in a
> > short while).  For example, fetching 80,000 friends via the get-all
> > method takes on average 3 seconds; it takes, on average, 15 seconds
> > with cursors.
>
> > - Code complexity & elegance.  I would say that there is a 3x increase
> > in code lines to account for cursors, from retrying failed cursors, to
> > caching to account for cursor slowness, to UI changes to coddle
> > impatient users.
>
> > - Incomprehensibility.  While there are obviously very good reasons
> > from Twitter's perspective (performance) to the cursor based model,
> > there really is no obvious benefit to API users for the ids
> > calls.  I would make the case that a large majority of API uses of the
> > ids calls need and require the entire social graph, not an incomplete
> > one.  After all, we need to know what new relationships exist, but
> > also what old relationships have failed.  To dole out the data in
> > dribs and drabs is like serving a pint of beer in sippy cups.  That is
> > to say: most users need the entire social graph, so what is the use
> > case, from an API user's perspective, of NOT maintaining at least one
> > means to quickly, reliably, and efficiently get it in a single call?
>
> > - API Barriers to entry.  Most of the aforementioned arguments are
> > obviously from an API user's perspective, but there's something, too,
> > for Twitter to consider.  Namely, by increasing the complexity and
> > learning curve of particular API actions, you presumably further limit
> > the pool of developers who will engage with that API.  That's probably
> > a bad thing.
>
> > - Limits Twitter 2.0 app development.  This, again, speaks to issues
> > bearing on speed and complexity, but I think it is important.  The
> > first few apps in any given medium or innovation invariably have to do
> > with basic functionality building blocks -- tweeting, following,
> > showing tweets.  But the next wave almost always has to do with
> > measurement and analysis.  By making such analysis more difficult, you
> > forestall the critically important ability for brands, and others, to
> > measure performance.
>
> > - API users have requested it.  Shouldn't, ultimately, the use case
> > for a particular API method simply be the fact that a number of API
> > developers have requested that it remain?
>
> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
> > > Can everyone contribute their use case for this API method? I'm trying
> > > to fully understand the deficiencies of the cursor approach.
>
> > > Please don't include that cursors are slow or that they are charged
> > > against the rate limit, as those are known issues.
>
> > > Thanks.
>
>
