The backend datastore returns blocks of follower/following ids in constant
time, regardless of the cursor depth. When I test a user with 100k+
followers via twitter.com using a Ruby script, I see each cursored
block return in between 1.3 and 2.0 seconds (n=46, avg 1.59 s,
median 1.47 s, stddev 0.377 s), over home DSL that is currently shared
by several people. So it seems that we're returning the data over home
DSL at between 2,500 and 4,000 ids per second, which seems like a
perfectly reasonable rate and variance.
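
(That works out from the 5,000 ids per cursored block: 5,000 / 2.0 s is
roughly 2,500 ids/sec on the slow end, and 5,000 / 1.3 s is roughly
3,850 ids/sec on the fast end.)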

If I recall correctly, the "cursorless" methods are just shunted to
the first block each time, and thus return a constant but incomplete
amount of data...
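
For anyone curious, a cursor walk like the one above is only a few lines of
Ruby. A sketch along these lines (not the exact script; auth, rate limits and
error handling are skipped, and a crude regex stands in for a real XML
parser):

  require 'net/http'
  require 'uri'

  # Walk the cursored followers/ids endpoint for one screen name, timing
  # each block. cursor=-1 asks for the first block; each response carries a
  # next_cursor, which (as I recall) comes back as 0 once you've seen them
  # all. The "cursorless" form simply omits the parameter and only ever
  # returns that first block.
  screen_name = 'alexa_chung'
  cursor = '-1'

  until cursor == '0'
    path  = "/followers/ids/#{screen_name}.xml?cursor=#{cursor}"
    start = Time.now
    body  = Net::HTTP.get(URI.parse("http://twitter.com#{path}"))
    puts "url #{path}"
    puts "fetch time = #{Time.now - start}"

    # Crude scrape of next_cursor from the XML payload.
    cursor = body[/<next_cursor>(-?\d+)<\/next_cursor>/, 1] || '0'
  end

Each pass prints a url / fetch time pair like the ones in the slice below.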

Looking into my crystal ball, if you want a lot more than several
thousand widgets per second from Twitter, you probably aren't going to
get them via REST, and you will probably have some sort of "business
relationship" in place with Twitter.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

(A slice of the data below)

url /followers/ids/alexa_chung.xml?cursor=-1
fetch time = 1.478542
url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
fetch time = 2.044831
url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
fetch time = 1.350035
url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
fetch time = 1.44636
url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
fetch time = 1.955163
url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
fetch time = 1.326226
url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
fetch time = 1.96824
url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
fetch time = 1.513922
url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
fetch time = 1.59179
url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
fetch time = 2.259924
url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
fetch time = 1.706438
url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
fetch time = 1.460413



On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>
> Some quick benchmarks...
>
> Grabbed entire social graph for ~250 users, where each user has a
> number of friends/followers between 0 and 80,000.  I randomly used
> both the cursor and cursor-less API methods.
>
> < 5000 ids
> cursor: 0.72 avg seconds
> cursorless: 0.51 avg seconds
>
> 5000 to 10,000 ids
> cursor: 1.42 avg seconds
> cursorless: 0.94 avg seconds
>
> 1 to 80,000 ids
> cursor: 2.82 avg seconds
> cursorless: 1.21 avg seconds
>
> 5,000 to 80,000 ids
> cursor: 4.28 avg seconds
> cursorless: 1.59 avg seconds
>
> 10,000 to 80,000 ids
> cursor: 5.23 avg seconds
> cursorless: 1.82 avg seconds
>
> 20,000 to 80,000 ids
> cursor: 6.82 avg seconds
> cursorless: 2 avg seconds
>
> 40,000 to 80,000 ids
> cursor: 9.5 avg seconds
> cursorless: 3 avg seconds
>
> 60,000 to 80,000 ids
> cursor: 12.25 avg seconds
> cursorless: 3.12 avg seconds
>
> On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
>> Ditto PJB :-)
>>
>> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>>
>> > I think that's like asking someone: why do you eat food? But don't say
>> > because it tastes good or nourishes you, because we already know
>> > that! ;)
>>
>> > You guys presumably set the 5000 ids per cursor limit by analyzing
>> > your user base and noting that one could still obtain the social graph
>> > for the vast majority of users with a single call.
>>
>> > But this is a bit misleading.  For analytics-based apps, which aim to
>> > do near real-time analysis of relationships, the focus is typically on
>> > consumer brands, which have a far larger than average number of
>> > relationships (e.g., 50k - 200k).
>>
>> > This means that those apps are neck-deep in cursor-based stuff, and
>> > quickly realize the existing drawbacks, including, in order of
>> > significance:
>>
>> > - Latency.  Fetching ids for a user with 3000 friends is comparable
>> > between the two calls.  But as you move past 5000 ids, the gap
>> > quickly grows to a 5x+ difference (I will include more benchmarks in
>> > a short while).  For example, fetching 80,000 friends via the get-all
>> > method takes on average 3 seconds; it takes, on average, 15 seconds
>> > with cursors.
>>
>> > - Code complexity & elegance.  I would say that there is a 3x increase
>> > in code lines to account for cursors, from retrying failed cursors, to
>> > caching to account for cursor slowness, to UI changes to coddle
>> > impatient users.
>>
>> > - Incomprehensibility.  While there are obviously very good reasons
>> > from Twitter's perspective (performance) for the cursor-based model,
>> > there really is no obvious benefit to API users for the ids
>> > calls.  I would make the case that a large majority of API uses of the
>> > ids calls need and require the entire social graph, not an incomplete
>> > one.  After all, we need to know what new relationships exist, but
>> > also what old relationships have failed.  To dole out the data in
>> > drips and drabs is like serving a pint of beer in sippy cups.  That is
>> > to say: most users need the entire social graph, so what is the use
>> > case, from an API user's perspective, of NOT maintaining at least one
>> > means to quickly, reliably, and efficiently get it in a single call?
>>
>> > - API Barriers to entry.  Most of the aforementioned arguments are
>> > obviously from an API user's perspective, but there's something, too,
>> > for Twitter to consider.  Namely, by increasing the complexity and
>> > learning curve of particular API actions, you presumably further limit
>> > the pool of developers who will engage with that API.  That's probably
>> > a bad thing.
>>
>> > - Limits Twitter 2.0 app development.  This, again, speaks to issues
>> > bearing on speed and complexity, but I think it is important.  The
>> > first few apps in any given medium or innovation invariably have to do
>> > with basic functionality building blocks -- tweeting, following,
>> > showing tweets.  But the next wave almost always has to do with
>> > measurement and analysis.  By making such analysis more difficult, you
>> > forestall the critically important ability for brands, and others, to
>> > measure performance.
>>
>> > - API users have requested it.  Shouldn't, ultimately, the use case
>> > for a particular API method simply be the fact that a number of API
>> > developers have requested that it remain?
>>
>> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
>> > > Can everyone contribute their use case for this API method? I'm trying
>> > > to fully understand the deficiencies of the cursor approach.
>>
>> > > Please don't include that cursors are slow or that they are charged
>> > > against the rate limit, as those are known issues.
>>
>> > > Thanks.
>>
>>
>
