Also, how do we get a "business relationship" set up?  I've been asking for
that for years now.

Jesse

On Mon, Jan 4, 2010 at 10:16 PM, Jesse Stay <jesses...@gmail.com> wrote:

> John, how are things going on the real-time social graph APIs?  That
> would solve a lot of these problems for me.
>
> Jesse
>
>
> On Mon, Jan 4, 2010 at 9:58 PM, John Kalucki <j...@twitter.com> wrote:
>
>> The backend datastore returns following blocks in constant time,
>> regardless of the cursor depth. When I test a user with 100k+
>> followers via twitter.com using a ruby script, I see each cursored
>> block return in 1.3 to 2.0 seconds (n=46, avg 1.59 sec, median 1.47
>> sec, stddev 0.377; home DSL, shared by several people at the
>> moment). So we're returning the data over home DSL at between 2,500
>> and 4,000 ids per second, which seems like a perfectly reasonable
>> rate and variance.
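>>
>> (For the curious, the timing loop amounts to something like the
>> sketch below -- a hypothetical reconstruction rather than my exact
>> script, assuming the unauthenticated 1.x XML endpoint, and printing
>> in the same format as the data slice further down.)
>>
>> require 'net/http'
>> require 'benchmark'
>>
>> # Walk the cursored followers/ids pages and time each block.
>> host   = 'twitter.com'
>> screen = 'alexa_chung'
>> cursor = '-1'
>>
>> until cursor == '0'
>>   path = "/followers/ids/#{screen}.xml?cursor=#{cursor}"
>>   body = nil
>>   secs = Benchmark.realtime { body = Net::HTTP.get(host, path) }
>>   puts "url #{path}"
>>   puts "fetch time = #{secs}"
>>   # next_cursor points at the following block; 0 means we're done.
>>   cursor = body[%r{<next_cursor>(-?\d+)</next_cursor>}, 1] || '0'
>> end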
>>
>> If I recall correctly, the "cursorless" methods are just shunted to
>> the first block each time, and thus return a constant, incomplete
>> amount of data...
>>
>> Looking into my crystal ball, if you want a lot more than several
>> thousand widgets per second from Twitter, you probably aren't going to
>> get them via REST, and you will probably have some sort of "business
>> relationship" in place with Twitter.
>>
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Services, Twitter Inc.
>>
>> (A slice of data below)
>>
>> url /followers/ids/alexa_chung.xml?cursor=-1
>> fetch time = 1.478542
>> url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
>> fetch time = 2.044831
>> url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
>> fetch time = 1.350035
>> url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
>> fetch time = 1.44636
>> url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
>> fetch time = 1.955163
>> url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
>> fetch time = 1.326226
>> url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
>> fetch time = 1.96824
>> url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
>> fetch time = 1.513922
>> url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
>> fetch time = 1.59179
>> url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
>> fetch time = 2.259924
>> url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
>> fetch time = 1.706438
>> url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
>> fetch time = 1.460413
>>
>>
>>
>> On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >
>> > Some quick benchmarks...
>> >
>> > Grabbed the entire social graph for ~250 users, each with between
>> > 0 and 80,000 friends/followers.  I randomly used both the cursor
>> > and cursorless API methods.  (A sketch of the timing harness
>> > follows the numbers below.)
>> >
>> > < 5000 ids
>> > cursor: 0.72 avg seconds
>> > cursorless: 0.51 avg seconds
>> >
>> > 5000 to 10,000 ids
>> > cursor: 1.42 avg seconds
>> > cursorless: 0.94 avg seconds
>> >
>> > 1 to 80,000 ids
>> > cursor: 2.82 avg seconds
>> > cursorless: 1.21 avg seconds
>> >
>> > 5,000 to 80,000 ids
>> > cursor: 4.28 avg seconds
>> > cursorless: 1.59 avg seconds
>> >
>> > 10,000 to 80,000 ids
>> > cursor: 5.23 avg seconds
>> > cursorless: 1.82 avg seconds
>> >
>> > 20,000 to 80,000 ids
>> > cursor: 6.82 avg seconds
>> > cursorless: 2 avg seconds
>> >
>> > 40,000 to 80,000 ids
>> > cursor: 9.5 avg seconds
>> > cursorless: 3 avg seconds
>> >
>> > 60,000 to 80,000 ids
>> > cursor: 12.25 avg seconds
>> > cursorless: 3.12 avg seconds
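>> >
>> > (For reference, the harness is roughly the sketch below -- a
>> > hypothetical reconstruction, not my exact code, assuming the
>> > unauthenticated 1.x XML endpoints and a placeholder screen name.)
>> >
>> > require 'net/http'
>> > require 'benchmark'
>> >
>> > # Time a full cursored walk of followers/ids for one user.
>> > def time_cursored(screen)
>> >   Benchmark.realtime do
>> >     cursor = '-1'
>> >     until cursor == '0'
>> >       body = Net::HTTP.get('twitter.com',
>> >                            "/followers/ids/#{screen}.xml?cursor=#{cursor}")
>> >       cursor = body[%r{<next_cursor>(-?\d+)</next_cursor>}, 1] || '0'
>> >     end
>> >   end
>> > end
>> >
>> > # Time the single cursorless (get-all) call.
>> > def time_cursorless(screen)
>> >   Benchmark.realtime do
>> >     Net::HTTP.get('twitter.com', "/followers/ids/#{screen}.xml")
>> >   end
>> > end
>> >
>> > puts format('cursor: %.2f  cursorless: %.2f',
>> >             time_cursored('example_user'), time_cursorless('example_user'))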
>> >
>> > On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
>> >> Ditto PJB :-)
>> >>
>> >> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >>
>> >> > I think that's like asking someone: why do you eat food? But
>> >> > don't say because it tastes good or nourishes you, because we
>> >> > already know that! ;)
>> >>
>> >> > You guys presumably set the 5000 ids per cursor limit by
>> >> > analyzing your user base and noting that one could still obtain
>> >> > the social graph for the vast majority of users with a single
>> >> > call.
>> >>
>> >> > But this is a bit misleading.  For analytics-based apps, which
>> >> > aim to do near real-time analysis of relationships, the focus is
>> >> > typically on consumer brands with a far larger than average
>> >> > number of relationships (e.g., 50k - 200k).
>> >>
>> >> > This means that those apps are neck-deep in cursor-based stuff, and
>> >> > quickly realize the existing drawbacks, including, in order of
>> >> > significance:
>> >>
>> >> > - Latency.  Fetching ids for a user with 3000 friends is
>> >> > comparable between the two calls.  But as you move past 5000
>> >> > ids, the gap quickly grows to a 5x+ difference (I will include
>> >> > more benchmarks in a short while).  For example, fetching 80,000
>> >> > friends via the get-all method takes on average 3 seconds; it
>> >> > takes, on average, 15 seconds with cursors.
>> >>
>> >> > - Code complexity & elegance.  I would say that there is a 3x
>> >> > increase in code lines to account for cursors, from retrying
>> >> > failed cursors, to caching to account for cursor slowness, to UI
>> >> > changes to coddle impatient users.  (A sketch of the retry
>> >> > boilerplate follows this list.)
>> >>
>> >> > - Incomprehensibility.  While there are obviously very good
>> >> > reasons from Twitter's perspective (performance) for the
>> >> > cursor-based model, there is no obvious benefit to API users for
>> >> > the ids calls.  I would make the case that a large majority of
>> >> > API uses of the ids calls need the entire social graph, not an
>> >> > incomplete one.  After all, we need to know what new
>> >> > relationships exist, but also what old relationships have
>> >> > failed.  To dole out the data in drips and drabs is like serving
>> >> > a pint of beer in sippy cups.  That is to say: most users need
>> >> > the entire social graph, so what is the use case, from an API
>> >> > user's perspective, of NOT maintaining at least one means to
>> >> > quickly, reliably, and efficiently get it in a single call?
>> >>
>> >> > - API barriers to entry.  Most of the aforementioned arguments
>> >> > are obviously from an API user's perspective, but there's
>> >> > something, too, for Twitter to consider.  Namely, by increasing
>> >> > the complexity and learning curve of particular API actions, you
>> >> > presumably further limit the pool of developers who will engage
>> >> > with that API.  That's probably a bad thing.
>> >>
>> >> > - Limits Twitter 2.0 app development.  This, again, speaks to
>> >> > issues bearing on speed and complexity, but I think it is
>> >> > important.  The first few apps in any given medium or innovation
>> >> > invariably have to do with basic functionality building blocks
>> >> > -- tweeting, following, showing tweets.  But the next wave
>> >> > almost always has to do with measurement and analysis.  By
>> >> > making such analysis more difficult, you forestall the
>> >> > critically important ability for brands, and others, to measure
>> >> > performance.
>> >>
>> >> > - API users have requested it.  Shouldn't, ultimately, the use case
>> >> > for a particular API method simply be the fact that a number of API
>> >> > developers have requested that it remain?
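>> >> >
>> >> > (To make the complexity point concrete, below is a hypothetical
>> >> > sketch of the retry boilerplate cursors force on every client;
>> >> > the endpoint and retry policy are illustrative assumptions.)
>> >> >
>> >> > require 'net/http'
>> >> >
>> >> > # Fetch one cursored block, retrying a few times before giving
>> >> > # up, since any block in a long walk can fail independently.
>> >> > def fetch_block_with_retry(screen, cursor, attempts = 3)
>> >> >   attempts.times do
>> >> >     body = Net::HTTP.get('twitter.com',
>> >> >                          "/followers/ids/#{screen}.xml?cursor=#{cursor}")
>> >> >     # A well-formed block carries a next_cursor element.
>> >> >     return body if body.include?('<next_cursor>')
>> >> >     sleep 1  # back off before retrying a failed block
>> >> >   end
>> >> >   raise "cursor #{cursor} failed after #{attempts} attempts"
>> >> > end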
>> >>
>> >> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
>> >> > > Can everyone contribute their use case for this API method?
>> >> > > I'm trying to fully understand the deficiencies of the cursor
>> >> > > approach.
>> >>
>> >> > > Please don't include that cursors are slow or that they are charged
>> >> > > against the rate limit, as those are known issues.
>> >>
>> >> > > Thanks.
>> >>
>> >>
>> >
>>
>
>
