At Le Web last month, Ryan Sarver announced that we're going to
provide an agreement framework for Tweet data. Until all that
licensing machinery is working well, we probably won't put any effort
into syndicating the social graph. At this point, social graph
syndication is completely unformed and up in the air, and any
predictions would be unwise.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Mon, Jan 4, 2010 at 9:16 PM, Jesse Stay <jesses...@gmail.com> wrote:
> John, how are things going on the real-time social graph APIs? That
> would solve a lot of the issues surrounding this for me.
> Jesse
>
> On Mon, Jan 4, 2010 at 9:58 PM, John Kalucki <j...@twitter.com> wrote:
>>
>> The backend datastore returns following blocks in constant time,
>> regardless of the cursor depth. When I test a user with 100k+
>> followers against twitter.com using a Ruby script, each cursored
>> block returns in between 1.3 and 2.0 seconds (n=46, avg 1.59 sec,
>> median 1.47 sec, stddev 0.377; home DSL, shared by several people
>> at the moment). So we're returning the data over home DSL at
>> between 2,500 and 4,000 ids per second, which seems like a
>> perfectly reasonable rate and variance.
>>
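>> (For the curious, the measurement script was roughly along these
>> lines -- a from-memory sketch, not the exact code; the XML parsing
>> details here are assumptions:)
>>
>>   require 'net/http'
>>   require 'rexml/document'
>>
>>   screen_name = 'alexa_chung'
>>   cursor = '-1'     # -1 asks for the first block
>>   times = []
>>
>>   Net::HTTP.start('twitter.com') do |http|
>>     while cursor != '0'   # next_cursor of 0 means no more blocks
>>       path = "/followers/ids/#{screen_name}.xml?cursor=#{cursor}"
>>       start = Time.now
>>       body = http.get(path).body
>>       times << Time.now - start
>>       puts "url #{path}"
>>       puts "fetch time = #{times.last}"
>>       cursor = REXML::Document.new(body).
>>         elements['id_list/next_cursor'].text
>>     end
>>   end
>>
>>   avg = times.inject(0.0) { |sum, t| sum + t } / times.size
>>   puts "n=#{times.size}, avg #{avg} seconds"
>>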
>> If I recall correctly, the "cursorless" methods are just shunted to
>> the first block each time, and thus return a constant but
>> incomplete amount of data...
>>
>> Looking into my crystal ball, if you want a lot more than several
>> thousand widgets per second from Twitter, you probably aren't going to
>> get them via REST, and you will probably have some sort of "business
>> relationship" in place with Twitter.
>>
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Services, Twitter Inc.
>>
>> (A slice of data below)
>>
>> url /followers/ids/alexa_chung.xml?cursor=-1
>> fetch time = 1.478542
>> url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
>> fetch time = 2.044831
>> url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
>> fetch time = 1.350035
>> url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
>> fetch time = 1.44636
>> url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
>> fetch time = 1.955163
>> url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
>> fetch time = 1.326226
>> url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
>> fetch time = 1.96824
>> url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
>> fetch time = 1.513922
>> url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
>> fetch time = 1.59179
>> url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
>> fetch time = 2.259924
>> url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
>> fetch time = 1.706438
>> url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
>> fetch time = 1.460413
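>>
>> (Summing those: 12 blocks of 5,000 ids in roughly 20.1 seconds of
>> cumulative fetch time, or about 3,000 ids per second, consistent
>> with the range above.)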
>>
>>
>>
>> On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >
>> > Some quick benchmarks...
>> >
>> > Grabbed the entire social graph for ~250 users, where each user
>> > has between 0 and 80,000 friends/followers. I randomly used both
>> > the cursored and cursorless API methods.
>> >
>> > < 5,000 ids
>> > cursor: 0.72 avg seconds
>> > cursorless: 0.51 avg seconds
>> >
>> > 5,000 to 10,000 ids
>> > cursor: 1.42 avg seconds
>> > cursorless: 0.94 avg seconds
>> >
>> > 1 to 80,000 ids
>> > cursor: 2.82 avg seconds
>> > cursorless: 1.21 avg seconds
>> >
>> > 5,000 to 80,000 ids
>> > cursor: 4.28 avg seconds
>> > cursorless: 1.59 avg seconds
>> >
>> > 10,000 to 80,000 ids
>> > cursor: 5.23 avg seconds
>> > cursorless: 1.82 avg seconds
>> >
>> > 20,000 to 80,000 ids
>> > cursor: 6.82 avg seconds
>> > cursorless: 2 avg seconds
>> >
>> > 40,000 to 80,000 ids
>> > cursor: 9.5 avg seconds
>> > cursorless: 3 avg seconds
>> >
>> > 60,000 to 80,000 ids
>> > cursor: 12.25 avg seconds
>> > cursorless: 3.12 avg seconds
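>> >
>> > (For clarity, the two access patterns being compared look roughly
>> > like this in Ruby -- the helper names are illustrative, not
>> > actual library code:)
>> >
>> >   require 'net/http'
>> >
>> >   # Cursorless: a single request, but only a partial result for
>> >   # large accounts if the method is shunted to the first block.
>> >   def fetch_cursorless(http, user)
>> >     http.get("/followers/ids/#{user}.xml").body
>> >   end
>> >
>> >   # Cursored: one request per 5,000-id block, repeated until
>> >   # next_cursor comes back as 0.
>> >   def fetch_cursored(http, user)
>> >     blocks = []
>> >     cursor = '-1'
>> >     while cursor != '0'
>> >       body = http.get(
>> >         "/followers/ids/#{user}.xml?cursor=#{cursor}").body
>> >       blocks << body
>> >       cursor = body[/<next_cursor>(-?\d+)<\/next_cursor>/, 1]
>> >     end
>> >     blocks
>> >   end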
>> >
>> > On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
>> >> Ditto PJB :-)
>> >>
>> >> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >>
>> >> > I think that's like asking someone: why do you eat food? But
>> >> > don't say because it tastes good or nourishes you, because we
>> >> > already know that! ;)
>> >>
>> >> > You guys presumably set the 5,000 ids per cursor limit by
>> >> > analyzing your user base and noting that one could still obtain
>> >> > the social graph for the vast majority of users with a single
>> >> > call.
>> >>
>> >> > But this is a bit misleading.  For analytics-based apps, which
>> >> > aim to do near real-time analysis of relationships, the focus
>> >> > is typically on consumer brands, who have a far larger than
>> >> > average number of relationships (e.g., 50k-200k).
>> >>
>> >> > This means that those apps are neck-deep in cursor-based stuff, and
>> >> > quickly realize the existing drawbacks, including, in order of
>> >> > significance:
>> >>
>> >> > - Latency.  Fetching ids for a user with 3,000 friends is
>> >> > comparable between the two calls.  But as you increment past
>> >> > 5,000, the gap quickly grows to a 5x+ difference (I will
>> >> > include more benchmarks in a short while).  For example,
>> >> > fetching 80,000 friends via the get-all method takes on average
>> >> > 3 seconds; with cursors it takes, on average, 15 seconds, since
>> >> > 80,000 ids at 5,000 per block means 16 sequential requests.
>> >>
>> >> > - Code complexity & elegance.  I would say that there is a 3x
>> >> > increase in lines of code to account for cursors, from retrying
>> >> > failed cursors, to caching to work around cursor slowness, to
>> >> > UI changes to coddle impatient users (see the retry sketch
>> >> > after this list).
>> >>
>> >> > - Incomprehensibility.  While there are obviously very good
>> >> > reasons from Twitter's perspective (performance) for the
>> >> > cursor-based model, there really is no apparent benefit to API
>> >> > users for the ids calls.  I would make the case that a large
>> >> > majority of API uses of the ids calls require the entire social
>> >> > graph, not an incomplete one.  After all, we need to know what
>> >> > new relationships exist, but also what old relationships have
>> >> > failed.  To dole out the data in drips and drabs is like
>> >> > serving a pint of beer in sippy cups.  That is to say: most
>> >> > users need the entire social graph, so what is the use case,
>> >> > from an API user's perspective, of NOT maintaining at least one
>> >> > means to quickly, reliably, and efficiently get it in a single
>> >> > call?
>> >>
>> >> > - API barriers to entry.  Most of the aforementioned arguments
>> >> > are obviously from an API user's perspective, but there's
>> >> > something, too, for Twitter to consider.  Namely, by increasing
>> >> > the complexity and learning curve of particular API actions,
>> >> > you presumably further limit the pool of developers who will
>> >> > engage with that API.  That's probably a bad thing.
>> >>
>> >> > - Limits Twitter 2.0 app development.  This, again, speaks to
>> >> > issues bearing on speed and complexity, but I think it is
>> >> > important.  The first few apps in any given medium or
>> >> > innovation invariably have to do with basic functionality
>> >> > building blocks -- tweeting, following, showing tweets.  But
>> >> > the next wave almost always has to do with measurement and
>> >> > analysis.  By making such analysis more difficult, you
>> >> > forestall the critically important ability of brands, and
>> >> > others, to measure performance.
>> >>
>> >> > - API users have requested it.  Shouldn't, ultimately, the use case
>> >> > for a particular API method simply be the fact that a number of API
>> >> > developers have requested that it remain?
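>> >> >
>> >> > (As a sketch of the extra machinery cursors force on clients
>> >> > -- the retry policy here is illustrative, not a
>> >> > recommendation:)
>> >> >
>> >> >   # Fetch one cursored block, retrying transient failures with
>> >> >   # exponential backoff.
>> >> >   def fetch_block_with_retries(http, user, cursor, max_retries = 3)
>> >> >     attempts = 0
>> >> >     begin
>> >> >       res = http.get("/followers/ids/#{user}.xml?cursor=#{cursor}")
>> >> >       raise "HTTP #{res.code}" unless res.code == '200'
>> >> >       res.body
>> >> >     rescue => e
>> >> >       attempts += 1
>> >> >       raise if attempts > max_retries
>> >> >       warn "retry #{attempts} after: #{e.message}"
>> >> >       sleep(2 ** attempts)   # back off before retrying
>> >> >       retry
>> >> >     end
>> >> >   end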
>> >>
>> >> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
>> >> > > Can everyone contribute their use case for this API method?
>> >> > > I'm trying to fully understand the deficiencies of the cursor
>> >> > > approach.
>> >>
>> >> > > Please don't include that cursors are slow or that they are charged
>> >> > > against the rate limit, as those are known issues.
>> >>
>> >> > > Thanks.
>> >>
>> >>
>> >
>
>
