Also, how do we get a "business relationship" set up? I've been asking for that for years now.
Jesse

On Mon, Jan 4, 2010 at 10:16 PM, Jesse Stay <jesses...@gmail.com> wrote:
> John, how are things going on the real-time social graph APIs? That would
> solve a lot of things for me surrounding this.
>
> Jesse
>
> On Mon, Jan 4, 2010 at 9:58 PM, John Kalucki <j...@twitter.com> wrote:
>> The backend datastore returns following blocks in constant time,
>> regardless of the cursor depth. When I test a user with 100k+ followers
>> via twitter.com using a Ruby script, each cursored block returns in
>> between 1.3 and 2.0 seconds (n=46, avg 1.59 s, median 1.47 s, stddev
>> 0.377; measured over home DSL shared by several people at the moment).
>> So we're returning the data over home DSL at between 2,500 and 4,000
>> ids per second, which seems like a perfectly reasonable rate and
>> variance.
>>
>> If I recall correctly, the "cursorless" methods are just shunted to the
>> first block each time, and thus return a constant, incomplete amount of
>> data.
>>
>> Looking into my crystal ball: if you want a lot more than several
>> thousand widgets per second from Twitter, you probably aren't going to
>> get them via REST, and you will probably have some sort of "business
>> relationship" in place with Twitter.
>>
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Services, Twitter Inc.
>> (A slice of data below)
>>
>> url /followers/ids/alexa_chung.xml?cursor=-1
>> fetch time = 1.478542
>> url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
>> fetch time = 2.044831
>> url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
>> fetch time = 1.350035
>> url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
>> fetch time = 1.44636
>> url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
>> fetch time = 1.955163
>> url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
>> fetch time = 1.326226
>> url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
>> fetch time = 1.96824
>> url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
>> fetch time = 1.513922
>> url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
>> fetch time = 1.59179
>> url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
>> fetch time = 2.259924
>> url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
>> fetch time = 1.706438
>> url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
>> fetch time = 1.460413
>>
>> On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>> > Some quick benchmarks...
>> >
>> > I grabbed the entire social graph for ~250 users, where each user has
>> > between 0 and 80,000 friends/followers. I randomly used both the
>> > cursor and cursorless API methods.
>> > < 5,000 ids:        cursor 0.72 avg s,  cursorless 0.51 avg s
>> > 5,000-10,000 ids:   cursor 1.42 avg s,  cursorless 0.94 avg s
>> > 1-80,000 ids:       cursor 2.82 avg s,  cursorless 1.21 avg s
>> > 5,000-80,000 ids:   cursor 4.28 avg s,  cursorless 1.59 avg s
>> > 10,000-80,000 ids:  cursor 5.23 avg s,  cursorless 1.82 avg s
>> > 20,000-80,000 ids:  cursor 6.82 avg s,  cursorless 2.00 avg s
>> > 40,000-80,000 ids:  cursor 9.50 avg s,  cursorless 3.00 avg s
>> > 60,000-80,000 ids:  cursor 12.25 avg s, cursorless 3.12 avg s
>> >
>> > On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
>> >> Ditto PJB :-)
>> >>
>> >> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >> > I think that's like asking someone: why do you eat food? But don't
>> >> > say because it tastes good or nourishes you, because we already
>> >> > know that! ;)
>> >> >
>> >> > You guys presumably set the limit of 5,000 ids per cursor by
>> >> > analyzing your user base and noting that one could still obtain
>> >> > the social graph for the vast majority of users with a single
>> >> > call.
>> >> >
>> >> > But this is a bit misleading. For analytics apps that aim to do
>> >> > near-real-time analysis of relationships, the focus is typically
>> >> > on consumer brands, who have a far larger than average number of
>> >> > relationships (e.g., 50k-200k).
>> >> >
>> >> > This means those apps are neck-deep in cursor-based calls and
>> >> > quickly run into the existing drawbacks, in order of significance:
>> >> >
>> >> > - Latency. Fetching ids for a user with 3,000 friends is
>> >> > comparable between the two calls. But as you move past 5,000 ids,
>> >> > the gap quickly grows to 5x or more (I will include more
>> >> > benchmarks in a short while).
>> >> > For example, fetching 80,000 friends via the get-all method takes
>> >> > 3 seconds on average; with cursors it takes 15 seconds on average.
>> >> >
>> >> > - Code complexity & elegance. I would say there is a 3x increase
>> >> > in lines of code to account for cursors: retrying failed cursors,
>> >> > caching to work around cursor slowness, and UI changes to coddle
>> >> > impatient users.
>> >> >
>> >> > - Incomprehensibility. While there are obviously very good
>> >> > reasons, from Twitter's perspective (performance), for the
>> >> > cursor-based model, there is no obvious benefit to API users for
>> >> > the ids calls. I would make the case that a large majority of API
>> >> > uses of the ids calls require the entire social graph, not an
>> >> > incomplete one. After all, we need to know not only what new
>> >> > relationships exist, but also what old relationships have ended.
>> >> > To dole out the data in drips and drabs is like serving a pint of
>> >> > beer in sippy cups. That is to say: most users need the entire
>> >> > social graph, so what is the use case, from an API user's
>> >> > perspective, of NOT maintaining at least one means to quickly,
>> >> > reliably, and efficiently get it in a single call?
>> >> >
>> >> > - API barriers to entry. Most of the aforementioned arguments are
>> >> > from an API user's perspective, but there's something here for
>> >> > Twitter to consider too. By increasing the complexity and learning
>> >> > curve of particular API actions, you presumably shrink the pool of
>> >> > developers who will engage with that API. That's probably a bad
>> >> > thing.
>> >> >
>> >> > - Limits Twitter 2.0 app development. This again speaks to speed
>> >> > and complexity, but I think it is important.
>> >> > The first few apps in any new medium invariably deal with the
>> >> > basic functional building blocks: tweeting, following, showing
>> >> > tweets. But the next wave almost always has to do with measurement
>> >> > and analysis. By making such analysis more difficult, you
>> >> > forestall the critically important ability for brands, and others,
>> >> > to measure performance.
>> >> >
>> >> > - API users have requested it. Shouldn't the use case for a
>> >> > particular API method ultimately be the fact that a number of API
>> >> > developers have asked that it remain?
>> >> >
>> >> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
>> >> > > Can everyone contribute their use case for this API method? I'm
>> >> > > trying to fully understand the deficiencies of the cursor
>> >> > > approach.
>> >> > >
>> >> > > Please don't include that cursors are slow or that they are
>> >> > > charged against the rate limit, as those are known issues.
>> >> > >
>> >> > > Thanks.
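[Editor's note: for readers following the thread, the cursored fetch loop that John and PJB are timing can be sketched as below. This is a minimal sketch, assuming the response shape of the /followers/ids endpoint of the time ({"ids": [...], "next_cursor": ...}, with cursor=-1 requesting the first block and next_cursor=0 marking the end). The `fetch` callable and the `fake_fetch` stub standing in for the real HTTP request are illustrative assumptions, not actual API client code.]

```python
import time

def fetch_all_follower_ids(fetch, screen_name):
    """Walk a cursored ids endpoint until next_cursor == 0, timing each block.

    `fetch(screen_name, cursor)` is assumed to return a dict shaped like the
    API response: {"ids": [...], "next_cursor": int}.
    """
    ids = []
    cursor = -1  # -1 asks for the first block
    while cursor != 0:
        start = time.monotonic()
        block = fetch(screen_name, cursor)
        elapsed = time.monotonic() - start
        # Mirrors the per-block "fetch time" log in John's data slice above.
        print(f"cursor={cursor} ids={len(block['ids'])} fetch time={elapsed:.3f}s")
        ids.extend(block["ids"])
        cursor = block["next_cursor"]
    return ids

# Stub standing in for the HTTP call: three blocks, 5,000 ids per full block,
# as in the 5,000-ids-per-cursor limit discussed in the thread.
def fake_fetch(screen_name, cursor):
    blocks = {
        -1: (list(range(0, 5000)), 101),
        101: (list(range(5000, 10000)), 102),
        102: (list(range(10000, 12345)), 0),  # final, partial block
    }
    ids, next_cursor = blocks[cursor]
    return {"ids": ids, "next_cursor": next_cursor}

all_ids = fetch_all_follower_ids(fake_fetch, "alexa_chung")
print(len(all_ids))  # 12345
```

The per-block round trips in the loop are exactly why latency scales with follower count under cursors, while a cursorless get-all call pays the round-trip cost once.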