The backend datastore returns follower/following blocks in constant time, regardless of the cursor depth. Testing a user with 100k+ followers via twitter.com with a ruby script, I see each cursored block return in between 1.3 and 2.0 seconds (n=46, avg 1.59 sec, median 1.47 sec, stddev 0.377; home DSL, shared by several people at the moment). So it seems that we're returning the data over home DSL at between 2,500 and 4,000 ids per second, which seems like a perfectly reasonable rate and variance.
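As a sanity check on those figures, the throughput claim follows directly from the block size and the per-block fetch times. A minimal sketch in ruby, assuming each cursored block carries the documented 5,000 ids:

```ruby
# Each cursored /followers/ids block returns up to 5,000 ids (assumed block
# size); per-block fetch times are taken from the test described above.
BLOCK_SIZE = 5000.0

{ "slowest" => 2.0, "average" => 1.59, "fastest" => 1.3 }.each do |label, secs|
  rate = (BLOCK_SIZE / secs).round
  puts "#{label}: #{secs}s per block => #{rate} ids/sec"
end
# slowest: 2.0s per block => 2500 ids/sec
# average: 1.59s per block => 3145 ids/sec
# fastest: 1.3s per block => 3846 ids/sec
```

Which lands squarely in the 2,500–4,000 ids/sec range quoted above.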
If I recall correctly, the "cursorless" methods are just shunted to the first block each time, and thus represent a constant, incomplete amount of data... Looking into my crystal ball, if you want a lot more than several thousand widgets per second from Twitter, you probably aren't going to get them via REST, and you will probably have some sort of "business relationship" in place with Twitter.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

(A slice of data below)

url /followers/ids/alexa_chung.xml?cursor=-1 fetch time = 1.478542
url /followers/ids/alexa_chung.xml?cursor=1322524362256299608 fetch time = 2.044831
url /followers/ids/alexa_chung.xml?cursor=1321126009663170021 fetch time = 1.350035
url /followers/ids/alexa_chung.xml?cursor=1319359640017038524 fetch time = 1.44636
url /followers/ids/alexa_chung.xml?cursor=1317653620096535558 fetch time = 1.955163
url /followers/ids/alexa_chung.xml?cursor=1316184964685221966 fetch time = 1.326226
url /followers/ids/alexa_chung.xml?cursor=1314866514116423204 fetch time = 1.96824
url /followers/ids/alexa_chung.xml?cursor=1313551933690106944 fetch time = 1.513922
url /followers/ids/alexa_chung.xml?cursor=1312201296962214944 fetch time = 1.59179
url /followers/ids/alexa_chung.xml?cursor=1311363260604388613 fetch time = 2.259924
url /followers/ids/alexa_chung.xml?cursor=1310627455188010229 fetch time = 1.706438
url /followers/ids/alexa_chung.xml?cursor=1309772694575801646 fetch time = 1.460413

On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>
> Some quick benchmarks...
>
> Grabbed the entire social graph for ~250 users, where each user has a
> number of friends/followers between 0 and 80,000. I randomly used
> both the cursor and cursor-less API methods.
>
> < 5,000 ids
> cursor: 0.72 avg seconds
> cursorless: 0.51 avg seconds
>
> 5,000 to 10,000 ids
> cursor: 1.42 avg seconds
> cursorless: 0.94 avg seconds
>
> 1 to 80,000 ids
> cursor: 2.82 avg seconds
> cursorless: 1.21 avg seconds
>
> 5,000 to 80,000 ids
> cursor: 4.28
> cursorless: 1.59
>
> 10,000 to 80,000 ids
> cursor: 5.23
> cursorless: 1.82
>
> 20,000 to 80,000 ids
> cursor: 6.82
> cursorless: 2
>
> 40,000 to 80,000 ids
> cursor: 9.5
> cursorless: 3
>
> 60,000 to 80,000 ids
> cursor: 12.25
> cursorless: 3.12
>
> On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
>> Ditto PJB :-)
>>
>> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
>>
>> > I think that's like asking someone: why do you eat food? But don't
>> > say it's because it tastes good or nourishes you, because we already
>> > know that! ;)
>>
>> > You guys presumably set the 5000 ids per cursor limit by analyzing
>> > your user base and noting that one could still obtain the social
>> > graph for the vast majority of users with a single call.
>>
>> > But this is a bit misleading. For analytics-based apps, which aim to
>> > do near real-time analysis of relationships, the focus is typically
>> > on consumer brands who have a far larger than average number of
>> > relationships (e.g., 50k - 200k).
>>
>> > This means that those apps are neck-deep in cursor-based stuff, and
>> > quickly run into the existing drawbacks, including, in order of
>> > significance:
>>
>> > - Latency. Fetching ids for a user with 3,000 friends is comparable
>> > between the two calls. But as you go past 5,000, the gap quickly
>> > grows to a 5x+ difference (I will include more benchmarks in a short
>> > while). For example, fetching 80,000 friends via the get-all method
>> > takes on average 3 seconds; it takes, on average, 15 seconds with
>> > cursors.
>>
>> > - Code complexity & elegance. I would say that there is a 3x increase
>> > in code lines to account for cursors, from retrying failed cursors,
>> > to caching to account for cursor slowness, to UI changes to coddle
>> > impatient users.
>>
>> > - Incomprehensibility. While there are obviously very good reasons
>> > from Twitter's perspective (performance) for the cursor-based model,
>> > there really is no apparent benefit to API users for the ids calls.
>> > I would make the case that a large majority of API uses of the ids
>> > calls need and require the entire social graph, not an incomplete
>> > one. After all, we need to know what new relationships exist, but
>> > also what old relationships have failed. To dole out the data in
>> > drips and drabs is like serving a pint of beer in sippy cups. That
>> > is to say: most users need the entire social graph, so what is the
>> > use case, from an API user's perspective, of NOT maintaining at
>> > least one means to quickly, reliably, and efficiently get it in a
>> > single call?
>>
>> > - API barriers to entry. Most of the aforementioned arguments are
>> > obviously from an API user's perspective, but there's something,
>> > too, for Twitter to consider. Namely, by increasing the complexity
>> > and learning curve of particular API actions, you presumably further
>> > limit the pool of developers who will engage with that API. That's
>> > probably a bad thing.
>>
>> > - Limits Twitter 2.0 app development. This, again, speaks to issues
>> > bearing on speed and complexity, but I think it is important. The
>> > first few apps in any given medium or innovation invariably have to
>> > do with basic functionality building blocks -- tweeting, following,
>> > showing tweets. But the next wave almost always has to do with
>> > measurement and analysis. By making such analysis more difficult,
>> > you forestall the critically important ability for brands, and
>> > others, to measure performance.
>>
>> > - API users have requested it. Shouldn't, ultimately, the use case
>> > for a particular API method simply be the fact that a number of API
>> > developers have requested that it remain?
>>
>> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
>> > > Can everyone contribute their use case for this API method? I'm
>> > > trying to fully understand the deficiencies of the cursor approach.
>>
>> > > Please don't include that cursors are slow or that they are charged
>> > > against the rate limit, as those are known issues.
>>
>> > > Thanks.
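For readers following along, the cursoring pattern the whole thread is arguing about looks roughly like this in ruby: start at cursor=-1, collect each block's ids, and follow next_cursor until it comes back 0. This is a minimal sketch, not Twitter's reference code -- the HTTP fetch is stubbed out with canned XML, and the id_list/ids/next_cursor element names reflect my reading of the response shape being discussed, so treat them as assumptions:

```ruby
require 'rexml/document'

# Walk the cursored /followers/ids pages. `fetch` stands in for the HTTP GET
# against /followers/ids/<user>.xml?cursor=<cursor>; it takes a cursor string
# and returns the XML body for that page.
def all_follower_ids(fetch)
  ids = []
  cursor = "-1"
  loop do
    doc = REXML::Document.new(fetch.call(cursor))
    doc.elements.each("id_list/ids/id") { |e| ids << e.text.to_i }
    cursor = doc.elements["id_list/next_cursor"].text
    break if cursor == "0"   # 0 signals the last block
  end
  ids
end

# Canned two-page responses standing in for the live API (cursor values are
# illustrative only).
PAGES = {
  "-1" => "<id_list><ids><id>1</id><id>2</id></ids>" \
          "<next_cursor>1310627455188010229</next_cursor></id_list>",
  "1310627455188010229" =>
          "<id_list><ids><id>3</id></ids>" \
          "<next_cursor>0</next_cursor></id_list>"
}

puts all_follower_ids(->(c) { PAGES.fetch(c) }).inspect  # => [1, 2, 3]
```

Note how the loop, the retry surface it implies, and the cursor bookkeeping are exactly the extra code PJB is pointing at: the cursorless call is a single GET, while this version does one round trip per 5,000 ids.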