~300,000 ids in 15 seconds:

$ time curl http://twitter.com/followers/ids.xml?screen_name=dougw > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5545k  100 5545k    0     0   346k      0  0:00:15  0:00:15 --:--:--  474k
real    0m15.994s
user    0m0.021s
sys     0m0.061s

===

~100,000 ids in 6 seconds:

$ time curl http://twitter.com/followers/ids.xml?screen_name=karlrove > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1700k  100 1700k    0     0   286k      0  0:00:05  0:00:05 --:--:--  433k

real    0m5.932s
user    0m0.010s
sys     0m0.025s

===

12,000 ids in 1.2 seconds:

$ time curl http://twitter.com/followers/ids.xml?screen_name=markos > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  213k  100  213k    0     0   168k      0  0:00:01  0:00:01 --:--:--  257k

real    0m1.269s
user    0m0.004s
sys     0m0.003s

===

These calls are night-and-day better than the cursor-based calls. Again, I plead with the folks pushing the bits around to preserve what is officially documented.

On Jan 4, 10:23 pm, John Kalucki <j...@twitter.com> wrote:
> The "existing" APIs stopped providing accurate data about a year ago
> and degraded substantially over a period of just a few months. Now the
> only data store for social graph data requires cursors to access
> complete sets. Pagination is just not possible with the same latency
> at this scale without an order of magnitude or two increase in cost.
> So, instead of hardware "units" in the tens and hundreds, think about
> the same in the thousands and tens of thousands.
>
> These APIs and their now-decommissioned backing stores were developed
> when having 20,000 followers was a lot. We're an order of magnitude or
> two beyond that point along nearly every dimension. Accounts.
> Followers per account. Tweets per second. Etc. As systems evolve, some
> evolutionary paths become extinct.
>
> Given boundless resources, the best we could do for a REST API, as
> Marcel has alluded, is to do the cursoring for you and aggregate many
> blocks into much larger responses.
> This wouldn't work very well for at least two immediate reasons:
> 1) Running a system with multimodal service times is a nightmare --
> we'd have to provision a specific endpoint for such a resource.
> 2) Ruby GC chokes on lots of objects. We'd have to consider
> implementing this resource in another stack, or do a lot of tuning.
> All this to build the opposite of what most applications want: a
> real-time stream of graph deltas for a set of accounts, or the list
> of recent set operations since the last poll -- and rarely, if ever,
> the entire following set.
>
> Also, I'm a little rusty on the details of the social graph api, but
> please detail which public resources allow retrieval of 40,000
> followers in two seconds. I'd be very interested in looking at the
> implementing code on our end. A curl timing would be nice
> (time curl URL > /dev/null) too.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
>
> On Mon, Jan 4, 2010 at 9:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>
> > On Jan 4, 8:58 pm, John Kalucki <j...@twitter.com> wrote:
> >> at the moment). So, it seems that we're returning the data over home
> >> DSL at between 2,500 and 4,000 ids per second, which seems like a
> >> perfectly reasonable rate and variance.
> >
> > It's certainly not reasonable to expect it to take 10+ seconds to get
> > 25,000 to 40,000 ids, PARTICULARLY when existing methods, for whatever
> > reason, return the same data in less than 2 seconds. Twitter is being
> > incredibly short-sighted if they think this is indeed reasonable.
> >
> > Some of us have built applications around your EXISTING APIs, and to
> > now suggest that we may need formal "business relationships" to
> > continue to use such APIs is seriously disquieting.
> >
> > Disgusted...
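For readers following along, the cursoring being debated works roughly like this per the documented followers/ids resource: pass cursor=-1 for the first block, then follow each response's next_cursor value until it comes back 0. A minimal client-side sketch, with a stand-in fetch function and simulated blocks in place of the real HTTP call (the helper names and the block data here are illustrative, not Twitter code):

```python
# Sketch of cursor-based retrieval of follower ids. fetch_block() stands in
# for GET followers/ids.xml?screen_name=...&cursor=...; 'pages' simulates the
# server's pre-chunked blocks, keyed by cursor value.

def fetch_block(screen_name, cursor, pages):
    """Return one block of ids plus the cursor for the next block."""
    ids, next_cursor = pages[cursor]
    return {"ids": ids, "next_cursor": next_cursor}

def all_follower_ids(screen_name, pages):
    """Walk the cursor chain: start at -1, stop when next_cursor is 0."""
    ids, cursor = [], -1
    while cursor != 0:
        block = fetch_block(screen_name, cursor, pages)
        ids.extend(block["ids"])
        cursor = block["next_cursor"]
    return ids

# Three simulated blocks: cursor -1 is the first request, 0 ends the chain.
pages = {
    -1:   ([1, 2, 3], 1300),
    1300: ([4, 5],    1301),
    1301: ([6],       0),
}
print(all_follower_ids("dougw", pages))  # [1, 2, 3, 4, 5, 6]
```

Each block is a separate round trip, which is why a large following set costs many sequential requests unless the server aggregates blocks as John describes above.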