The value lies in the particular properties of a real social graph, as opposed to an artificially generated one. The sorts of questions it's useful for are primarily social rather than mathematical. For a summary of some existing research on similar data sets, see:
http://petewarden.typepad.com/searchbrowser/2010/02/social-network-data-and-research.html

On Wed, Feb 24, 2010 at 11:18 AM, M. Edward (Ed) Borasky <zn...@cesmail.net> wrote:

> Quoting Pete Warden <p...@petewarden.com>:
>
>> I'm looking into releasing a data set based on information pulled from the
>> Twitter API. It would be a free release limited to academic researchers, an
>> anonymized version of the network connections of several million users with
>> public profiles.
>>
>> What I'm hoping to release is something like this:
>> <user id>, <city-level location>, <follower ids>, <friend ids>
>>
>> In all cases, the ids are arbitrary identifiers that are not convertible to
>> actual Twitter ids, and any detailed locations are converted to the nearest
>> large city.
>>
>> I'm aware that it may be possible to de-anonymize some of these users based
>> on topology, but since much richer information is available through the API
>> on these users anyway, that seems unlikely to be an issue? However I'm
>> obviously keen to hear any concerns that Twitter (or other developers here)
>> may have before I go forward with this.
>
> What is the value of such a dataset to an "academic researcher"? I consider
> myself an academic researcher, though I don't have a formal position as one.
> What can you do with a "real" Twitter "social graph" that you can't do with
> one generated by random techniques based on statistical sampling of Twitter
> data?
>
> A million-user "real" social graph, even assuming fewer than 5,000
> friend_ids and follower_ids per user, costs two million API calls. At 350
> calls per hour, that works out to 238 days by my calculation. And during
> that 238 days, the social graph is changing many times a second. A
> randomly-generated graph of a much larger size could be constructed in a
> day, *including* coding time, *and* you could incorporate the changing
> nature of Twitter social graphs in a simulation.
>
> (Smiling at the subtle irony in my standard email signature) ;-)
>
> --
> M. Edward (Ed) Borasky
> borasky-research.net/m-edward-ed-borasky/
>
> "A mathematician is a device for turning coffee into theorems." ~ Paul Erdos
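The release format Pete describes (arbitrary ids that cannot be converted back to Twitter ids, plus a city-level location per user) can be sketched as a small export step. This is a minimal Python sketch under stated assumptions, not Pete's actual pipeline: the `Record` tuple layout, the function names `build_id_map` and `anonymize`, the space-separated id lists inside each field, and the premise that locations have already been coarsened to the nearest large city are all hypothetical. As the original message notes, remapping ids does nothing to stop de-anonymization from graph topology.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input record: (twitter_id, city, follower_ids, friend_ids),
# where city is assumed to be already coarsened to the nearest large city.
Record = Tuple[int, str, List[int], List[int]]


def build_id_map(records: Sequence[Record],
                 rng: Optional[random.Random] = None) -> Dict[int, int]:
    """Map every Twitter id that appears anywhere in the data to an arbitrary id."""
    if rng is None:
        rng = random.SystemRandom()  # OS entropy; not reproducible
    seen = set()
    for user_id, _city, followers, friends in records:
        seen.add(user_id)
        seen.update(followers)
        seen.update(friends)
    new_ids = list(range(len(seen)))
    rng.shuffle(new_ids)  # random permutation, so new ids carry no ordering signal
    return dict(zip(sorted(seen), new_ids))


def anonymize(records: Sequence[Record], id_map: Dict[int, int]) -> List[str]:
    """Emit '<user id>, <city>, <follower ids>, <friend ids>' lines with mapped ids."""
    rows = []
    for user_id, city, followers, friends in records:
        # Sorting by the new ids discards the original list order from the API.
        fol = " ".join(str(i) for i in sorted(id_map[f] for f in followers))
        fri = " ".join(str(i) for i in sorted(id_map[f] for f in friends))
        rows.append((id_map[user_id], city, fol, fri))
    rows.sort(key=lambda r: r[0])  # row order no longer reflects input order
    return [f"{uid}, {city}, {fol}, {fri}" for uid, city, fol, fri in rows]
```

Assigning new ids by random permutation, rather than by an unsalted hash of the Twitter id, matters because a hash could be reversed by hashing candidate ids; here the mapping is only recoverable from `id_map` itself, which would be discarded after export. City names containing commas would need quoting in this simple line format.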