I'm looking into releasing a data set based on information pulled from the
Twitter API. It would be a free release, limited to academic researchers, of
an anonymized version of the network connections of several million users
with public profiles.

What I'm hoping to release is something like this:
<user id>, <city-level location>, <follower ids>, <friend ids>

In all cases, the ids are arbitrary identifiers that are not convertible to
actual Twitter ids, and any detailed locations are converted to the nearest
large city.
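For anyone wondering what "not convertible" might look like in practice, here
is a minimal Python sketch, assuming the real Twitter ids are replaced by a
random permutation of 0..n-1 and the mapping table is then thrown away. The
function names (`pseudonymize`, `export_rows`), the CSV layout and the
space-separated id lists are hypothetical choices for illustration only, and
the city lookup is assumed to have happened already.

```python
import csv
import io
import random


def pseudonymize(twitter_ids, seed=None):
    """Map each real Twitter id to an arbitrary integer in 0..n-1.

    The new ids are a random permutation, so they carry no ordering
    information from the originals. The returned mapping must NOT be
    released; discarding it is what makes the ids non-convertible.
    """
    ids = sorted(set(twitter_ids))
    new_ids = list(range(len(ids)))
    # Hypothetical choice: a fixed seed is only for reproducible testing;
    # with seed=None the OS entropy source is used for the shuffle.
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    rng.shuffle(new_ids)
    return dict(zip(ids, new_ids))


def export_rows(users, mapping):
    """Write <user id>, <city>, <follower ids>, <friend ids> as CSV text.

    `users` maps a real Twitter id to (city, follower_ids, friend_ids),
    where the city is assumed to be already coarsened to a large city.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    for tid, (city, followers, friends) in users.items():
        writer.writerow([
            mapping[tid],
            city,
            " ".join(str(mapping[f]) for f in followers),
            " ".join(str(mapping[f]) for f in friends),
        ])
    return out.getvalue()
```

Note that the mapping has to cover every id that appears as a follower or
friend, not just the users with rows of their own, otherwise the edge lists
would leak real ids.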

I'm aware that it may be possible to de-anonymize some of these users based
on the network topology, but since much richer information about them is
already available through the API anyway, I'm hoping that won't be an issue.
However, I'm obviously keen to hear any concerns that Twitter (or other
developers here) may have before I go forward with this.

cheers,
            Pete
