Hi Brian,

2015-03-30 0:25 GMT+02:00 Brian <reflect...@gmail.com>:

> Although the initial goal of the Netflix Prize was to design a
> collaborative filtering algorithm, it became notorious when the data was
> used to de-anonymize Netflix users. Researchers proved that given just a
> user's movie ratings on one site, you can plug those ratings into another
> site, such as the IMDB. You can then take that information, and with some
> Google searches and optionally a bit of cash (for websites that sell user
> information, including, in some cases, their SSN) figure out who they are.
> You could even drive up to their house and take a selfie with them, or
> follow them to work and meet their boss and tell them about their views on
> the topics they were editing.

Somewhat tangentially, and to bring this topic back to a more scientific
setting, I would like to point out that there has already been research on
this. I highly recommend reading the following paper:

Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit: Locating
Wikipedia Contributors through Edit Histories." ICWSM. 2009.
(PDF <http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/lieberman-lin.YouAreWhereYouEdit.ICWSM09.pdf>)

For those of you who don't want to read the whole paper, you can find a
recap of the most relevant findings in this presentation by Maurizio
Napolitano:
<http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew>

The main idea is to associate spatial coordinates with Wikipedia articles
when possible; these articles are called "geopages". Then you extract from
the article histories the users who have edited a geopage. If you plot the
geopages edited by a given contributor, you can see that they tend to
cluster, so you can define an "edit area". The study finds that 30-35% of
contributors concentrate their edits in an edit area smaller than 1 deg^2
(~12,362 km^2, approximately the area of Connecticut or Northern
Ireland[1] (thanks, Wikipedia!)).

For another free/libre project with a geographic focus like OpenStreetMap
this is even more marked; check out, for example, the tool «“Your OSM Heat
Map” (aka Where did you contribute?)»[2] by Pascal Neis.
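
To make the geopage idea a bit more concrete, here is a minimal sketch in
Python. It is my own illustration, not the code or the exact definition used
in the paper: I assume the edit history is available as (contributor, page
title) pairs, that geopages are given as a title -> (lat, lon) mapping, and I
use a plain latitude/longitude bounding box as a stand-in for the "edit
area". The names edit_areas and square_degree_km2 are hypothetical, and the
sketch ignores edge cases such as boxes crossing the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for the km^2 estimate)


def edit_areas(edits, geopages):
    """Return {contributor: bounding-box area in square degrees}.

    edits    -- iterable of (contributor_id, page_title) pairs
    geopages -- dict mapping page_title -> (lat, lon) in degrees
    Edits to pages without coordinates are ignored.
    """
    points = defaultdict(list)
    for contributor, title in edits:
        coords = geopages.get(title)
        if coords is not None:
            points[contributor].append(coords)

    areas = {}
    for contributor, pts in points.items():
        lats = [lat for lat, _ in pts]
        lons = [lon for _, lon in pts]
        # Bounding box of all geopages this contributor has edited.
        areas[contributor] = (max(lats) - min(lats)) * (max(lons) - min(lons))
    return areas


def square_degree_km2(lat_deg=0.0):
    """Approximate ground area of a 1 deg x 1 deg cell centred at lat_deg."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    return km_per_degree ** 2 * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    geopages = {"Rome": (41.90, 12.50), "Milan": (45.46, 9.19)}
    edits = [
        ("u1", "Rome"),
        ("u1", "Milan"),
        ("u1", "Python (programming language)"),  # not a geopage, skipped
        ("u2", "Rome"),
    ]
    for user, area in sorted(edit_areas(edits, geopages).items()):
        print(user, round(area, 2), "deg^2")
    print(round(square_degree_km2(0.0)), "km^2 per deg^2 at the equator")
```

Note that the identifiers "u1" and "u2" could just as well be opaque hashes:
the clustering only needs to know which edits belong together, not who made
them.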

This, of course, is not a straightforward de-anonymization, but these
methods work in principle for every contributor even if you obfuscate their
IP or username (provided that you can still assign all the edits from a
given user to a single, unambiguous identifier).

C

[1] https://en.wikipedia.org/wiki/Square_degree
[2a] http://yosmhm.neis-one.org/
[2b] http://neis-one.org/2011/08/yosmhm/