That's a tough question, and I'm not sure what the answer is. There is a little bit of precedent with https://www.mediawiki.org/w/index.php?oldid=2533048&title=Extension:AntiBot
When evaluating harm, I guess one of the questions is: how does your approach compare in effectiveness to other publicly available approaches like http://www.philocomp.net/humanities/signature.htm and https://github.com/search?q=authorship+attribution+user:pan-webis-de ? (I.e., there is more harm if your approach is significantly better than other already available tools, and less if they're at a similar level.)

-- Brian

On Thu, Aug 6, 2020 at 2:33 AM Amir Sarabadani <ladsgr...@gmail.com> wrote:
> Hey,
> I have an ethical question that I haven't been able to answer yet. I have been asking around without getting a definite answer, so I'm asking a larger audience in the hope of a solution.
>
> For almost a year now, I have been developing an NLP-based AI system to catch sock puppets (two users pretending to be different but actually the same person). It's based on the way they speak. The way we speak is like a fingerprint: it's unique to us and it's really hard to forge or change on demand (unlike IP/UA). As a result, if you apply some basic AI techniques to Wikipedia discussions (which can be really lengthy, trust me), the sock puppets stand out in the datasets.
>
> Here's an example; I highly recommend looking at these graphs. I compared two pairs of users: one pair that are not sock puppets, and another that is a pair of known socks (a user who got banned indefinitely but came back hidden under another username). [1][2] These graphs are based on one of several aspects of this AI system.
>
> I have talked about this with WMF and other CUs, with the aim of building it to help us understand and catch socks, especially the ones that have enough resources to change their IP/UA regularly (like sock farms and/or UPEs). Also, with the increase of mobile internet providers and the horrible way they assign IPs to their users, this can get really handy in some SPI ("Sock puppet investigation") [3] cases.
>
> The problem is that this tool, while being built only on public information, actually has the power to expose legitimate sock puppets: people who live under oppressive governments and edit on sensitive topics. Disclosing such connections between two accounts can cost people their lives.
>
> So, this code is not going to be public, period. But we need to have this code in Wikimedia Cloud Services so that people like CUs on other wikis are able to use it as a web-based tool, instead of me running it for them upon request. But the WMCS terms of use explicitly say code should never be closed-source, and this is our principle. What should we do? Should I pay a corporate cloud provider for this and put such important code and data there? Should we amend the terms of use to have some exceptions like this one?
>
> The most plausible solution suggested so far (thanks Huji) is to have a shell of code that would be useless without data, and keep the code that produces the data (out of dumps) closed (which is fine, running that code is not too hard even on enwiki) and update the data myself. This might be doable (I'm around 30% sure of that; it still might expose too much), but it wouldn't cover future cases similar to mine, and I think a more long-term solution is needed here. Also, it would reduce the bus factor to 1, and maintenance would be complicated.
>
> What should we do?
>
> Thanks
> [1] https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_1.png
> [2] https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_2.png
> [3] https://en.wikipedia.org/wiki/Wikipedia:SPI
> --
> Amir (he/him)
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l