That's a tough question, and I'm not sure what the answer is.

There is a little bit of precedent with
https://www.mediawiki.org/w/index.php?oldid=2533048&title=Extension:AntiBot

When evaluating harm, I guess one of the questions is how your approach
compares in effectiveness to other publicly available approaches like
http://www.philocomp.net/humanities/signature.htm &
https://github.com/search?q=authorship+attribution+user:pan-webis-de
(i.e., there is more harm if your approach is significantly better than
other already available tools, and less if they're at a similar level).

--
Brian

On Thu, Aug 6, 2020 at 2:33 AM Amir Sarabadani <ladsgr...@gmail.com> wrote:

> Hey,
> I have an ethical question that I haven't been able to answer. I have been
> asking around but have no definite answer yet, so I'm asking a larger
> audience in the hope of a solution.
>
> For almost a year now, I have been developing an NLP-based AI system to be
> able to catch sock puppets (two users pretending to be different but
> actually the same person). It's based on the way they speak. The way we
> speak is like a fingerprint: it's unique to us and really hard to
> forge or change on demand (unlike IP/UA). As a result, if you apply some
> basic AI techniques to Wikipedia discussions (which can be really
> lengthy, trust me), the sock puppets stand out in the datasets.
>
> Here's an example; I highly recommend looking at these graphs. I compared
> two pairs of users: one pair that are not sock puppets, and one
> pair of known socks (a user who got banned indefinitely but came back
> hidden under another username). [1][2] These graphs are based on one of
> several aspects of this AI system.
>
> I have talked about this with WMF and other CUs, to build this and help us
> understand and catch socks, especially the ones that have enough resources
> to change their IP/UA regularly (like sock farms and/or UPEs). Also,
> with the increase of mobile internet providers and the horrible way they
> assign IPs to their users, this can get really handy in some SPI ("Sock
> puppet investigation") [3] cases.
>
> The problem is that this tool, while being built only on public
> information, actually has the power to expose legitimate sock puppets:
> people who live under oppressive governments and edit on sensitive topics.
> Disclosing such connections between two accounts can cost people their
> lives.
>
> So, this code is not going to be public, period. But we need to have this
> code in Wikimedia Cloud Services so that people like CUs in other wikis can
> use it as a web-based tool instead of me running it for them upon
> request. But the WMCS terms of use explicitly say code should never be
> closed-source, and this is our principle. What should we do? Should I pay a
> corporate cloud provider for this and put such important code and data
> there? Should we amend the terms of use to have some exceptions like this one?
>
> The most plausible solution suggested so far (thanks Huji) is to have a
> shell of code that would be useless without data, keep the code that
> produces the data (out of dumps) closed (which is fine, running that code
> is not too hard even on enwiki), and update the data myself. This might be
> doable (I'm around 30% sure; it still might expose too much), but it
> wouldn't cover future cases similar to mine, and I think a more long-term
> solution is needed here. Also, it would reduce the bus factor to 1, and
> maintenance would be complicated.
>
> What should we do?
>
> Thanks
> [1] https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_1.png
> [2] https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_2.png
> [3] https://en.wikipedia.org/wiki/Wikipedia:SPI
> --
> Amir (he/him)
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l