Re: [Wikitech-l] Ethical question regarding some code

2020-08-05 Thread bawolff
That's a tough question, and I'm not sure what the answer is.

There is a little bit of precedent with
https://www.mediawiki.org/w/index.php?oldid=2533048&title=Extension:AntiBot

When evaluating harm, I guess one of the questions is how your approach
compares in effectiveness to other publicly available approaches
like http://www.philocomp.net/humanities/signature.htm &
https://github.com/search?q=authorship+attribution+user:pan-webis-de ?
(i.e. there is more harm if your approach is significantly better than
other already-available tools, and less if they're at a similar level)

--
Brian

On Thu, Aug 6, 2020 at 2:33 AM Amir Sarabadani  wrote:

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Ethical question regarding some code

2020-08-05 Thread AntiCompositeNumber
Creating and promoting the use of a closed-source tool, especially one
used to detect disruptive editing, runs counter to core Wikimedia
community principles.

Making such a tool closed-source prevents the Wikimedia editing
community from auditing its use, contesting its decisions, making
improvements to it, or learning from its creation. This causes harm to
the community.

Open-sourcing a tool such as this could allow an unscrupulous user to
connect accounts that are not publicly connected. This is a problem
with all sock detection tools. It also causes harm to the community.

The only way to create such a tool that does not harm the community in
any way would be to make the tool's decision-making entirely public
while keeping the tool's decisions non-public. This is not possible.
However, we can approach that goal with careful engineering and
attempt to minimize harm. Things like restricting the interface to
CUs, requiring a logged reason for each check, technical barriers
against fishing (comparing two known users only, rather than searching
for other potential matches), not making processed data publicly
available, and publishing the entire source code (including the code
used to load data) can all reduce harm.
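The safeguards above can be pictured in a few lines of code. This is a minimal, hypothetical sketch, not anything from the actual tool; every name in it (CHECKUSERS, compare_accounts, the audit-log format) is invented for illustration.

```python
# Hypothetical sketch of the safeguards described above: access limited
# to checkusers, a mandatory logged reason, and a pairwise-only API so
# there is no interface for fishing across all users.
import logging

audit_log = logging.getLogger("socktool.audit")

CHECKUSERS = {"ExampleCU"}  # interface restricted to checkusers

def compare_accounts(requester, user_a, user_b, reason):
    """Compare exactly two named accounts; refuse unauthorized or unlogged use."""
    if requester not in CHECKUSERS:
        raise PermissionError("only checkusers may run comparisons")
    if not reason.strip():
        raise ValueError("a reason must be logged for every check")
    # Log before computing anything, so every attempt is auditable.
    audit_log.info("%s compared %s vs %s: %s", requester, user_a, user_b, reason)
    # ... run the (closed) similarity model on exactly these two accounts ...
    return {"pair": (user_a, user_b), "reason": reason}
```

The pairwise signature is the anti-fishing barrier: a caller must already suspect two specific accounts, rather than asking the tool to scan everyone.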

After all that, if you are not satisfied that harm has been
sufficiently reduced, there is only one answer: do not create the
tool.

AntiCompositeNumber

On Wed, Aug 5, 2020 at 10:33 PM Amir Sarabadani  wrote:


[Wikitech-l] Ethical question regarding some code

2020-08-05 Thread Amir Sarabadani
Hey,
I have an ethical question that I haven't been able to answer yet. I've
been asking around without getting a definite answer, so I'm bringing it
to a larger audience in the hope of finding a solution.

For almost a year now, I have been developing an NLP-based AI system to
catch sock puppets (two accounts pretending to be different people but
actually operated by the same person). It's based on the way they write.
The way we write is like a fingerprint: it's unique to us and really
hard to forge or change on demand (unlike an IP or user agent). As a
result, if you apply some basic AI techniques to Wikipedia discussions
(which can be really lengthy, trust me), the sock puppets shine through
in the data.

Here's an example; I highly recommend looking at these graphs. I compared
two pairs of users: one pair that are not sock puppets, and one pair of
known socks (a user who was banned indefinitely but came back hidden
under another username). [1][2] These graphs are based on one of several
aspects of this AI system.
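For readers unfamiliar with this kind of stylometry, here is a minimal sketch of the sort of word-distribution comparison the graphs illustrate. It is purely illustrative: the sample comments and users are invented, and the actual (unpublished) system is certainly more sophisticated than unigram cosine similarity.

```python
# Hypothetical sketch: compare two editors' word-frequency
# distributions, one simple stylometric signal among many.
from collections import Counter
import math

def word_distribution(comments):
    """Normalized unigram frequency distribution over a user's comments."""
    counts = Counter(word.lower() for c in comments for word in c.split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency distributions."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Invented sample comments; a real system would use thousands of words
# of talk-page discussion per user.
user_a = word_distribution(["I think this source is unreliable",
                            "this source fails verification"])
user_b = word_distribution(["this source is clearly unreliable I think"])
print(cosine_similarity(user_a, user_b))  # closer to 1.0 = more similar styles
```

With enough text per user, even simple signals like this separate unrelated editors from pairs writing in the same "voice", which is why lengthy discussions make the socks shine.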

I have talked about this with the WMF and with other CUs, about building
it to help us understand and catch socks, especially the ones that have
enough resources to change their IP/UA regularly (like sock farms and/or
UPEs). With the rise of mobile internet providers and the horrible way
they assign IPs to their users, this can come in really handy in some
SPI ("sock puppet investigation") [3] cases.

The problem is that this tool, while built only on public information,
actually has the power to expose legitimate sock puppets: people who
live under oppressive governments and edit on sensitive topics.
Disclosing such connections between two accounts can cost people their
lives.

So, this code is not going to be public, period. But we need to host it
in Wikimedia Cloud Services so that people like CUs on other wikis can
use it as a web-based tool, instead of me running it for them on
request. The WMCS terms of use, however, explicitly say that code should
never be closed-source, and this is one of our principles. What should
we do? Should I pay a corporate cloud provider to host such important
code and data? Should we amend the terms of use to allow exceptions like
this one?

The most plausible solution suggested so far (thanks, Huji) is to
publish a shell of the code that would be useless without its data, keep
the code that produces the data (out of dumps) closed (which is fine;
running it is not too hard even on enwiki), and update the data myself.
This might be doable (I'm only around 30% sure; it might still expose
too much), but it wouldn't cover future cases similar to mine, and I
think a more long-term solution is needed here. It would also reduce the
bus factor to 1 and complicate maintenance.
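Huji's "shell" idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual code: the file name, format, and function names are all invented. The public shell only loads a precomputed feature file and compares vectors; the pipeline that builds that file from the dumps stays closed.

```python
# Hypothetical "shell": open-source loader/comparator that is useless
# without the feature file produced by the closed, offline pipeline.
import json
import math

def load_features(path):
    """Load per-user feature vectors produced offline by the closed pipeline."""
    with open(path) as f:
        return json.load(f)  # e.g. {"UserA": [0.1, 0.4, ...], ...}

def similarity(features, user_a, user_b):
    """Cosine similarity between two users' precomputed feature vectors."""
    va, vb = features[user_a], features[user_b]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb)
```

Publishing only this layer reveals that some vector comparison happens, but not which features are extracted or how, which is roughly the point of the suggestion.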

What should we do?

Thanks
[1]
https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_1.png
[2]
https://commons.wikimedia.org/wiki/File:Word_distributions_of_two_users_in_fawiki_2.png
[3] https://en.wikipedia.org/wiki/Wikipedia:SPI
-- 
Amir (he/him)

[Wikitech-l] [Train] 1.36.0-wmf.3 status update (currently at group0, blocked)

2020-08-05 Thread Brennen Bearnes

The 1.36.0-wmf.3 version of MediaWiki is currently blocked at group0.[0]

The new version can proceed no further until these issues are resolved 
or appropriately triaged:


* Uncaught ArgumentCountError: Too few arguments to function 
OOUI\Tag::appendContent(), 0 passed

- https://phabricator.wikimedia.org/T259745

* Argument 3 passed to 
CachingFallbackLabelDescriptionLookup::buildCacheKey() must be of the 
type string, null given

- https://phabricator.wikimedia.org/T259744

Thanks for any help resolving these issues. If they're handled by
roughly 14:30 PDT / 21:30 UTC today, the train can roll forward.
Otherwise it can resume during US working hours on Thursday.


-- Your typical train temporizer

[0]. 
[1]. 


[Wikitech-l] First desktop improvements features now available on early adopter wikis

2020-08-05 Thread Olga Vasileva
Hi all,

We’re happy to announce that the first two of many changes focused on
improving the desktop experience of the Vector skin [1] have been released
as a user preference to all projects and as default on a set of early
adopter wikis: Basque, Farsi, French, and Hebrew Wikipedias, French
Wiktionary, and Portuguese Wikiversity.

Since its introduction in 2009, the Vector skin has changed little, while
the needs of our readers and editors have shifted significantly, as have
their expectations for a quality reading experience that focuses on the
content itself. Over the next year, the readers web team [2] will be
researching and building out improvements to the desktop experience based
on research and existing tools built by our communities.

Our goal is to create a more welcoming reading and editing experience -
something that feels familiar yet makes it easier and quicker to read,
edit, and perform common tasks.

Our first change, a collapsible sidebar, allows users to collapse the
lengthy menu on the left side of the page. We believe this change improves
usability by allowing people to focus on the content itself - on reading,
editing, or moderating.

Our second change introduces a maximum line width for our content on
pages such as article pages and discussion pages. Studies have shown
that limiting line width can lead to better retention of content, as
well as a decrease in eye strain.

You can opt into these features by unchecking "legacy vector" in the
appearance tab of your user preferences.

We’d also like to note that these are the first of a series of changes
and, as such, their visual characteristics are not permanent. Also,
there might be bugs. If you notice an issue or would like to learn more
about the project itself, please head to our project page [3].

Thank you!

Olga

[1] https://www.mediawiki.org/wiki/Skin:Vector
[2] https://www.mediawiki.org/wiki/Readers/Web/Team
[3] https://www.mediawiki.org/wiki/Reading/Web/Desktop_Improvements
-- 
*Olga Vasileva* // Senior Product Manager // Web
https://wikimediafoundation.org/


*Imagine a world in which every single human being can freely share in
the sum of all knowledge. That's our commitment. Donate.*