Re: [Wikimedia-l] Responses to en-wp sourcing question

2018-09-05 Thread sashi

Hi Adam, Pine, Robert,

Thanks for the suggestions!  In particular, Adam's link to Ford et al., 
where I read:


-- We used Apache's Map Reduce framework on Amazon's Elastic Map Reduce 
(EMR) cloud computing infrastructure to efficiently extract the history 
of references to all articles.


That sounds like power tools!  I've been using more of a clunky bucket 
chain procedure which only captures part of what has "stuck" in the 
river.  (They downloaded a corpus with all its deletion history.)  We do 
reach some of the same conclusions, but the data are very different 
(Twitter & Facebook weren't quite as weighty back in 2012, for 
example).  That said, I suspect it would be much wiser to work on a 
database dump as they have.


The classified version (linked below) is getting more interesting now.  
Left papers often do better than their circulation figures would 
suggest, with Brazil & Germany being the notable exceptions.  In any 
case, what's very clear is that on en-wp, *Pitchfork* does much better 
than the *Poetry Foundation*.


>> http://www.creoliste.fr/docs/WikiInSources_cat.pdf <<

Not to worry, Robert, *Wikipediocracy* barely makes the list...

I'll have a look at the research mailing list once I've finished 
exploring Adam's suggestion, Pine.   Thanks to the three of you for 
taking the time to respond!


sashi





___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>

[Wikimedia-l] Wikipedia's sourcing

2018-08-27 Thread sashi

Hello,

I thought I would ask if any of the junior or senior researchers here on 
this mailing list have conducted previous inquiries into Wikipedia's 
sourcing.


I am currently working on a project of determining what proportion of 
Wikipedia is sourced to newspapers, the military, the Church, social 
media, etc.
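
A minimal sketch of this kind of tally, assuming the references are 
available as raw wikitext (for instance from a database dump): it pulls 
the URLs out of the text and counts them by domain. The regular 
expression, the function names, and the sample text are all illustrative 
assumptions, not the method used for the data posted below, and grouping 
domains into classes (newspapers, social media, etc.) would still need a 
hand-made mapping.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches http(s) URLs in wikitext; stops at whitespace and at the
# delimiters that usually end a URL inside templates or bracketed links.
URL_RE = re.compile(r'https?://[^\s\]|}<>"]+')


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def tally_domains(wikitext):
    """Count how often each domain appears among the URLs in the text."""
    return Counter(domain_of(u) for u in URL_RE.findall(wikitext))


def proportions(counts):
    """Turn raw counts into each domain's share of all URLs found."""
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


# Hypothetical sample text, standing in for one article's references.
sample = (
    "<ref>{{cite web|url=https://www.example.org/a|title=A}}</ref> "
    "<ref>[http://example.com/b B]</ref> "
    "<ref>{{cite news|url=https://example.org/c}}</ref>"
)
counts = tally_domains(sample)
shares = proportions(counts)
```

Run over every article, the same counter would give the overall share 
per domain; it only sees what is currently in the text, not what has 
been deleted.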


The data I've compiled this month, along with a brief write-up, have 
been posted to Wikipediocracy:


http://wikipediocracy.com/2018/08/26/wikipedia-sources-methods/

I imagine I'm reinventing the wheel... such studies have been done 
before, by the WMF, with power tools (bots), right?


Thanks for any corrections / suggestions,

  sashi



[Wikimedia-l] Category: French Jews on en.wp / GDPR

2018-05-31 Thread sashi

Another follow-up:

==
Benjamin Lees wrote: "No, French Christians are just tagged with 
subcategories of Category:French Christians. The "requiring diffusion" 
category that you complain of is in fact a way to tell editors that 
pages in the category should really be in subcategories instead."

==

Aha! You're right, I had not realized that "diffuse" (disseminate/spread 
widely) was being used as specialized en-wiki-jargon for 
"subcategorize". It might be wise to give that hidden category a more 
descriptive name.


I looked into one of the many BLP entries with an unsourced 
Category:French Jews tag, and found a review of a book they wrote. In 
that book, the person stated that while they had a Jewish mother, they 
did not consider themselves Jewish.


Given that the category French Jews contains more members than the 
category French Roman Catholics, and that there are living people 
included in both categories... I seriously wonder what it is that 
motivates folks to anonymously tag others in this way (i.e. whether they 
want to be tagged or not).


The Library of Congress, the BNF,  Wikidata, etc. don't label people 
according to religion, unless their notability is due specifically to 
their religion (e.g. Alfred Dreyfus, Maimonides, etc.).  On en.wp, the 
people labeled as Jewish, Catholic, etc. tend to be industrialists, 
politicians, journalists, bankers, etc.  I don't think this is "best 
practice" and I'm afraid I do not agree that en.wp is mostly "getting it 
right" with regard to this specific question.  Fr.WP and Wikidata are 
doing much better.


The relevant section on "data subject" privacy rights in the GDPR (in 
English) is based on the 1978 French law I cited earlier (though it has 
become more restrictive since -- see below).  As David Gerard noted, it 
is quite likely that this affects not only Wikipedians (who can petition 
to have libel/slander concerning their *online identity* (cf. definition 
of data subject) removed from (inter alia) block logs), but also the 
*content* of biographies of living people in the encyclopedia.


== GDPR (Article 9) ==

*Processing* of personal data revealing racial or ethnic origin, 
political opinions, religious or philosophical beliefs, or trade union 
membership, and the processing of genetic data, biometric data for the 
purpose of uniquely identifying a natural person, data concerning health 
or data concerning a natural person's sex life or sexual orientation 
shall be prohibited.


==

As one who has contributed to the projects since 2006, I am posting this 
here not because I wish to sow dissent, but because I think some quick 
thinking and corrective action is needed.


  sashi





Re: [Wikimedia-l] Category: French Jews on en.wp / GDPR

2018-05-27 Thread sashi
tributors 
(whose age they don't verify) falling afoul of their national laws.  
Simply excluding members of Category:BLP & Category:French 
Jews/Catholics/Muslims/Freemasons/etc. from the hidden Category 
"requiring diffusion" and adding them to the hidden Category "noindex" 
would go a long way towards protecting privacy rights (at least as far 
as google is concerned).


Finally -- again -- how useful are these automatically generated lists 
towards advancing the "freedom of knowledge" (as Nathan put it)?   To 
repeat: these categories make it seem that there are/have been 40 times 
more notable Jewish people and five times more notable Muslims in France 
than notable Christians.  This (derived) "knowledge" is patently 
false.  Now, granted, the purpose of the automatically generated 
categories is not to come up with a comparative tally of noteworthy 
people; but I think what this tally shows is in itself revealing:   
Wikipedians are 40 times more likely to tag notable Jewish people as 
Jews and 5 times more likely to tag notable Muslims as Muslim than they 
are to tag notable Christians as Christians.  This is worth thinking 
about for a minute...
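
The ratios quoted above can be checked with a few lines of arithmetic. 
This is only a sketch, assuming the PetScan counts given in the original 
post (French Christians 21, French Jews 862) together with the corrected 
French Muslims count from the ps (96); the variable names are 
illustrative.

```python
# Category member counts as quoted in this thread (PetScan figures).
christians = 21   # Category: French Christians
jews = 862        # Category: French Jews
muslims = 96      # Category: French Muslims (corrected figure)

# How many times larger each category is than French Christians.
ratio_jews = jews / christians
ratio_muslims = muslims / christians

print(f"Jews / Christians:    {ratio_jews:.1f}")     # 41.0
print(f"Muslims / Christians: {ratio_muslims:.1f}")  # 4.6
```

So "40 times" and "five times" are rounded versions of roughly 41 and 
4.6.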


Why would it be so hard to be humble and respect national laws by 
ensuring that category membership is not diffused for living people in 
countries where such lists are illegal? (As Yaroslav points out, there 
is no guarantee of the quality of the sourcing).  
En.wp might be wise to learn from the conservative approach to this 
question taken by fr.wp and wikidata.


I hope this helps to clarify the original post.

   sashi

ps:  *Correction*:  Contrary to what I mistakenly wrote in my OP there 
are 96 members of the category French Muslims (not 0).




[Wikimedia-l] Category: French Jews on en.wp / GDPR

2018-05-25 Thread sashi

Hello,

I am writing to ask if there are any plans to render the English 
Wikipedia compliant with French privacy laws.  Currently, if a French 
high school student goes to a French library, reserves a computer, and 
types "List of French Jews" into Google, DuckDuckGo, or Dogpile, an 
ad hoc en.wikipedia list of over 850 people (approximately half of them 
living) appears in the #2 position (Category: French Jews). In the first 
position is the English Wikipedia page "List of French Jews" containing 
the following text, originally added in 2010, showing that the 
en.wikipedia community is aware that they are breaking French law:


"The French nationality law itself, strongly secular, forbids any 
statistics or lists based on ethnic or religious membership."


A French person tagging biographies of living people in en.wp with the 
category "French Jews" is a violation of French privacy law which would 
expose the Wikipedian to a penalty of €300,000 and/or 5 years imprisonment:


"Except in the cases provided for by law, placing or keeping in 
computerized memory, without the express consent of the person 
concerned, personal data that directly or indirectly reveal the racial 
or ethnic origins, the political, philosophical or religious opinions, 
or the trade union memberships of persons, or that relate to their 
health or to their sexual orientation or identity, is punishable by 
five years' imprisonment and a fine of €300,000." (translated from the 
French; source:  
https://www.cnil.fr/fr/les-sanctions-penales )


There is, to the best of my knowledge, no such category on fr.wp, as 
people in France are well aware of the law.


See also "List of West European Jews" / Category: French People of 
Jewish descent / Category: French People of Arab descent / Category: 
French Freemasons (167), Category: French Atheists (93 including a 
recent president), etc.


I noticed in researching the question that the Category "French rapists" 
(2 BLP) is associated with the hidden category "No indexed", whereas the 
category "French Jews" (100s of BLP) is associated with the hidden 
category: "categories requiring diffusion".  As a temporary measure (to 
avoid actively feeding this info into search engines), perhaps 
categories related to racial/ethnic origins, religious & philosophical 
opinions could be tagged "No indexed" rather than "requiring diffusion"?


The WMF hosts its servers in the US and the Netherlands, and will soon 
also be hosting offshore in Singapore, which probably leads WMF legal 
to believe that this grants them immunity from French privacy laws.  
Nevertheless, I thought I would mention that this is a potentially 
significant problem going forward.  Discussion leading to action 
correcting this potential avenue of abuse might help the WMF to avoid 
litigation, given that the current policies on English Wikipedia 
actively facilitate violation of French laws.


(data from petscan.wmflabs.org): French Christians (21 members), French 
Hindus (17 members), French Buddhists (9 members), French Muslims (0 
members), French Jews (862 members).


Thank you for your time considering how best to address this problem.

sashi


