Re: [Wiki-research-l] Generalizability of research across different language versions

2019-10-03 Thread Ziko van Dijk
Hello, this is indeed a very interesting topic, and one should really treat
small and big Wikipedias as very different kinds of websites. Consider just one
difference: on big Wikipedias you rely on a watchlist, while on a small
Wikipedia you basically follow Recent changes.
A systematic comparison would be great. My paper from ten years ago was more of
a survey of the topic itself: Ziko van Dijk: Wikipedia and
lesser-resourced languages. In: *Language Problems and Language Planning*
33 (2009, no. 3, Fall), pp. 234-255.
Actually in the book I am working on right now, such a systematic
comparison would be a very useful example for how to apply my wiki model...
:-)
Kind regards
Ziko




Re: [Wiki-research-l] Generalizability of research across different language versions

2019-10-03 Thread Lucie Kaffee
Just adding a small point I saw while interviewing editors of different
language Wikipedias: I believe (and haven't further investigated, so take
this with a grain of salt) that there is also a general difference in the
behavior of "small" and "large" communities, e.g., in trust between the
editors and how they work together. This seemed to be independent of other
cultural context, but this is rather anecdotal and would be interesting to
see further investigated.
I find it generally a very interesting topic and look forward to what
results from the discussion here. So far I have only seen research that applies
its methods across Wikipedias rather than drawing conclusions from one language
version to another.
Thanks Isaac also for the collection of reading material :)


Re: [Wiki-research-l] Generalizability of research across different language versions

2019-10-03 Thread Amir E. Aharoni
Thanks a lot for bringing this up.

Sorry for not offering a solution, but I do want to mention a
frequently-missed aspect of the problem: Wikis in different languages have
some differences that are understandable because they reflect some
objective cultural characteristics of the people who speak them. But some
differences are artificial and exist because in the early days of Wikimedia
(mid-2000s) there were no convenient ways for wikis to communicate and
share info. There were no global accounts and no convenient translation
tools.

Templates are still not global, even though there is huge demand for this,[1]
and many community processes are implemented using templates: requests
for deletion, requests for unblocking, article sorting for WikiProjects,
stub sorting. Many of these things could be unified, at least partially, by
making templates global, and among many benefits, it would make research
easier, too.

[1] It came at #3 in the Community Wishlist vote in 2015, and at #1 in
2016. Despite this demand, it was not implemented :(

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬


On Wed, 2 Oct 2019 at 14:37, Jan Dittrich <jan.dittr...@wikimedia.de> wrote:

> Hello researchers,
>
> A lot of research on Wikipedia is published in English and also uses the
> English Wikipedia as its source of data, or recruits its participants
> via the English Wikipedia [0].
>
> A frequent criticism I meet when discussing such research with non-en.wp
> community members is that their Wikipedia is different and the results of
> en.wp-based research are problematic/incomparable/totally useless.
>
> So I want to ask:
> - Do you know of research comparing different Wikis, preferably across
> language versions? [1]
> - How would you deal with such criticism, particularly of the "if it is not
> about 'my' wp it is useless"-kind [2]?
>
> Kind Regards,
>  Jan
>
> 
> [0] Plausible due to academic fields, particularly Computer Science,
> publishing mainly in English, en.wp's size, and the WMF as an actor being
> US-based.
> [1] I know of »revisiting "The Rise and Decline" in a Population of Peer
> Production Projects« (https://dl.acm.org/citation.cfm?id=3173929),
> comparing different Wikia wikis, and of research like "limits of
> self-organization" (https://firstmonday.org/article/view/1405/1323) that
> refers to general principles of peer production. Comparisons of Wikipedias
> across languages and the impact of their different contexts, languages and
> regulations would be very interesting to me.
> [2] I'm aware that making heterogeneous things comparable is seen as a core
> academic/scientific activity in STS research (Law, SL Star, Turnbull…), so I
> do not want to say that transfer to a different setting is not a problem –
> but it is certainly not "totally useless" either.
>
> --
> Jan Dittrich
> UX Design/ Research
>
> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> https://wikimedia.de
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Generalizability of research across different language versions

2019-10-03 Thread Isaac Johnson
Jan,
You bring up a good point. I feel like there has been a gradual shift
towards research across multiple language communities over the past few
years and that is starting to lead to some informal insights into this
question of transfer of findings across languages / cultures. First a few
examples in case you wish to explore yourself:
* Motivation / needs of Wikipedia readers across 13 different languages:
https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Prevalence_of_Wikipedia_use_cases
* Motivation / behavior of new editors across Czech and Korean Wikipedias
(with some ongoing work in Vietnamese and Arabic Wikipedia as well I
believe):
https://www.mediawiki.org/wiki/Growth/Analytics_updates/EditorJourney_initial_report
* Reading time across many wikis:
https://dl.acm.org/citation.cfm?doid=3306446.3340829
* Predicting aggregate page views across languages/regions, where the takeaway
was that language matters more than geographic region when it comes
to predicting page views:
http://wikiworkshop.org/2019/papers/Wiki_Workshop_2019_paper_3.pdf
* Enabling page previews in English / German:
https://www.mediawiki.org/wiki/Page_Previews/2017-18_A/B_Tests
* Usage of the "Thanks" feature across a number of languages:
https://meta.wikimedia.org/wiki/Research:Understanding_thanks
* Effect on tourism of additional content about places in Dutch, German,
French, and Italian:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3046400
* Some data on the usage of blocks on various wikis:
https://meta.wikimedia.org/wiki/Community_health_initiative/Measuring_the_effectiveness_of_blocks
* A bunch of data on the prevalence of anonymous editing on different
languages / projects:
https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/Research
* Scott Hale has also done some work on multilingual editing that might be
worth exploring: https://arxiv.org/pdf/1501.00657v2.pdf and
https://arxiv.org/abs/1312.0976
* Statistics on content overlap across wikis:
https://wikitech.wikimedia.org/wiki/Wikidata_Concepts_Monitor#Wikidata_usage_patterns
or
https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_articles_across_languages/Inter_language_approach#Article_Alignment

In general, my personal views are:
* Language (presumably partially as a proxy for culture) is by far the most
salient aspect when it comes to understanding differences in behavior etc.
across wikis
* I hesitate to make broad statements about cultural differences, but there
are certain wikis that are more / less interconnected. For instance, there
is a good bit of overlap between Hindi Wikipedia and various other language
editions associated with India (Gujarati, Marathi, etc.). Same is true for
various languages in Spain (Asturian, Basque, Catalan, Spanish, etc.) and
Ukrainian / Russian Wikipedias.
* Obviously size matters a lot too in certain cases when it comes to
editing / maintenance workflows, though I would argue it's less of a factor
when it comes to reader behavior.
* Some wikis do have a reputation for being quite distinct -- for instance,
I would be hesitant to generalize anything to/from Japanese Wikipedia
because the statistics regarding interactions etc. there often look quite
different from those of other wikis.

I would love to see some meta analyses that begin to look at similarities
in behavior or settings (e.g., AbuseFilters, rules around
ContentTranslation) across lots of different metrics to guide our
understanding of the similarities and differences between the language
communities. Until then, I would say there are going to be instances when
research on one wiki tells you little about how another wiki would react
(+1 to what Nemo says: even two languages are much, much better than one,
especially if they are very different languages/cultures). But there are
also often statistics you might pull up to make inferences around how
findings might transfer -- e.g., statistics on anonymous editing and
reverts might tell you something about how introducing new types of IP
blocks would play out in a new community.

On Wed, Oct 2, 2019 at 10:46 AM Federico Leva (Nemo) wrote:

> Jan Dittrich, 02/10/19 14:35:
> > - How would you deal with such criticism, particularly of the "if it is not
> > about 'my' wp it is useless"-kind [2]?
>
> At a minimum, the research needs to have used methods which could extend
> to multiple wikis. Being about 2 languages is ten times better than
> being about 1 only, while being about 100 language subdomains is not a
> hundred times more informative. Involving multiple languages helps go
> beyond the language-specific and wiki-specific constructs (like
> templates, workflows etc.).
>
> Federico
>


-- 
Isaac Johnson -- Research Scientist -- Wikimedia Foundation

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 169, Issue 12

2019-10-03 Thread Ludovic Bocken
Hello,

Several publications are in preparation... I will get back to you...

Thanks for your interest,

Ludovic BOCKEN
lboc...@gmail.com
www.ludovicbocken.com
Skype: ludovic.bocken
http://www.linkedin.com/in/ludovicbocken
 Rue Hochelaga,
Montréal, QC H2K 4N8
+1 (514) 649 0755




On Sat, 28 Sep 2019 at 08:00,
wrote:

> Today's Topics:
>
>    1. Re: Standardization of Wikipedia articles according to the
>       lexical constancy of their introductions and body texts (Morten Wang)
>
>
> --
>
> Message: 1
> Date: Fri, 27 Sep 2019 07:47:00 -0700
> From: Morten Wang 
> To: Research into Wikimedia content and communities
> 
> Subject: Re: [Wiki-research-l] Standardization of Wikipedia articles
> according to the lexical constancy of their introductions and body
> texts
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Ludovic,
>
> This work sounds interesting, I'm looking forward to learning more about it
> as your papers come out!
>
> I read through the post on LinkedIn and, as I interpret it, you are
> only looking at two quality classes (Featured Articles vs other articles).
> This seems somewhat odd to me, and I'd like to know more about why. The
> current trend when it comes to predicting article quality in the English
> Wikipedia does not limit the prediction problem to just FAs vs the rest;
> instead it uses the whole quality scale[1]. See the list below for some
> papers along this line of research.
>
> I'm also really curious about what "standardize the cognitive accessibility
> of Wikipedia" means. That might mean more than just "article quality",
> which is why I'm asking.
>
> All that being said, I think the approach sounds interesting and probably
> adds some signal, so I'm curious to learn more about how it works and
> performs.
>
> References:
>
>    - Warncke-Wang, M., Cosley, D., & Riedl, J. Tell me more: an actionable
>      quality model for Wikipedia. OpenSym/WikiSym 2013. [We argue that
>      metadata isn't useful because contributors can't change it]
>    - Warncke-Wang, M., Ayukaev, V. R., Hecht, B., & Terveen, L. G. The
>      success and failure of quality improvement projects in peer production
>      communities. CSCW 2015. [See the Appendix for details of the improved
>      model and how to get good training data]
>    - https://www.mediawiki.org/wiki/ORES builds upon the 2015 paper and is
>      a readily accessible API, reference datasets are available on figshare
>      <https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406>
>      and also in the GitHub repository. Now the benchmark to
>      compare against, as in the three other papers listed below.
>    - Dang, Q. V., & Ignat, C. L. Measuring quality of collaboratively
>      edited documents: the case of Wikipedia. CIC 2016. [Shows that adding
>      readability features can improve predictions]
>    - Dang, Q. V., & Ignat, C. L. An end-to-end learning solution for
>      assessing the quality of Wikipedia articles. OpenSym 2017. [Shows the
>      performance of RNNs, also contains an important discussion of
>      performance, interpretability, etc]
>
> I also came across this recent paper by Schmidt and Zangerle that reports
> significant improvements, but haven't yet had the time to read the paper
> closely:
>
>    - Schmidt, M., & Zangerle, E. Article quality classification on
>      Wikipedia: introducing document embeddings and content features. OpenSym
>      2019.
>
> Footnotes:
>
>    1. Typically without A-class articles due to how few of them there are.
>
>
> Cheers,
> Morten
>
> On Mon, 23 Sep 2019 at 13:09, Ludovic Bocken  wrote:
>
> > Hello,
> >
> > I am finishing my PhDs and I think that you could be interested in my