Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-11 Thread Gerard Meijssen
Hoi, One more thing to consider is the possibility of generating articles on the fly based on information in Wikidata. This is already done in the "Reasonator", and it functions, with differing results, for 2,225,364 items. In essence it is a small script that can be translated into other languages. Obvio
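The Reasonator's own code is not shown in this thread. The following is only a minimal sketch of the general idea, under the assumption of a simplified item record (the `ITEM` and `TEMPLATES` dictionaries and the `render_stub` helper are hypothetical, not Wikidata or Reasonator APIs). It shows how one per-language sentence template can turn structured item data into a short stub, so that supporting a new language means translating a template rather than writing articles.

```python
# Hypothetical item record, loosely modelled on Wikidata-style data; not real API output.
ITEM = {
    "label": {"en": "Douglas Adams", "nl": "Douglas Adams"},
    "occupation": {"en": "writer", "nl": "schrijver"},
    "birth_year": 1952,
}

# One sentence template per language. "Translating the script" in this sketch
# just means adding another entry to this dictionary.
TEMPLATES = {
    "en": "{label} was a {occupation}, born in {birth_year}.",
    "nl": "{label} was een {occupation}, geboren in {birth_year}.",
}


def render_stub(item, lang):
    """Return a one-sentence stub for `item` in `lang`, or None if data is missing."""
    template = TEMPLATES.get(lang)
    label = item["label"].get(lang)
    occupation = item["occupation"].get(lang)
    if template is None or label is None or occupation is None:
        return None  # no template or no localized values for this language
    return template.format(label=label, occupation=occupation,
                           birth_year=item["birth_year"])


print(render_stub(ITEM, "en"))  # Douglas Adams was a writer, born in 1952.
print(render_stub(ITEM, "fr"))  # None: no French template yet
```

Results "differ" in exactly the way the message suggests: the output is only as good as the item's data and the template's coverage of it.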

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-11 Thread Stuart A. Yeates
Thinking about this, en.wiki has an interesting structure (https://en.wikipedia.org/wiki/Category:Redirects_from_non-English-language_terms). Maybe if you could measure the usage of these redirects by language, you could estimate the relative population size of the wiki-using speakers of that language
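A minimal sketch of the aggregation step, assuming one has already collected, for each redirect in that category, the language it is tagged with and a pageview count. The records below are invented placeholders and `language_shares` is a hypothetical helper. Because these are views on en.wiki, the resulting shares would describe English Wikipedia readers who arrive via those terms, which is at best a proxy for the speaker populations.

```python
from collections import Counter

# Invented placeholder records: (redirect title, tagged language, pageviews).
REDIRECT_VIEWS = [
    ("redirect_de_1", "de", 1200),
    ("redirect_de_2", "de", 800),
    ("redirect_fr_1", "fr", 600),
    ("redirect_es_1", "es", 400),
]


def language_shares(records):
    """Sum pageviews per language and return each language's fraction of the total."""
    totals = Counter()
    for _title, lang, views in records:
        totals[lang] += views
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: views / grand_total for lang, views in totals.items()}


print(language_shares(REDIRECT_VIEWS))  # de ~0.667, fr 0.2, es ~0.133
```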

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-11 Thread Felipe Ortega
> > From: Han-Teng Liao (OII) > To: Research into Wikimedia content and communities > > Sent: Tuesday, 8 July 2014 9:27 > Subject: [Wiki-research-l] Constructing sensible baselines for Wikipedia > language development analy

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Kerry Raymond
One thing that troubles me slightly with this conversation is that I think there is a presumption that people will naturally choose to read and write Wikipedia in their native language, but that isn't necessarily so. Anecdotally it seems many people read English Wikipedia because precisely it

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Laura Hale
On Wed, Jul 9, 2014 at 1:11 PM, Heather Ford wrote: > So I went to the 'verifiability' articles in a few different languages to > check whether there is consensus about this on Wikipedia, at least. The > English version [1] states that a) English-language sources are preferred > because it's the

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Heather Ford
This is such a great discussion. Thanks for starting it, Han-Teng :) Laura, I just loved your analysis. It makes me realize that I spend way too much time thinking about these things rather than practicing them, which is what you showed in your rapid analysis :) One thing that I was really intereste

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Federico Leva (Nemo)
h, 08/07/2014 13:49: > This should also help sociolinguists to identify which languages > [...] that > are more developed than others in the Wikipedia sphere, and seeks > explanations for their relative success/failure by contrasting the > Wikipedia sphere and offline/online sphere. Agreed on the

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Jane Darnell
Han-Teng Liao, sorry, but I had to read your answer a couple of times before I understood what you were getting at. I also missed the previous conversation. For information about the 10,000 things, I would just go to GerardM, because he knows all about that stuff. As far as page stats on all the project

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread h
(on Laura Hale's pilot study of measuring quality across several languages used in Spain) Laura, I enjoyed reading the report in your blog post, which also takes a quantified approach to measuring quality. If I have not misread your blog post, you incorporated the measurement of general qualit

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread h
Indeed, GerardM, I agree with you that a few passionate women or men can kick-start some Wikimedia projects, and that different Wikimedia projects have different barriers to, or paths of, development. I also agree with you that the direction I am pursuing may not be helpful to those languages i

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread h
(user language log: e.g. the Accept-Language header) Yes, Stuart, locale data could be a good source to look at, including the Accept-Language HTTP header, which gives locale preferences such as "zh-TW,zh;q=0.8,en;q=0.6". Do you or anyone else have suggestions for external or global datasets that can be used
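To make the header example concrete: each comma-separated entry is a language range with an optional `q` weight, and per the HTTP specification a range without a `q` parameter has weight 1. Below is a minimal parser sketch; `parse_accept_language` is a hypothetical helper name, not part of any Wikimedia tool.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language range, q) pairs, best first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a range without an explicit q-value has weight 1
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip(), q, position))
    # Highest weight first; ties keep the order in which the client listed them.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    # q=0 means "not acceptable", so such ranges are dropped.
    return [(tag, q) for tag, q, _ in prefs if q > 0]


print(parse_accept_language("zh-TW,zh;q=0.8,en;q=0.6"))
# [('zh-TW', 1.0), ('zh', 0.8), ('en', 0.6)]
```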

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Gerard Meijssen
Hoi, At the WMF language committee, the question of whether a language is viable for a Wikimedia project is a practical one. It is also very much a political one. One vitally important difference from your approach is that the distinction is between a first project and a subsequent project. In the latest i

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Stuart A. Yeates
Web browser language settings are an obvious place to start. They will give you an approximation of the user's preferred language (or, more likely, the preferred language of whoever configured their software). See http://www.w3.org/International/questions/qa-lang-priorities.en.php for the gory detail
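If such settings were available in aggregate, one rough approximation would be to take each client's highest-weighted language range, reduce it to its primary subtag (`zh-TW` becomes `zh`), and count. This is a sketch under that assumption: `primary_language` and `preferred_language_counts` are hypothetical helpers, the sample headers are invented, and, as noted above, the counts reflect whoever configured the browser.

```python
from collections import Counter


def primary_language(header):
    """Return the primary subtag of the highest-weighted language range, or None."""
    best_tag, best_q = None, -1.0
    for part in header.split(","):
        tag, _, param = part.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard range
        q = 1.0  # default weight when no q-value is given
        param = param.strip().lower()
        if param.startswith("q="):
            try:
                q = float(param[2:])
            except ValueError:
                continue  # ignore ranges with malformed weights
        if q > best_q:  # strict ">" keeps the earliest range on ties
            best_tag, best_q = tag, q
    if best_tag is None or best_q <= 0:
        return None
    return best_tag.split("-")[0].lower()


def preferred_language_counts(headers):
    """Count clients by the primary subtag of their top-ranked language."""
    return Counter(lang for lang in map(primary_language, headers) if lang)


# Invented sample headers.
print(preferred_language_counts(["zh-TW,zh;q=0.8,en;q=0.6", "en-US,en;q=0.9", "es;q=0.5,ca", ""]))
```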

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Laura Hale
I more or less tried to have a go at this on http://wikinewsreporter.wordpress.com/2014/06/30/determining-the-relative-quality-of-one-wikipedia-project-to-another-one-approach-with-english-spanish-catalan-galician-argonese-and-euskera-wikipedias/ using both internal and external criteria for determ

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Han-Teng Liao (OII)
Thanks Jane for the comments and suggestions. Correct me if I misread your comments/suggestions, Jane. (1) Did you suggest measurements that are observable *inside* Wikipedia/Wikimedia websites? (2) If so, does it mean that your suggestion of measuring the current state of a language version as "

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Jane Darnell
Well, as I see it, the state of any language version is a combination of the state of its content and of its community. Going back to the zero state, in order to get permission to start a language version, there must be a "list of 10,000 important topics" that has to be registered somewhere (sorry, no ide
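If such a list of important topics can be expressed as a set of identifiers, one simple content-state measure is coverage: the fraction of listed topics that have an article in a given language version. A minimal sketch, assuming placeholder identifiers and two hypothetical wikis; `coverage` and `rank_by_coverage` are illustrative helpers, not existing tools.

```python
def coverage(core_topics, items_with_article):
    """Fraction of the core topic list that has an article in one language version."""
    if not core_topics:
        return 0.0
    return len(core_topics & items_with_article) / len(core_topics)


def rank_by_coverage(core_topics, articles_by_wiki):
    """Return (wiki, coverage) pairs sorted from best to worst coverage."""
    scores = {wiki: coverage(core_topics, items)
              for wiki, items in articles_by_wiki.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented placeholder data: four "core" topics and two hypothetical wikis.
CORE = {"topic_1", "topic_2", "topic_3", "topic_4"}
ARTICLES = {
    "wiki_a": {"topic_1", "topic_2", "topic_9"},  # topic_9 is not on the core list
    "wiki_b": {"topic_1", "topic_2", "topic_3", "topic_4"},
}
print(rank_by_coverage(CORE, ARTICLES))  # [('wiki_b', 1.0), ('wiki_a', 0.5)]
```

This captures only the content side; the community side of a language version's state would need separate measures.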

[Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Han-Teng Liao (OII)
Dear all, I would welcome your suggestions on how one can construct sensible baselines, most likely based on data sets *external* to Wikipedia projects, for the *expected* development of Wikipedia language versions. Such baselines should ideally indicate, given the availability o
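One simple way to express such a baseline, offered only as an illustration under strong assumptions, is to distribute the observed total article count across languages in proportion to some external indicator (for example, speaker or Internet-user estimates) and compare each language's actual count with that expected share. The function name, language codes, and figures below are hypothetical.

```python
def development_ratios(observed, indicator):
    """Observed article counts divided by a baseline proportional to `indicator`.

    A ratio above 1 means a language version is larger than the external
    indicator alone would predict; below 1 means it is smaller.
    """
    langs = observed.keys() & indicator.keys()
    total_observed = sum(observed[lang] for lang in langs)
    total_indicator = sum(indicator[lang] for lang in langs)
    if total_observed == 0 or total_indicator == 0:
        return {}
    ratios = {}
    for lang in langs:
        if indicator[lang] == 0:
            continue  # no baseline can be formed for a zero indicator
        expected = total_observed * indicator[lang] / total_indicator
        ratios[lang] = observed[lang] / expected
    return ratios


# Invented figures: lang_a has 1/4 of the indicator but 3/4 of the articles.
print(development_ratios({"lang_a": 300, "lang_b": 100}, {"lang_a": 1.0, "lang_b": 3.0}))
# lang_a -> 3.0, lang_b -> ~0.333
```

A proportional baseline is the simplest choice. A fitted model over several external indicators would be a natural refinement, but the ratio form keeps "more/less developed than expected" easy to read.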