Hoi Dario,

You ask if I want to help. <grin> I do, and I have things to give and things to ask, so let us do a bit of both for best effect. </grin>
On research data. Much of the research data has equivalent information in Wikidata. When you research gender diversity, for instance, articles are identified as being about a "human" and as having a "sex, gender". Where Wikidata does NOT have that information, it should be updated as a matter of principle. The reason is that once Wikidata is updated, the gender information becomes available to other languages as well through the "inter language links". This enables the same analysis, to some extent, for those other languages.

When you need to query Wikidata: WDQ, the tool that has been querying Wikidata for many, many months, was updated today and now allows you to query on the "qualifiers" as well [1]. This is why there is an argument to be made for using Wikidata exclusively for data analysis and research.

In the previous research newsletter, the research on Wikidata and interwiki links between the English and Portuguese Wikipedias was largely dismissed because "Wikidata had changed the game". Wikidata does not change the game when you compare only two languages. What I think I observe in Wikidata is that there are fewer people working on inter language links, not more. I also notice that the number of Wikipedia articles without an item in Wikidata is growing. We have had bots run on the Indian Wikipedias to add items, and they took surprisingly long to run.

When you consider "gender diversity" as a subject for research, what I observe is that the same research is repeated again and again. For me it hardly qualifies as relevant; with WDQ you can have up-to-date information whenever you want it. It starts to qualify for me when it states the percentage and the number of males/females at the baseline, together with a later moment at which the percentage and the number of males/females identified have changed. When you want to research a specific language, any language at that, all of its articles need to be represented by an item.
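As a rough illustration of the baseline-versus-later comparison described above, here is a minimal Python sketch. It assumes you have already obtained male/female counts at two moments (for example from two runs of the same query); the counts below are invented placeholders, not Wikidata figures, and the names `GenderSnapshot` and `compare` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenderSnapshot:
    """Counts of items identified as male/female at one moment (hypothetical type)."""
    label: str
    male: int
    female: int

    @property
    def total(self) -> int:
        return self.male + self.female

    @property
    def female_share(self) -> float:
        # Percentage of identified items that are female; 0.0 for an empty snapshot.
        return 100.0 * self.female / self.total if self.total else 0.0


def compare(baseline: GenderSnapshot, later: GenderSnapshot) -> dict:
    """Report how the counts and the percentage moved between two snapshots."""
    return {
        "male_delta": later.male - baseline.male,
        "female_delta": later.female - baseline.female,
        "female_share_delta": later.female_share - baseline.female_share,
    }


# Placeholder numbers for illustration only, NOT real Wikidata counts.
baseline = GenderSnapshot("baseline", male=800, female=200)
later = GenderSnapshot("after update", male=850, female=250)
change = compare(baseline, later)
print(change)
```

Reporting both the counts and the percentage matters: a percentage can move simply because more items were identified, which is exactly the distinction made above.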
Having an item for every article is best Wikidata practice anyway. The way to work is then to first set the baseline: get the numbers that are relevant to the research, then do the analysis on the raw data (i.e. Wikipedia). This results in updates to Wikidata, and it allows the same queries to be run again to understand what the numbers mean.

Yes, I do understand that you make use of subsets of data to do research. It just happens that WDQ uses its own database, which gets updated from Wikidata. It would be totally unreasonable to think that this database cannot be manipulated. You can also have your own instance of this database and have WDQ run on that (you will be the first one to actually try this, but hey, this is research). So yes, you can preserve your dataset, and yes, you can compare it to what happens in the wild (i.e. outside of the chosen subset as well).

When you research the smaller languages, their needs and their coverage, you have to appreciate that English cannot be the yardstick to measure by. The rest of the world uses meters, and en.wp does not even cover 50% of the subjects that are known to Wikidata.

The WMF does know what people search for and do not find. That is to say, the numbers exist but are not available for analysis. When you rank them, you learn what people are looking for. Making Wikidata items out of them is the quickest way to provide initial information for that language and on that subject when "Wikidata search" is enabled on a Wikipedia. Dario, this is actionable information that we do not have. Research that leads to actionable results is imho the most relevant research.

As to studying things to death: given that en.wp is what research is about, the numbers are only relevant to the extent that en.wp is relevant. My point is very much that its relevance is decreasing in favour of all the other languages. The consequence is that investments that are en.wp centred do not have the effect that is expected elsewhere.
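To make the idea of ranking what people search for and do not find a little more concrete, here is a minimal Python sketch. It assumes the failed searches are available as a plain list of query strings, which is an assumption: as said above, those numbers are not currently available for analysis. The function name `rank_missing_subjects` and the sample queries are hypothetical.

```python
from collections import Counter


def rank_missing_subjects(failed_queries, top_n=10):
    """Rank search terms that returned no article, most frequent first.

    Queries are trimmed and lower-cased so trivial variants count together.
    """
    normalised = (q.strip().lower() for q in failed_queries if q.strip())
    return Counter(normalised).most_common(top_n)


# Invented sample data for illustration only.
sample = [
    "Subject A",
    "subject a ",
    "Subject B",
    "",
    "subject b",
    "SUBJECT A",
    "Subject C",
]
ranked = rank_missing_subjects(sample, top_n=2)
print(ranked)
```

The top of such a ranking would be the candidate list for new Wikidata items in that language.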
Investments in other languages, cultures and countries are likely to have a bigger return, particularly when the investments and the research are about stimulating growth.

Thanks,
GerardM

[1] http://magnusmanske.de/wordpress/?p=178

On 7 March 2014 01:46, Dario Taraborelli <dtarabore...@wikimedia.org> wrote:

> Hoi Gerard,
>
> thanks for the gigantic list of questions – comments inline
>
> On Mar 6, 2014, at 12:17 AM, Gerard Meijssen <gerard.meijs...@gmail.com>
> wrote:
>
> Hoi Dario,
> When you look at the statistics [1], you find that the number of page
> views in English is going down faster than in the other languages combined.
> You also find that the percentage of readers for the top ten Wikipedias in
> size is slowly but surely decreasing (now at 88.94%). How can we decrease
> this percentage even more without sacrificing the number of page views for
> the top 10?
>
> I guess you saw our report on 2013 traffic trends [1], page views have
> been following a downward trend in 2013, but unique visitors as measured
> via comScore have been steadily growing over time and we have no evidence
> to date of a change in that trend, after controlling for seasonality. We
> are working with the analytics engineers to have more reliable data about
> traffic to be able to accurately answer these questions, including
> breaking down readership trends by country, project, device and source.
>
> [1] https://www.mediawiki.org/wiki/File:2013_Wikimedia_traffic_trends.pdf
>
> Has there been any research in how we can stimulate the growth in
> Wikipedias that are not part of the top 10%. Do we know to what extent the
> English Wikipedia model works for these other languages or is a hindrance.
> Do we know what people are looking for in the smaller Wikipedias and do we
> know what they do / do not find. Do we know how people find articles in
> those languages, does this work in the same way as it does for English? Is
> it possible that we have to cultivate contacts with the local "Googles" in
> order to grow attention for what we have to offer.
>
> speaking for Analytics/Research & Data, we haven’t done a lot of original
> research let alone experimentation on small Wikipedias. I expect request
> logs and search logs will provide useful data to understand how people find
> articles on these projects.
>
> Do we know what the effect is of the new search engine that is much better
> at providing results in other scripts? Do we know to what extent inter
> language links are created and, do we know how this has changed since the
> move to Wikidata? Dario, can you please tell us to what extent the other
> languages are studied at all? Do we know what effect they have? Do we know
> about the experience of these Wikipedias locally? Do we care about the
> typography in other scripts? Do we know about the NPOV in the small
> projects? Do we know about gender diversity in the smaller languages. How
> about cultural bias and how does this compare to the cultural bias in the
> big projects? Dario there is so much that we do not know, have not touched.
>
> amen to that.
>
> Why study more of what has been studied to death?
>
> I am not sure I understand your question, but if you are suggesting that
> we need to find better ways to pitch unexplored research to the wiki
> research community I am down with that. It’s sad that we haven’t found a
> good model to create a speed dating system to match research questions and
> researchers, but many people on this list as well as those who served on
> the research committee have expressed a lot of interest in fixing this
> problem. Do you want to help and do you have any example of strategies that
> you think might be successful?
>
> Dario
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l