[Wiki-research-l] New viz.: Wikipedias, participation per language
Hi all, I just published a new visualization: Wikipedias, compared by participation per language (= active editors per million speakers). There are several pages: one for a global overview, https://stats.wikimedia.org/wikimedia/participation/d3_participation_global.html and one with a breakdown by continent, https://stats.wikimedia.org/wikimedia/participation/d3_participation_continent.html You can also zoom in on one continent by clicking on it. Any feedback is welcome. Erik Zachte ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
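The metric named above (active editors per million speakers) can be sketched in a few lines. This is only an illustration of the arithmetic, not the script behind the visualization; the function name and the sample figures for the placeholder languages "xx" and "yy" are hypothetical.

```python
def participation_per_million(active_editors: int, speakers: int) -> float:
    """Active editors per million speakers of a language."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    # Multiply first so that round figures stay exact in floating point.
    return active_editors * 1_000_000 / speakers


# Hypothetical sample data: language code -> (active editors, speakers).
sample = {"xx": (1200, 24_000_000), "yy": (300, 1_500_000)}

# Rank languages from highest to lowest participation.
ranked = sorted(
    sample,
    key=lambda code: participation_per_million(*sample[code]),
    reverse=True,
)
```

A small language with few editors can still rank above a large one, which is the point of normalizing by speaker count.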
Re: [Wiki-research-l] Wikimedia Commons data structure - public?
Hi Trilce, There is a new set of dumps for every Wikimedia wiki at least once a month. Among those files are several database dumps in XML format: one with the most recent version of every article, one with metadata but no article texts ('stub dumps'), and one with full texts for every revision of every article. Here is the latest set for Commons: https://dumps.wikimedia.org/commonswiki/20180701/ I hope this helps, Cheers, Erik On Tue, Jul 17, 2018 at 1:52 PM Trilce Navarrete wrote: > Dear all, > > I am wondering if the Wikimedia Commons data structure (ideally in XML) as > well as the documentation thereof and sample data is something that one > could find online. > > There is a team at ICS FORTH who have developed a mapping technology > called X3ML which allows declarative mappings between two data structures. > The idea would be to map the Wikimedia Commons data structure to the CIDOC > CRM, meant for heritage content users. > > Where could I try to find the Wikimedia Commons data structure? or who may > I ask further on this matter? > > thank you much in advance for any tips ! > best > Trilce > > -- > :..::...::..::...::..: > Trilce Navarrete > > m: +31 (0)6 244 84998 | s: trilcen | t: @trilcenavarrete > w: trilcenavarrete.com
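To get a feel for the structure of such an XML dump, a streaming parse along the following lines may help. It is a minimal sketch, assuming the usual page / title / revision element names of the MediaWiki export format; the namespace URI in the sample and the file names in it are made up for illustration, and the code ignores namespaces so it should not depend on the exact schema version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(xml_stream):
    """Yield (title, revision_count) for every <page> in a dump stream."""
    title, revisions = None, 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # free memory, full-history dumps are large
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            elem.clear()


# Tiny hand-written sample standing in for a real (much larger) dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>File:Example.jpg</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>File:Other.png</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""

result = list(count_revisions(io.BytesIO(SAMPLE)))
```

For a real dump one would pass an opened (and decompressed) file object instead of the in-memory sample.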
[Wiki-research-l] New files for geo coded Wikimedia stats
Today I released two new json files [2][4]. Both complement visualization 'Wikipedia Views Visualized' [1] (aka WiViVi), but both can be useful in other contexts as well. 1) File 'demographics_from_world_bank_for_wikimedia.json' [2] resulted from harvesting World Bank API files. It contains yearly figures for four metrics (more could be added rather easily): - population counts, - percentage internet users, - percentage mobile subscriptions, - GDP per capita. The following static demographics charts on meta are also based on these metrics: [3] 2) File 'datamaps-data.json' [4] contains the equivalent of 3 rather complex (*) csv files which feed WiViVi. This brings together demographics data and pageviews (by country, by region, and by language), and also adds meta info. This json file is meant for external use, as it's much easier to parse than the 3 csv files WiViVi uses itself [5]. (*) complex, as the csv files use a hierarchy based on nested delimiters -- Details: World Bank files have different formats (some csv, some json) and use a variety of indexes (some use ISO 3166-1 alpha-2 codes, others ISO 3166-1 alpha-3). Script 1) first does normalization, then data are aggregated, filtered, indexed. Json file 1) replaces two csv files which up to now were filled from Wikipedia pages [6][7]. Also, although Wikipedia lists nowadays also use World Bank data, this is not consistently done, see [8][9].
[1] Viz: https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html [2] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demographics.json Script: https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank [3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics [4] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json Script: https://github.com/wikimedia/analytics-wikistats/tree/master/traffic [5] Syntax: https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html [6] Article: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population [7] Article: https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users [8] Talk page: https://bit.ly/2L5Z2P4 section 'Wikipedia vs Worldbank population counts' [9] Talk page: https://bit.ly/2NJUoIu section 'Wikipedia vs Worldbank internet percentages'
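The normalization step described in the details (sources keyed by a mix of ISO 3166-1 alpha-2 and alpha-3 codes, merged into one index) could look roughly like this. It is a sketch under stated assumptions, not the actual worldbank script: the function names, the metric names, the three-entry code table and the sample figures are all illustrative.

```python
# Illustrative subset only; a real table would cover all ISO 3166-1 codes.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "USA": "US", "DEU": "DE"}


def normalize_code(code):
    """Return an upper-case alpha-2 code, or None if the code is unknown."""
    code = code.strip().upper()
    if len(code) == 2:
        return code
    return ALPHA3_TO_ALPHA2.get(code)


def merge_metrics(sources):
    """Merge (metric_name, {country_code: {year: value}}) pairs.

    Result layout (an assumption): {alpha2: {year: {metric_name: value}}}.
    """
    merged = {}
    for metric, per_country in sources:
        for raw_code, per_year in per_country.items():
            code = normalize_code(raw_code)
            if code is None:
                continue  # skip aggregates or codes missing from the table
            for year, value in per_year.items():
                merged.setdefault(code, {}).setdefault(year, {})[metric] = value
    return merged


# Hypothetical inputs: one source keyed by alpha-3, one by lower-case alpha-2.
population = {"NLD": {2016: 17_000_000}, "XYZ": {2016: 1}}
internet_pct = {"nl": {2016: 90.4}}

merged = merge_metrics([("population", population), ("internet_pct", internet_pct)])
```

Keying everything by one code type before aggregating is what makes the later per-country joins with pageview data straightforward.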
[Wiki-research-l] new viz. WiViVi = Wikipedia Views Visualized
Dear all, A new visualization has just been published: WiViVi = Wikipedia Views Visualized https://stats.wikimedia.org/wikimedia/animations/pageviews/wivivi.html documented at https://meta.wikimedia.org/wiki/WiViVi Please let me know if you have any feedback or questions. Thanks, Erik Zachte
[Wiki-research-l] Wiki Loves Monuments 2016 stats
New stats are available for Wiki Loves Monuments 2016 contest http://infodisiac.com/blog/2017/01/wiki-loves-monuments-2016/ Charts also on https://commons.wikimedia.org/wiki/Category:Wiki_Loves_Monuments_2016_stats Erik Zachte
Re: [Wiki-research-l] Wikipedia video stats ?
There is work being done towards front-end for media count files. Step one completed: at least the counts are in a database now, albeit only some columns. https://phabricator.wikimedia.org/T116363 Erik -Original Message- From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Daniel Mietchen Sent: Friday, November 04, 2016 3:40 To: Research into Wikimedia content and communities Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Wiki-research-l] Wikipedia video stats ? once I had the link open, I actually had a look at it in "Show details" mode and was surprised to find not a single .ogg or .ogv file listed amongst the top 1k files. Seems like they're counted as image files by the MIME type filter: when I selected the "image" box, a good number of them popped up in the list. On Fri, Nov 4, 2016 at 3:07 AM, Daniel Mietchen wrote: > If you use > https://commons.wikimedia.org/wiki/Category:Videos > with GLAMorous (after unselecting the image and audio MIME types), it > gives some basic usage data across wikis, though no view stats: > https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Videos&use_globalusage=1&ns0=1&depth=15&projects[wikipedia]=1&projects[wikimedia]=1&projects[wikisource]=1&projects[wikibooks]=1&projects[wikiquote]=1&projects[wiktionary]=1&projects[wikinews]=1&projects[wikivoyage]=1&projects[wikispecies]=1&projects[mediawiki]=1&projects[wikidata]=1&projects[wikiversity]=1 > > On Thu, Nov 3, 2016 at 9:11 PM, Trilce Navarrete > wrote: >> Dear Tilman, thanks much for this ! very helpful. Though it is not a number >> I can use right away, it is a very nice invitation to further explore the >> potential. Will be sending the paper back to the list when ready :) >> >> again, thanks much ! 
>> best >> T >> >> On Thu, Nov 3, 2016 at 8:52 PM, Tilman Bayer wrote: >>> >>> Hi Trilce, >>> >>> some data exists about video views, although it's AFAIK not available >>> in form of a nice online tool. See >>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts >>> >>> On Mon, Oct 31, 2016 at 5:34 AM, Trilce Navarrete >>> wrote: >>> > Dear all, >>> > >>> > I'm doing some research on the use of image and video in Wikipedia and >>> > would like to know if there is any way to track # of video views in Wikipedia >>> > articles ? >>> > >>> > For image views per page I use the GLAM tools, but for video, I'm not sure if >>> > there is a tool or general Wikipedia stat on # of videos currently used >>> > in all languages, # of Wikipedia articles containing video and # of views >>> > to these pages. >>> > >>> > I understand use of video online is exploding, and wondered if the wiki >>> > had stats on this as well. >>> > >>> > your feedback will be most appreciated !
>>> > thanks much in advance >>> > Trilce >>> >>> -- >>> Tilman Bayer >>> Senior Analyst >>> Wikimedia Foundation >>> IRC (Freenode): HaeB
Re: [Wiki-research-l] Multi year page views statistics
New phab request: https://phabricator.wikimedia.org/T139934 Erik -Original Message- From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Federico Leva (Nemo) Sent: Monday, July 11, 2016 15:29 To: avnerkan...@gmail.com; Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Multi year page views statistics Avner Kantor, 11/07/2016 13:43: > Can it be done by https://tools.wmflabs.org/pageviews No. https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Updates_and_backfilling > or any other tool? Sure. Preferably by using https://dumps.wikimedia.org/other/pagecounts-ez/ , but most people end up getting JSON from http://stats.grok.se/ Nemo
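For the period the Pageview API does cover, a per-article request URL can be assembled as below. This is a sketch only: the path layout reflects my reading of the public REST API documentation and should be checked there, the helper name and defaults are hypothetical, and, as the reply above points out, the API does not reach back multiple years, so older data still has to come from the dump files.

```python
from urllib.parse import quote

# Assumed base path of the per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="monthly"):
    """Build a request URL; start and end are YYYYMMDD strings."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


url = pageviews_url("en.wikipedia", "Albert Einstein", "20160101", "20160630")
```

The function only builds the URL; fetching and summing the returned JSON items is left out so the sketch stays independent of any HTTP library.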
Re: [Wiki-research-l] Finding the most viewed Wikipedia articles on education
Here are all 96610 subcategories of Education, with 2.6 million articles. The problem is sometimes one unexpected subcategory can draw in lots of unexpected content, and the most viewed article can thus be totally off-topic. I could do some iterations and prune the tree into something more manageable, by blacklisting weird subbranches. https://stats.wikimedia.org/wikimedia/pageviews/categorized/wp-en/2016-02/categories_wp-en_cat_Education_2016-02.html Erik Zachte From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Leila Zia Sent: Thursday, April 21, 2016 23:13 To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Finding the most viewed Wikipedia articles on education John, I played with Wikipedia Tools for Google and I'm sure it will do what you're looking for. Check out this <https://docs.google.com/spreadsheets/d/1HeFluqXXcSXw14pk_hceKbuxykNaTjOJMLrNxs81Ifk/edit#gid=0> Google spreadsheet. You just have to repeat a slightly modified formula in columns B and C to get what you have in column D for all subcategories of Education listed in A. You can automate that part, too. L On Thu, Apr 21, 2016 at 12:39 PM, john cummings wrote: Hi Leila Thanks very much, what I need to be able to do is get all the articles within the category and subcategories of Category:Education and then get page views for all of them, its a lot of pages.. My friend Ed Saperia created a spreadsheet to do this but unfortunately the query API limits to a few 100 articles so its not possible to run the query through that. Any other suggestions would be very much appreciated. Thanks John On 21 April 2016 at 18:54, Leila Zia wrote: Hi John, Two comments: * Have you tried Wikipedia Tools for Google <https://chrome.google.com/webstore/detail/wikipedia-tools/aiilcelhmpllcgkhhpifagfehbddkdfp?hl=en> ? It's a very neat add-on for Chrome, and in your case, the two functions WIKICATEGORYMEMBERS and WIKIPAGEVIEWS may help you get what you want. 
* If you are looking for a list of articles related to Education that are available in English and are missing in another language, you can use the article recommendation API. For example: http://recommend.wmflabs.org/api?s=en&t=fr&n=10&article=Education gives you the top 10 recommendations for articles related to Education that are available in English but missing in French. Note that "related" is not the same as articles that are in category "Education" though I hope we can accommodate categories in the future. The documentation for the API is in here <https://github.com/ewulczyn/translation-recs-app/tree/master/api> . Hope this helps. Best, Leila Leila Zia Research Scientist Wikimedia Foundation On Thu, Apr 21, 2016 at 5:04 AM, john cummings wrote: Hi all I'm doing some work with colleagues from the education sector at UNESCO to look at improving some of the most viewed education articles on English language Wikipedia. I'm trying to use TreeViews to get information on what are the most viewed articles in Category:Education, unfortunately such large categories just crash my browser, it means I will have to split the query up into at least 50-100 smaller queries. Does anyone know of a less manual way around this? Ideally the output would be a spreadsheet of the article title and the number of page views of the article for a 30, 60 or 90 day period in the recent past. I will use Treeviews if it is the only way but I'd really love to save myself from half a day of data entry. I imagine this would also be useful for people working with other organisations for other subjects.
Thanks John
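The pruning idea from the top of this thread (walk the subcategory tree, but blacklist sub-branches that pull in off-topic content) can be sketched as a breadth-first traversal. This is an illustration on made-up in-memory data, not the script behind the categorized pageviews report; the function names, category names and view counts are hypothetical.

```python
from collections import deque


def collect_articles(root, subcats, articles, blacklist=frozenset(), max_depth=None):
    """Breadth-first walk of a category tree, skipping blacklisted branches.

    subcats:  {category: [child categories]}
    articles: {category: [article titles]}
    """
    seen = {root}  # guards against cycles in the category graph
    queue = deque([(root, 0)])
    found = set()
    while queue:
        cat, depth = queue.popleft()
        found.update(articles.get(cat, ()))
        if max_depth is not None and depth >= max_depth:
            continue
        for child in subcats.get(cat, ()):
            if child in seen or child in blacklist:
                continue
            seen.add(child)
            queue.append((child, depth + 1))
    return found


def top_viewed(found, views, n):
    """Return the n most viewed titles (ties broken alphabetically)."""
    return sorted(found, key=lambda t: (-views.get(t, 0), t))[:n]


# Hypothetical data: 'Celebrities' is an off-topic branch, and there is a cycle.
subcats = {"Education": ["Schools", "Educators"],
           "Educators": ["Celebrities"],
           "Schools": ["Education"]}
articles = {"Education": ["Pedagogy"], "Schools": ["Primary school"],
            "Educators": ["Teacher"], "Celebrities": ["Pop star"]}
views = {"Pedagogy": 500, "Primary school": 300, "Teacher": 400, "Pop star": 9000}

unpruned = top_viewed(collect_articles("Education", subcats, articles), views, 1)
pruned = top_viewed(
    collect_articles("Education", subcats, articles, blacklist={"Celebrities"}),
    views, 2)
```

In the unpruned run the off-topic article tops the list, which is exactly the failure mode described above; blacklisting one branch removes it without touching the rest of the tree.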
Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population?
What about users who register without any intention to edit? I expect many people register out of habit, because they expect unspecified benefits. On most sites there are some. And there even are some on our site for read-only users, namely to be able to tweak the user settings (e.g. how links are displayed). Erik Zachte From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of WereSpielChequers Sent: Thursday, May 10, 2012 7:58 PM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population? Hi Piotr, You might make the assumption that the difference between 4 million and 16 million is largely editors who never get out of userspace, my experience is that such users are relatively rare, or at least won't dominate that 12 million. I'm fairly sure that there will be a number of different groups in that 12 million. Steve Walling, Aaron or Maryana may be able to help analyse or at least explain them. Significant groups in the 12 million will definitely include: 1 People who registered an account and tried but never successfully saved an edit because when they looked they saw a wall of code and they don't do html. The WMF is investing a lot of money in WYSIWYG editing software in the hope that this will enable goodfaith but not very technical people to edit Wikipedia. 2 Vandals since 2007. We have edit filters that are trying to dissuade vandals from saving their first edit because it triggers one of our tests for probably being vandalism. These filters only came in during the last few years and have been improved over time - so they are deterring a significant proportion of recent badfaith editors from ever saving an edit. 3 Visitors from other wikis. 
One of the features of Single User Login is that if you are logged in and you click on a link that takes you to another wikimedia wiki, your account becomes active at that wiki even if you never go near the edit button. My account is active on 92 wikis and I've edited in rather less than half of them. I won't go into all the reasons why one might visit other wikis, but if you see that an article you've written has equivalents in several other languages I consider it human nature to click on the links and look at the article. Even if you don't use Google translate, the choice of image and the size of the paragraphs is often enough to tell you whether someone has translated your work or started afresh. 4 Editors whose articles have been deleted. About a quarter of new editors start by creating a new article rather than by editing existing articles. A large majority of such articles get deleted and their authors depart. If the 4 million is only measured on surviving edits to article space then there will be many hundreds of thousands whose only article space edits have been deleted. 5 Zombie accounts. We now have programs that prevent people opening accounts that are overly similar to the names of existing editors, but before these filters came in many editors would protect themselves from such impersonation by creating such "zombie accounts" themselves and marking their userpage with a link to their main account. 6 Edit conflicts. Breaking news stories attract editors like moths to flames, our article on Sarah Palin peaked at 25 edits per minute at one point during the day she became John McCain's running mate (I don't think anyone logs the number of edit conflicts). If you are a newbie trying to edit a trending article by using that edit button on the top of the page then you are guaranteed to get frustrated and leave. 
The regulars have learned that busy pages are best edited one section at a time, and on a very busy page there simply isn't time to edit the whole page before a section edit is saved. Of course that could be easily resolved by disabling whole page editing on busy pages, but I'm not expecting that anytime soon. Another issue is that I believe that the 4 million are people who have one undeleted edit to mainspace on the English Wikipedia since December 2004. If so the 16 million may include those who haven't edited since December 2004. I'm probably missing a few other variables, I'm afraid this is a complex area, but I hope this gives you an idea of the problem. WSC On 10 May 2012 16:35, Piotr Konieczny wrote: Thanks for the link. The figure 4,058,477 you cite (from http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution), as you note, comes with the warning that "Only article edits are counted, not edits on discussion pages, etc". I assume this is why the magic word NUMBEROFUSERS at en Wikipedia returns 16,763,691 (numerous low activity editors apparently make their few edits outside article mainspace). The breakdown I could live with, for a while, but t
Re: [Wiki-research-l] wikitrends
> Yes, this is clearly something that needs to be done :-) Totally cool! > I think jumbling them all together would make things a bit less interesting. I agree. > Would it be best to have a drop down that lets you select the project? A drop down for 280 Wikipedias would be somewhat hard to navigate. What about a front page with all major projects (Commons, Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wiktionary, Other projects) sorted by name or total requests in the past hour, each linking to an overview page for one project? On that second page (or below that first list) you could list all languages for that project, sortable by name or by total requests in the past hour, each linking to a page like you have now. Instead of a long sortable table you could have a swappable index, like e.g. http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm Erik -Original Message- From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed Summers Sent: Tuesday, February 21, 2012 6:01 PM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] wikitrends Yes, this is clearly something that needs to be done :-) Would it be best to have a drop down that lets you select the project? I think jumbling them all together would make things a bit less interesting. //Ed On Tue, Feb 21, 2012 at 10:51 AM, Erik Zachte wrote: > Awesome! > > Followed by the obligatory "Could you please also ..." ;-) > > In this case the dots stand for "add pages other Wikipedia wikis, > ideally also for other sister projects?" > All data are in the same file you use already.
> > Best, Erik Zachte > > > > > > -Original Message- > From: wiki-research-l-boun...@lists.wikimedia.org > [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed > Summers > Sent: Tuesday, February 21, 2012 9:36 AM > To: Research into Wikimedia content and communities > Subject: [Wiki-research-l] wikitrends > > I imagine something like this has already been done before, but I > thought I would mention it as a curiosity: > > Wikitrends > http://inkdroid.org/wikitrends/ > > Wikitrends is a display of the top 25 viewed articles on English > Wikipedia in the latest hour. It relies on stats that Wikimedia make > available [1]. If you hover over the article you should get the > article summary (courtesy of the MediaWiki API), and there are canned > search links of realtime Google and Twitter and Facebook search if you > want to look at what people might be saying about the topic. > > I put the code up on Github [2] and wrote a brief blog entry about the > process of putting the app together. The punchline that I was trying > to work up to is that it is truly wonderful that Wikimedia makes an > effort to make its data assets available on the Web, both via an API and as bulk downloads. > It is a great role model for other organizations and institutions. > > Thanks! > //Ed > > [1] http://dumps.wikimedia.org/other/pagecounts-raw/ > [2] http://inkdroid.org/edsu/wikitrends/ > [3] http://inkdroid.org/journal/2012/02/21/nodb/
Re: [Wiki-research-l] wikitrends
Awesome! Followed by the obligatory "Could you please also ..." ;-) In this case the dots stand for "add pages other Wikipedia wikis, ideally also for other sister projects?" All data are in the same file you use already. Best, Erik Zachte -Original Message- From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed Summers Sent: Tuesday, February 21, 2012 9:36 AM To: Research into Wikimedia content and communities Subject: [Wiki-research-l] wikitrends I imagine something like this has already been done before, but I thought I would mention it as a curiosity: Wikitrends http://inkdroid.org/wikitrends/ Wikitrends is a display of the top 25 viewed articles on English Wikipedia in the latest hour. It relies on stats that Wikimedia make available [1]. If you hover over the article you should get the article summary (courtesy of the MediaWiki API), and there are canned search links of realtime Google and Twitter and Facebook search if you want to look at what people might be saying about the topic. I put the code up on Github [2] and wrote a brief blog entry about the process of putting the app together. The punchline that I was trying to work up to is that it is truly wonderful that Wikimedia makes an effort to make its data assets available on the Web, both via an API and as bulk downloads. It is a great role model for other organizations and institutions. Thanks! //Ed [1] http://dumps.wikimedia.org/other/pagecounts-raw/ [2] http://inkdroid.org/edsu/wikitrends/ [3] http://inkdroid.org/journal/2012/02/21/nodb/
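The core of such a "top 25 in the latest hour" display can be sketched as follows. This is not Ed's code; it is a minimal illustration that assumes each line of an hourly pagecounts file holds four space-separated fields (project code, page title, request count, bytes transferred). The function name and the sample lines are made up.

```python
import heapq


def top_pages(lines, project="en", n=25):
    """Return the n most requested (title, count) pairs for one project."""
    counts = []
    for line in lines:
        parts = line.split()
        # Skip malformed lines and lines for other projects.
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            counts.append((int(parts[2]), parts[1]))
        except ValueError:
            continue  # request count was not a number
    return [(title, count) for count, title in heapq.nlargest(n, counts)]


# Made-up sample standing in for one decompressed hourly file.
sample = [
    "en Main_Page 1000 5000",
    "de Hauptseite 2000 9000",
    "en Sarah_Palin 1500 7000",
    "en Broken",
    "en Python 10 100",
]

top_two = top_pages(sample, n=2)
```

Since the project code is just the first field, the same function would serve the other wikis and sister projects in the file by passing a different `project` value.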
Re: [Wiki-research-l] edit counts for specific users
> If you're producing analyses that call out individual editors, then yes, it would be wise to make such tools opt-in. That makes all the difference. I'd also love to see such viz. for my own edits and probably wouldn't mind sharing it. And I'm not arguing against mining these data for research. I trust that research will focus on generalized findings, and in an article will provide an example for which consent had been given. My point is rather that if we provide generic tools as a service to the research community the issue of opt-in will sooner or later become moot. Someone will take the tool, add the category cloud, and start wikigossip.com (just checked: domain is reserved). I know this is a general trend anyway, lots of tools already exist that help you analyze someone's presence on the web. > But for every Wikipedian who would rather not, there are ten more (like me) that really want more insight into the rich data set of our editing histories. On an aggregate level or secure access level, yes. Not to feed our interpersonal curiosity. I'm sure no-one here has that in mind and of course I wasn't implying such. Just raising awareness of what it could lead to. Erik Zachte From: Steven Walling [mailto:steven.wall...@gmail.com] Sent: Wednesday, March 23, 2011 18:30 To: Research into Wikimedia content and communities Cc: Erik Zachte; afo...@gatech.edu Subject: Re: [Wiki-research-l] edit counts for specific users On Wed, Mar 23, 2011 at 5:46 AM, Erik Zachte wrote: In Wikimania Boston, 2006, visualization experts [1] Fernanda Viégas and Martin Wattenberg presented a tool which could produce a tag cloud from a person's edit history. Tag clouds were a novelty and very suitable for the matter at hand. You could see at a glance that editor Johanna Doe was mainly engaged in articles about say classical music, and Chinese and Iranian politics, which is OK of course, but maybe better left to the person to disclose at her own discretion.
We discussed implications of the visualization: on one hand this was all data from the public dumps, and anyone could make such a script once the idea spread, on the other hand would it be wise to help facilitate this process. I later found out they decided not to publish the tool for this very reason. [1] See first two entries on http://infodisiac.com/Wikimedia/Visualizations/ Erik Zachte That is really sad. As a Wikipedian, I would hate to see any researcher shy away from publishing interesting and insightful visualizations of public data. If you're producing analyses that call out individual editors, then yes, it would be wise to make such tools opt-in. But for every Wikipedian who would rather not, there are ten more (like me) that really want more insight into the rich data set of our editing histories. Steven
Re: [Wiki-research-l] edit counts for specific users
In Wikimania Boston, 2006, visualization experts [1] Fernanda Viégas and Martin Wattenberg presented a tool which could produce a tag cloud from a person's edit history. Tag clouds were a novelty and very suitable for the matter at hand. You could see at a glance that editor Johanna Doe was mainly engaged in articles about say classical music, and Chinese and Iranian politics, which is OK of course, but maybe better left to the person to disclose at her own discretion. We discussed implications of the visualization: on one hand this was all data from the public dumps, and anyone could make such a script once the idea spread, on the other hand would it be wise to help facilitate this process. I later found out they decided not to publish the tool for this very reason. [1] See first two entries on http://infodisiac.com/Wikimedia/Visualizations/ Erik Zachte From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Fae Sent: Wednesday, March 23, 2011 10:45 To: afo...@gatech.edu; Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] edit counts for specific users Hi, Please take care to stay within the policy stated at http://meta.wikimedia.org/wiki/Privacy_policy - if you are researching in general there is no issue but if you are analysing/data mining a specific editor's contributions it should be for a recognized bureaucratic purpose. Cheers, Fæ -- http://enwp.org/user_talk:fae
Re: [Wiki-research-l] Our mailing list statistics
Maybe Twitter is the reason there are fewer posts recently. Twitter and mailing lists may be competing channels. Erik > -Original Message- > From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki- > research-l-boun...@lists.wikimedia.org] On Behalf Of Piotr Konieczny > Sent: Monday, July 27, 2009 01:44 > To: Research into Wikimedia content and communities > Subject: Re: [Wiki-research-l] Our mailing list statistics > > Erik Zachte wrote: > > Maybe Twitter ? > > Maybe Twitter what? :) > > > -- > Piotr Konieczny > > "The problem about Wikipedia is, that it just works in reality, not in > theory."
Re: [Wiki-research-l] Our mailing list statistics
Maybe Twitter ? Erik Zachte > -Original Message- > From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki- > research-l-boun...@lists.wikimedia.org] On Behalf Of Piotr Konieczny > Sent: Monday, July 27, 2009 00:04 > To: wiki-research-l@lists.wikimedia.org > Subject: [Wiki-research-l] [wiki-research-l] Our mailing list > statistics > > Researching ourselves: > http://www.infodisiac.com/Wikipedia/ScanMail/Wiki-research-l.html > http://www.infodisiac.com/Wikipedia/ScanMail/Index.html > > I do wonder why the activity of our list has dropped so much this year? > > -- > Piotr Konieczny > > "The problem about Wikipedia is, that it just works in reality, not in > theory."
Re: [Wiki-research-l] "Regular contributor"
Felipe, about your second argument, that not all bots are registered as such (or not anymore; this may change): yes, that is a problem. I can only hope that really active bots are caught and registered on large wikis. Many bots that are active on many wikis are not registered as such on smaller wikis. Therefore I treat any user name that is registered as a bot on 10+ wikis as a bot on all wikis. It is of course again a correction which is not 100% accurate, but close, I hope. Single User Logon can help in this respect some day. In theory we could spot some bots by their behavior, say a user that edits 24 hours per day, or manages 5 updates per second for a long time, or added thousands of articles in a short period. But I'm not sure it would be worth the effort, and it would be low priority in any case. Erik From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ziko van Dijk Sent: Thursday, November 13, 2008 23:37 To: [EMAIL PROTECTED]; Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] "Regular contributor" Hello Felipe, Maybe we speak about different things now. At http://stats.wikimedia.org/EN/BotActivityMatrix.htm: de <http://stats.wikimedia.org/EN/TablesWikipediaDE.htm> 8%, ja <http://stats.wikimedia.org/EN/TablesWikipediaJA.htm> 6%, fr <http://stats.wikimedia.org/EN/TablesWikipediaFR.htm> 22%, it <http://stats.wikimedia.org/EN/TablesWikipediaIT.htm> 25%, pl <http://stats.wikimedia.org/EN/TablesWikipediaPL.htm> 26%, es <http://stats.wikimedia.org/EN/TablesWikipediaES.htm> 15%, nl <http://stats.wikimedia.org/EN/TablesWikipediaNL.htm> 29%, pt <http://stats.wikimedia.org/EN/TablesWikipediaPT.htm> 30%, ru <http://stats.wikimedia.org/EN/TablesWikipediaRU.htm> 26%, zh <http://stats.wikimedia.org/EN/TablesWikipediaZH.htm> 15%, sv <http://stats.wikimedia.org/EN/TablesWikipediaSV.htm> 23%, fi <http://stats.wikimedia.org/EN/TablesWikipediaFI.htm> 22%. The bot share of all edits is not that insignificant.
Ziko 2008/11/13 Felipe Ortega <[EMAIL PROTECTED]> Hi, Erik, and all. IMHO, it would be a good idea... but definitely not an urgent one. In our analyses of the top-ten Wikipedias, we found that bot contributions introduced very little noise in the data (to be statistically precise, it was not significant at all). You also have the additional problem that some bots are not identified in the user_groups table. My "practical impression" is that when you deal with overall figures, bots are irrelevant. However, if you want to focus on special metrics like concentration indexes, then their contribution DOES MATTER, since a very active bot in one month may ruin your measurements. Regards, Felipe. --- On Wed, 22/10/08, Erik Zachte <[EMAIL PROTECTED]> wrote: > From: Erik Zachte <[EMAIL PROTECTED]> > Subject: [Wiki-research-l] "Regular contributor" > To: wiki-research-l@lists.wikimedia.org > Date: Wednesday, 22 October 2008 9:55 > > Statistics, with "Wikipedians", > "active" and "very active users"; > > > like often, Zachte's Statistics are great, but > easily misleading. > > > > Also keep in mind that most figures in wikistats still > include bot edits. > > IMO it becomes more and more urgent to present separate > counts for humans > and bots. > > > > For instance in eo: 54% of total edits for all time were > bot edits, but most > > of these will be from recent years, so the percentage will > be even higher > > for recent years. > > > > http://stats.wikimedia.org/EN/BotActivityMatrix.htm > > > > Erik Zachte -- Ziko van Dijk NL-Silvolde
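The cross-wiki rule from Erik's reply in this thread (treat any user name registered as a bot on 10+ wikis as a bot on all wikis) can be sketched in a few lines of Python. This is only an illustration, not the wikistats code: the input layout (a mapping from wiki code to the set of user names flagged as bots there), the function names and the sample user names are all assumptions made up for the example.

```python
from collections import Counter


def global_bot_names(bot_flags_per_wiki, min_wikis=10):
    """Return user names flagged as bot on at least `min_wikis` wikis.

    bot_flags_per_wiki: dict mapping a wiki code (e.g. "de") to the set of
    user names registered as bots on that wiki (hypothetical input layout).
    """
    counts = Counter()
    for names in bot_flags_per_wiki.values():
        counts.update(set(names))  # count each name at most once per wiki
    return {name for name, n in counts.items() if n >= min_wikis}


def is_bot(user, wiki, bot_flags_per_wiki, global_bots):
    """A user counts as a bot if flagged locally or flagged on enough wikis."""
    return user in bot_flags_per_wiki.get(wiki, set()) or user in global_bots


# Made-up data: "LinkBot" is flagged on 10 wikis, "LocalBot" on one only.
wikis = ["de", "ja", "fr", "it", "pl", "es", "nl", "pt", "ru", "zh"]
flags = {w: {"LinkBot"} for w in wikis}
flags["de"].add("LocalBot")
flags["eo"] = set()  # a smaller wiki where no bot is registered

global_bots = global_bot_names(flags)
print(is_bot("LinkBot", "eo", flags, global_bots))   # True
print(is_bot("LocalBot", "eo", flags, global_bots))  # False
```

The threshold is a parameter because, as Erik notes, the correction is approximate: a lower value catches more unregistered bots on small wikis at the risk of misclassifying a human account that happens to share a flagged name.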
Re: [Wiki-research-l] "Regular contributor"
Hi Felipe, I can't follow your reasoning that bots are insignificant. As Ziko pointed out, the matrix of bot contributions (and our general experience) tells otherwise. On larger Wikipedias bots account for 5-30% of edits; on smaller wikis anything up to 50-70%, or even more in rare cases. Think of the bots that add interwiki links as a primary example of activities that account for a massive number of edits. These may be insignificant on popular articles with 1000s of edits, but most articles have very few edits (the long tail, one might call it), and there it adds up. Cheers, Erik From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ziko van Dijk Sent: Thursday, November 13, 2008 23:37 To: [EMAIL PROTECTED]; Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] "Regular contributor" Hello Felipe, Maybe we speak about different things now. At http://stats.wikimedia.org/EN/BotActivityMatrix.htm: de <http://stats.wikimedia.org/EN/TablesWikipediaDE.htm> 8%, ja <http://stats.wikimedia.org/EN/TablesWikipediaJA.htm> 6%, fr <http://stats.wikimedia.org/EN/TablesWikipediaFR.htm> 22%, it <http://stats.wikimedia.org/EN/TablesWikipediaIT.htm> 25%, pl <http://stats.wikimedia.org/EN/TablesWikipediaPL.htm> 26%, es <http://stats.wikimedia.org/EN/TablesWikipediaES.htm> 15%, nl <http://stats.wikimedia.org/EN/TablesWikipediaNL.htm> 29%, pt <http://stats.wikimedia.org/EN/TablesWikipediaPT.htm> 30%, ru <http://stats.wikimedia.org/EN/TablesWikipediaRU.htm> 26%, zh <http://stats.wikimedia.org/EN/TablesWikipediaZH.htm> 15%, sv <http://stats.wikimedia.org/EN/TablesWikipediaSV.htm> 23%, fi <http://stats.wikimedia.org/EN/TablesWikipediaFI.htm> 22%. The bot share of all edits is not that insignificant. Ziko 2008/11/13 Felipe Ortega <[EMAIL PROTECTED]> Hi, Erik, and all. IMHO, it would be a good idea... but definitely not an urgent one.
In our analyses of the top-ten Wikipedias, we found that bot contributions introduced very little noise in the data (to be statistically precise, it was not significant at all). You also have the additional problem that some bots are not identified in the user_groups table. My "practical impression" is that when you deal with overall figures, bots are irrelevant. However, if you want to focus on special metrics like concentration indexes, then their contribution DOES MATTER, since a very active bot in one month may ruin your measurements. Regards, Felipe. --- On Wed, 22/10/08, Erik Zachte <[EMAIL PROTECTED]> wrote: > From: Erik Zachte <[EMAIL PROTECTED]> > Subject: [Wiki-research-l] "Regular contributor" > To: wiki-research-l@lists.wikimedia.org > Date: Wednesday, 22 October 2008 9:55 > > Statistics, with "Wikipedians", > "active" and "very active users"; > > > like often, Zachte's Statistics are great, but > easily misleading. > > > > Also keep in mind that most figures in wikistats still > include bot edits. > > IMO it becomes more and more urgent to present separate > counts for humans > and bots. > > > > For instance in eo: 54% of total edits for all time were > bot edits, but most > > of these will be from recent years, so the percentage will > be even higher > > for recent years. > > > > http://stats.wikimedia.org/EN/BotActivityMatrix.htm > > > > Erik Zachte -- Ziko van Dijk NL-Silvolde
Re: [Wiki-research-l] "Regular contributor"
Finn, thanks for your attentiveness. The figure 'Sigma total edits' (top left cell) was copied from an earlier calculation, unlike the other totals, which were calculated while building this table. But unlike this table, the other table did not calculate monthly totals for months where a major language (in this case English) was not yet processed. See http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm and you get my point. So to be precise: 'Sigma total edits' is actually 'Sigma total edits for all languages for which counts are available'. The fixed report is online. Someday we will have figures for the English Wikipedia, fingers crossed :) Cheers, Erik > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:wiki- > [EMAIL PROTECTED] On Behalf Of Finn Aarup Nielsen > Sent: Thursday, October 23, 2008 13:12 > To: Research into Wikimedia content and communities > Subject: Re: [Wiki-research-l] "Regular contributor" > > > > Dear Erik, > > > On Wed, 22 Oct 2008, Erik Zachte wrote: > > > [...] > > > > For instance in eo: 54% of total edits for all time were bot edits, > but most > > of these will be from recent years, so the percentage will be even > higher > > for recent years. > > > > http://stats.wikimedia.org/EN/BotActivityMatrix.htm > > Interesting! > > I wonder why there is a discrepancy between the summary for the total > number. "Sigma total edits" are 119M but "Sigma manual edits" are > higher: > 193M. As far as I skimmed the figures are ok for the individual > languages. > > > best regards > Finn > > ___ > > Finn Aarup Nielsen, DTU Informatics, Denmark > Lundbeck Foundation Center for Integrated Molecular Brain Imaging > http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/
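Erik's explanation of the discrepancy can be illustrated with a small Python sketch: a grand total that skips months in which a major language has not yet been processed comes out lower than a total over every count that is available, so the two figures cannot be compared directly. The numbers, the month labels and the function names below are invented for illustration; this is not how wikistats itself is implemented.

```python
# Invented monthly edit counts per language; None marks a month for which
# the language has not been processed yet (an assumed representation).
edits = {
    "de": {"2008-07": 900, "2008-08": 950, "2008-09": 1000},
    "fr": {"2008-07": 700, "2008-08": 720, "2008-09": 740},
    "en": {"2008-07": 5000, "2008-08": None, "2008-09": None},
}
MAJOR = ["en"]  # languages that must be present before a month is totalled


def total_complete_months(edits, major):
    """Sum only the months in which every major language has a count."""
    months = sorted({m for per_month in edits.values() for m in per_month})
    total = 0
    for month in months:
        if all(edits[lang].get(month) is not None for lang in major):
            total += sum(per_month.get(month) or 0 for per_month in edits.values())
    return total


def total_available(edits):
    """Sum every count that is available, whatever the month."""
    return sum(
        count
        for per_month in edits.values()
        for count in per_month.values()
        if count is not None
    )


print(total_complete_months(edits, MAJOR))  # 6600
print(total_available(edits))               # 10010
```

In this toy data only 2008-07 is complete, so the first total covers one month while the second covers all available counts, which is the kind of mismatch that made one summary figure lower than a figure that should have been a subset of it.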
[Wiki-research-l] "Regular contributor"
> Statistics, with "Wikipedians", "active" and "very active users"; > like often, Zachte's Statistics are great, but easily misleading. Also keep in mind that most figures in wikistats still include bot edits. IMO it becomes more and more urgent to present separate counts for humans and bots. For instance in eo: 54% of total edits for all time were bot edits, but most of these will be from recent years, so the percentage will be even higher for recent years. http://stats.wikimedia.org/EN/BotActivityMatrix.htm Erik Zachte