[Wiki-research-l] New viz.: Wikipedias, participation per language

2018-09-10 Thread Erik Zachte
Hi all,

I just published a new visualization: Wikipedias, compared by participation
per language (= active editors per million speakers)

There are several pages,

one for a global overview
https://stats.wikimedia.org/wikimedia/participation/d3_participation_global.html

one with breakdown by continent
https://stats.wikimedia.org/wikimedia/participation/d3_participation_continent.html

You can also zoom in on one continent, by clicking on it
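
The participation metric defined above (active editors per million speakers) can be sketched as follows. This is a minimal illustration, not the code behind the visualization; the function name and the example figures are made up.

```python
def participation_per_million(active_editors: int, speakers: int) -> float:
    """Active editors per million speakers of a language."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    # Multiply first so small ratios do not lose precision.
    return active_editors * 1_000_000 / speakers


# Hypothetical figures, for illustration only (not taken from the visualization).
print(participation_per_million(1_200, 24_000_000))
```

A language with few speakers but a lively editor community scores high on this metric, which is what makes it useful for comparing Wikipedias of very different sizes.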

Any feedback is welcome.

Erik Zachte
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Wikimedia Commons data structure - public?

2018-07-19 Thread Erik Zachte
Hi Trilce,

There is a new set of dumps for every Wikimedia wiki at least once a month.
Among those files are several database dumps in XML format: one with the
most recent version of every article, one with metadata but no article
texts ('stub dumps'), and one with the full text of every revision of every
article. Here is the latest set for Commons:
https://dumps.wikimedia.org/commonswiki/20180701/
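
A minimal sketch of reading such an XML dump in a streaming fashion, so that a large file never has to fit in memory. It assumes the usual MediaWiki export layout with page, title and revision elements; the namespace URI in the sample and the helper names are assumptions, and real dump files are compressed and would first need to be opened with gzip or bz2.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, revision_count) for every <page>, one page at a time."""
    title, revisions = None, 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # free memory held by the finished revision
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            elem.clear()


# Tiny hand-written sample standing in for a real stub dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>File:Example.jpg</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
</mediawiki>"""

for page_title, count in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, count)
```

For a mapping exercise the stub dumps are probably the easiest starting point, since they show the full element structure without the bulk of the article texts.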

I hope this helps,
Cheers,
Erik

On Tue, Jul 17, 2018 at 1:52 PM Trilce Navarrete 
wrote:

> Dear all,
>
> I am wondering if the Wikimedia Commons data structure (ideally in XML) as
> well as the documentation thereof and sample data is something that one
> could find online.
>
> There is a team at ICS FORTH who have developed a mapping technology
> called X3ML which allows declarative mappings between two data structures.
> The idea would be to map the Wikimedia Commons data structure to the CIDOC
> CRM, meant for heritage content users.
>
> Where could I try to find the Wikimedia Commons data structure? Or whom may
> I ask further on this matter?
>
> thank you much in advance for any tips !
> best
> Trilce
>
> --
> :..::...::..::...::..:
> Trilce Navarrete
>
> m: +31 (0)6 244 84998 | s: trilcen | t: @trilcenavarrete
> w: trilcenavarrete.com
>


[Wiki-research-l] New files for geo coded Wikimedia stats

2018-07-11 Thread Erik Zachte
 Today I released two new json files [2][4].
Both complement visualization 'Wikipedia Views Visualized' [1] (aka
WiViVi), but both can be useful in other contexts as well.
1) File 'demographics_from_world_bank_for_wikimedia.json' [2] resulted from
harvesting World Bank API files.
It contains yearly figures for four metrics (more could be added rather
easily):
- population counts,
- percentage internet users,
- percentage mobile subscriptions,
- GDP per capita.
The following static demographics charts on meta are also based on these
metrics: [3]
2) File 'datamaps-data.json' [4] contains the equivalent of 3 rather
complex (*) csv files which feed WiViVi. This brings together demographics
data and pageviews (by country, by region, and by language), and also adds
additional meta info. This json file is meant for external use, as it's
much easier to parse than the 3 csv files WiViVi uses itself [5].
(*) complex, as the csv files use a hierarchy based on nested delimiters
--
Details:
World Bank files have different formats (some csv, some json) and use a
variety of indexes (some use ISO 3166-1 alpha-2 codes, others alpha-3).
Script 1) first normalizes the data, which are then aggregated, filtered,
and indexed.
Json file 1) replaces two csv files which up to now were filled from
Wikipedia pages [6][7].
Also, although Wikipedia lists nowadays also use World Bank data, this is
not consistently done, see [8][9].
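
The normalization step described above (sources keyed by alpha-2 in one file and alpha-3 in another) could look roughly like this. It is a sketch under stated assumptions, not the actual Wikistats script: the mapping table is a tiny illustrative subset, and the function names and sample values are made up.

```python
# Tiny illustrative subset; a real script would need the full ISO 3166-1 table.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "USA": "US", "IND": "IN"}


def normalize_code(code: str) -> str:
    """Return the ISO 3166-1 alpha-2 code for an alpha-2 or alpha-3 input."""
    code = code.strip().upper()
    if len(code) == 2:
        return code
    if len(code) == 3:
        return ALPHA3_TO_ALPHA2[code]  # KeyError signals an unknown country
    raise ValueError(f"not a country code: {code!r}")


def merge_metrics(*sources):
    """Merge (metric_name, {country_code: value}) pairs into one index."""
    merged = {}
    for metric, rows in sources:
        for code, value in rows.items():
            merged.setdefault(normalize_code(code), {})[metric] = value
    return merged


# Made-up values, only to show that differently keyed inputs end up together.
merged = merge_metrics(
    ("population", {"NLD": 17_000_000}),
    ("internet_pct", {"nl": 93.0}),
)
print(merged)
```

Normalizing to a single key before aggregating keeps the later steps (filtering, indexing) independent of which World Bank file a figure came from.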
[1] Viz:
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
[2] Json:
https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demographics.json
Script:
https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank
[3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics
[4] Json:
https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json
Script:
https://github.com/wikimedia/analytics-wikistats/tree/master/traffic
[5] Syntax:
https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html
[6] Article:
https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
[7] Article:
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
[8] Talk page: https://bit.ly/2L5Z2P4 section 'Wikipedia vs Worldbank
population counts'
[9] Talk page: https://bit.ly/2NJUoIu section 'Wikipedia vs Worldbank
internet percentages'


[Wiki-research-l] new viz. WiViVi = Wikipedia Views Visualized

2017-08-02 Thread Erik Zachte
Dear all,

A new visualization has just been published: WiViVi = Wikipedia Views
Visualized

https://stats.wikimedia.org/wikimedia/animations/pageviews/wivivi.html

documented at

https://meta.wikimedia.org/wiki/WiViVi

Please let me know if you have any feedback or questions.

Thanks,

Erik Zachte



[Wiki-research-l] Wiki Loves Monuments 2016 stats

2017-01-10 Thread Erik Zachte
New stats are available for the Wiki Loves Monuments 2016 contest

http://infodisiac.com/blog/2017/01/wiki-loves-monuments-2016/

Charts also on
https://commons.wikimedia.org/wiki/Category:Wiki_Loves_Monuments_2016_stats

Erik Zachte



Re: [Wiki-research-l] Wikipedia video stats ?

2016-11-04 Thread Erik Zachte
Work is under way on a front-end for the media count files.
Step one is completed: the counts are now in a database, albeit only some
columns.

https://phabricator.wikimedia.org/T116363

Erik

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Daniel Mietchen
Sent: Friday, November 04, 2016 3:40
To: Research into Wikimedia content and communities
Cc: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Wiki-research-l] Wikipedia video stats ?

once I had the link open, I actually had a look at it in "Show
details" mode and was surprised to find not a single .ogg or .ogv file
listed amongst the top 1k files. Seems like they're counted as image
files by the MIME type filter: when I selected the "image" box, a good
number of them popped up in the list.

On Fri, Nov 4, 2016 at 3:07 AM, Daniel Mietchen
 wrote:
> If you use
> https://commons.wikimedia.org/wiki/Category:Videos
> with GLAMorous (after unselecting the image and audio MIME types), it
> gives some basic usage data across wikis, though no view stats:
> https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Videos&use_globalusage=1&ns0=1&depth=15&projects[wikipedia]=1&projects[wikimedia]=1&projects[wikisource]=1&projects[wikibooks]=1&projects[wikiquote]=1&projects[wiktionary]=1&projects[wikinews]=1&projects[wikivoyage]=1&projects[wikispecies]=1&projects[mediawiki]=1&projects[wikidata]=1&projects[wikiversity]=1
>
> On Thu, Nov 3, 2016 at 9:11 PM, Trilce Navarrete
>  wrote:
>> Dear Tilman, thanks much for this ! very helpful. Though it is not a number
>> I can use right away, it is a very nice invitation to further explore the
>> potential. Will be sending the paper back to the list when ready :)
>>
>> again, thanks much !
>> best
>> T
>>
>> On Thu, Nov 3, 2016 at 8:52 PM, Tilman Bayer  wrote:
>>>
>>> Hi Trilce,
>>>
>>> some data exists about video views, although it's AFAIK not available
>>> in form of a nice online tool. See
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts
>>>
>>> On Mon, Oct 31, 2016 at 5:34 AM, Trilce Navarrete
>>>  wrote:
>>> > Dear all,
>>> >
>>> > I'm doing some research on the use of image and video in Wikipedia and would
>>> > like to know if there is any way to track # of video views in Wikipedia
>>> > articles ?
>>> >
>>> > For image views per page I use the GLAM tools, but for video I'm not sure if
>>> > there is a tool or general Wikipedia stat on # of videos currently used in
>>> > all languages, # of Wikipedia articles containing video and # of views to
>>> > these pages.
>>> >
>>> > I understand use of video online is exploding, and wondered if the wiki had
>>> > stats on this as well.
>>> >
>>> > your feedback will be most appreciated !
>>> > thanks much in advance
>>> > Trilce
>>> >
>>> > --
>>> > :..::...::..::...::..:
>>> > Trilce Navarrete
>>> >
>>> > m: +31 (0)6 244 84998 | s: trilcen | t: @trilcenavarrete
>>> > w: trilcenavarrete.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Tilman Bayer
>>> Senior Analyst
>>> Wikimedia Foundation
>>> IRC (Freenode): HaeB
>>>
>>
>>
>>
>>
>> --
>> :..::...::..::...::..:
>> Trilce Navarrete
>>
>> m: +31 (0)6 244 84998 | s: trilcen | t: @trilcenavarrete
>> w: trilcenavarrete.com
>>
>>





Re: [Wiki-research-l] Multi year page views statistics

2016-07-11 Thread Erik Zachte
New phab request: https://phabricator.wikimedia.org/T139934

Erik

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Federico Leva (Nemo)
Sent: Monday, July 11, 2016 15:29
To: avnerkan...@gmail.com; Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Multi year page views statistics

Avner Kantor, 11/07/2016 13:43:
> Can it be done by https://tools.wmflabs.org/pageviews

No.
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Updates_and_backfilling

> or any other tool?

Sure. Preferably by using
https://dumps.wikimedia.org/other/pagecounts-ez/ , but most people end up 
getting JSON from http://stats.grok.se/

Nemo





Re: [Wiki-research-l] Finding the most viewed Wikipedia articles on education

2016-04-21 Thread Erik Zachte
Here are all 96610 subcategories of Education, with 2.6 million articles.

The problem is that sometimes one unexpected subcategory can draw in lots of
unexpected content, and the most viewed article can thus be totally off-topic.

I could do some iterations and prune the tree into something more manageable, 
by blacklisting weird subbranches.

 

https://stats.wikimedia.org/wikimedia/pageviews/categorized/wp-en/2016-02/categories_wp-en_cat_Education_2016-02.html
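
The pruning idea mentioned above (walk the category tree, but skip blacklisted subbranches) can be sketched as a breadth-first traversal. The category graph below is a made-up toy example and the function name is hypothetical; this is not the script that produced the page linked above.

```python
from collections import deque


def collect_subcategories(root, children, blacklist):
    """Return all categories reachable from root, skipping blacklisted branches.

    children maps a category name to a list of its direct subcategories.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for sub in children.get(category, ()):
            # A blacklisted category is dropped together with everything below it;
            # the 'seen' check also protects against cycles in the category graph.
            if sub in blacklist or sub in seen:
                continue
            seen.add(sub)
            queue.append(sub)
    return seen


# Toy graph, including a cycle (Schools -> Education) and an off-topic branch.
CHILDREN = {
    "Education": ["Schools", "Educational video games"],
    "Educational video games": ["Video game characters"],
    "Schools": ["Education"],
}

print(sorted(collect_subcategories("Education", CHILDREN, {"Educational video games"})))
```

Each iteration of blacklisting would then amount to inspecting the result, adding the odd subbranches to the blacklist, and running the traversal again.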

 

Erik Zachte

 

 

From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Leila Zia
Sent: Thursday, April 21, 2016 23:13
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Finding the most viewed Wikipedia articles on 
education

 

John, I played with Wikipedia Tools for Google and I'm sure it will do what 
you're looking for. Check out this 
<https://docs.google.com/spreadsheets/d/1HeFluqXXcSXw14pk_hceKbuxykNaTjOJMLrNxs81Ifk/edit#gid=0>
  Google spreadsheet. You just have to repeat a slightly modified formula in 
columns B and C to get what you have in column D for all subcategories of 
Education listed in A. You can automate that part, too.

 

L

 

 

On Thu, Apr 21, 2016 at 12:39 PM, john cummings  
wrote:

Hi Leila

 

Thanks very much, what I need to be able to do is get all the articles within
the category and subcategories of Category:Education and then get page views
for all of them; it's a lot of pages. My friend Ed Saperia created a
spreadsheet to do this but unfortunately the query API is limited to a few
hundred articles so it's not possible to run the query through that.

 

Any other suggestions would be very much appreciated.

 

Thanks

 

John

 

On 21 April 2016 at 18:54, Leila Zia  wrote:

 

Hi John,

 

Two comments:

* Have you tried Wikipedia Tools for Google 
<https://chrome.google.com/webstore/detail/wikipedia-tools/aiilcelhmpllcgkhhpifagfehbddkdfp?hl=en>
 ? It's a very neat add-on for Chrome, and in your case, the two functions 
WIKICATEGORYMEMBERS and WIKIPAGEVIEWS may help you get what you want.

 

* If you are looking for a list of articles related to Education that
are available in English and are missing in another language, you can use the
article recommendation API. For example:
http://recommend.wmflabs.org/api?s=en&t=fr&n=10&article=Education
gives you the top 10 recommendations for articles
related to Education that are available in English but missing in French. Note
that "related" is not the same as articles that are in category "Education"
though I hope we can accommodate categories in the future. The documentation
for the API is in here
<https://github.com/ewulczyn/translation-recs-app/tree/master/api> .

 

Hope this helps.

 

Best,

Leila




Leila Zia

Research Scientist

Wikimedia Foundation

 

On Thu, Apr 21, 2016 at 5:04 AM, john cummings  wrote:

Hi all

 

I'm doing some work with colleagues from the education sector at UNESCO to look 
at improving some of the most viewed education articles on English language 
Wikipedia. 

 

I'm trying to use TreeViews to get information on what are the most viewed
articles in Category:Education; unfortunately such large categories just crash
my browser, which means I will have to split the query up into at least 50-100
smaller queries.

 

Does anyone know of a less manual way around this? Ideally the output would be
a spreadsheet of the article title and the number of page views of the article
for a 30, 60 or 90 day period in the recent past. I will use TreeViews if it is the
only way but I'd really love to save myself from half a day of data entry. I
imagine this would also be useful for people working with other organisations
for other subjects.

 

Thanks

 

John

 


 



 



 



Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population?

2012-05-14 Thread Erik Zachte
What about users who register without any intention to edit? 

 

I expect many people register out of habit, because they expect unspecified
benefits. On most sites there are some. 

And there even are some on our site for read-only users, namely to be able
to tweak the user settings (e.g. how links are displayed).

 

Erik Zachte

 

 

From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of
WereSpielChequers
Sent: Thursday, May 10, 2012 7:58 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Are there any stats on activity of editors
compared to the population?

 

Hi Piotr,

You might make the assumption that the difference between 4 million and 16
million is largely editors who never get out of userspace, but my experience is
that such users are relatively rare, or at least won't dominate that 12
million.

I'm fairly sure that there will be a number of different groups in that 12
million. Steve Walling, Aaron or Maryana may be able to help analyse or at
least explain them.

Significant groups in the 12 million will definitely include:

1 People who registered an account and tried but never successfully saved an
edit because when they looked they saw a wall of code and they don't do
html. The WMF is investing a lot of money in WYSIWYG editing software in the
hope that this will enable goodfaith but not very technical people to edit
Wikipedia. 

2 Vandals since 2007. We have edit filters that try to dissuade
vandals from saving their first edit when it triggers one of our tests
for probable vandalism. These filters only came in during the last few
years and have been improved over time - so they are deterring a significant
proportion of recent badfaith editors from ever saving an edit.

3 Visitors from other wikis. One of the features of Single User Login is
that if you are logged in and you click on a link that takes you to another
wikimedia wiki, your account becomes active at that wiki even if you never
go near the edit button. My account is active on 92 wikis and I've edited in
rather less than half of them. I won't go into all the reasons why one might
visit other wikis, but if you see that an article you've written has
equivalents in several other languages I consider it human nature to click
on the links and look at the article. Even if you don't use Google
translate, the choice of image and the size of the paragraphs is often
enough to tell you whether someone has translated your work or started
afresh. 

4 Editors whose articles have been deleted. About a quarter of new editors
start by creating a new article rather than by editing existing articles. A
large majority of such articles get deleted and their authors depart. If the
4 million is only measured on surviving edits to article space then there
will be many hundreds of thousands whose only article space edits have been
deleted.

5 Zombie accounts. We now have programs that prevent people opening accounts
that are overly similar to the names of existing editors, but before these
filters came in many editors would protect themselves from such
impersonation by creating such "zombie accounts" themselves and marking
their userpage with a link to their main account.

6 Edit conflicts. Breaking news stories attract editors like moths to
flames, our article on Sarah Palin peaked at 25 edits per minute at one
point during the day she became John McCain's running mate (I don't think
anyone logs the number of edit conflicts). If you are a newbie trying to
edit a trending article by using that edit button on the top of the page
then you are guaranteed to get frustrated and leave. The regulars have
learned that busy pages are best edited one section at a time, and on a very
busy page there simply isn't time to edit the whole page before a section
edit is saved. Of course that could be easily resolved by disabling whole
page editing on busy pages, but I'm not expecting that anytime soon.

Another issue is that I believe that the 4 million are people who have one
undeleted edit to mainspace on the English Wikipedia since December 2004. If
so the 16 million may include those who haven't edited since December 2004.

I'm probably missing a few other variables, I'm afraid this is a complex
area, but I hope this gives you an idea of the problem.

WSC




On 10 May 2012 16:35, Piotr Konieczny  wrote:

Thanks for the link. The figure 4,058,477 you cite (from
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution), as
you note, comes with the warning that "Only article edits are counted, not
edits on discussion pages, etc". I assume this is why the magic word
NUMBEROFUSERS at en Wikipedia returns 16,763,691 (numerous low activity
editors apparently make their few edits outside article mainspace).

The breakdown I could live with, for a while, but t

Re: [Wiki-research-l] wikitrends

2012-02-25 Thread Erik Zachte
> Yes, this is clearly something that needs to be done :-) 

Totally cool!

> I think jumbling them all together would make things a bit less
> interesting.

I agree.

> Would it be best to have a drop down that lets you select the project? 

A drop down for 280 Wikipedias would be somewhat hard to navigate.

What about a front page with all major projects
(Commons, Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity,
Wiktionary, Other projects),
sorted by name or total requests in the past hour, each linking to an
overview page for one project?

On that second page (or below that first list) you could list all languages
for that project, sortable by name or by total requests in the past hour,
each linking to a page like you have now.

Instead of a long sortable table you could have a swappable index, like e.g.
http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm

Erik 






-Original Message-
From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed Summers
Sent: Tuesday, February 21, 2012 6:01 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] wikitrends

Yes, this is clearly something that needs to be done :-) Would it be best to
have a drop down that lets you select the project? I think jumbling them all
together would make things a bit less interesting.

//Ed

On Tue, Feb 21, 2012 at 10:51 AM, Erik Zachte 
wrote:
> Awesome!
>
> Followed by the obligatory "Could you please also ..." ;-)
>
> In this case the dots stand for "add pages for other Wikipedia wikis,
> ideally also for other sister projects?"
> All data are in the same file you use already.
>
> Best, Erik Zachte
>
>
>
>
>
> -Original Message-
> From: wiki-research-l-boun...@lists.wikimedia.org
> [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed 
> Summers
> Sent: Tuesday, February 21, 2012 9:36 AM
> To: Research into Wikimedia content and communities
> Subject: [Wiki-research-l] wikitrends
>
> I imagine something like this has already been done before, but I 
> thought I would mention it as a curiosity:
>
>    Wikitrends
>    http://inkdroid.org/wikitrends/
>
> Wikitrends is a display of the top 25 viewed articles on English
> Wikipedia in the latest hour. It relies on stats that Wikimedia make 
> available [1]. If you hover over the article you should get the 
> article summary (courtesy of the MediaWiki API), and there are canned 
> search links of realtime Google and Twitter and Facebook search if you 
> want to look at what people might be saying about the topic.
>
> I put the code up on Github [2] and wrote a brief blog entry [3] about the
> process of putting the app together. The punchline that I was trying
> to work up to is that it is truly wonderful that Wikimedia makes an
> effort to make its data assets available on the Web, both via an API and
> as bulk downloads.
> It is a great role model for other organizations and institutions.
>
> Thanks!
> //Ed
>
> [1] http://dumps.wikimedia.org/other/pagecounts-raw/
> [2] http://inkdroid.org/edsu/wikitrends/
> [3] http://inkdroid.org/journal/2012/02/21/nodb/
>
>
>
>






Re: [Wiki-research-l] wikitrends

2012-02-21 Thread Erik Zachte
Awesome!

Followed by the obligatory "Could you please also ..." ;-)

In this case the dots stand for "add pages for other Wikipedia wikis, ideally
also for other sister projects?"
All data are in the same file you use already.

Best, Erik Zachte





-Original Message-
From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Ed Summers
Sent: Tuesday, February 21, 2012 9:36 AM
To: Research into Wikimedia content and communities
Subject: [Wiki-research-l] wikitrends

I imagine something like this has already been done before, but I thought I
would mention it as a curiosity:

Wikitrends
http://inkdroid.org/wikitrends/

Wikitrends is a display of the top 25 viewed articles on English Wikipedia in
the latest hour. It relies on stats that Wikimedia make available [1]. If
you hover over the article you should get the article summary (courtesy of
the MediaWiki API), and there are canned search links of realtime Google and
Twitter and Facebook search if you want to look at what people might be
saying about the topic.
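
For readers who want to try the same idea, here is a rough sketch of picking the most viewed articles out of one hourly pagecounts file. It is not the Wikitrends code; it assumes the space-separated line layout of those files (project code, page title, view count, bytes transferred), and the sample lines and function name are made up.

```python
import heapq


def top_articles(lines, project="en", n=25):
    """Return the n most viewed (title, views) pairs for one project."""
    counts = []
    for line in lines:
        parts = line.split(" ")
        # Expected layout: project, page title, view count, bytes transferred.
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            views = int(parts[2])
        except ValueError:
            continue  # skip malformed lines rather than fail on them
        counts.append((views, parts[1]))
    return [(title, views) for views, title in heapq.nlargest(n, counts)]


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    "en Main_Page 500 1000",
    "en Sarah_Palin 900 2000",
    "de Hauptseite 700 1500",
    "garbage",
]

print(top_articles(SAMPLE_LINES, n=2))
```

Because every project shares the same file, changing the project argument is all it takes to produce the same ranking for another wiki.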

I put the code up on Github [2] and wrote a brief blog entry [3] about the
process of putting the app together. The punchline that I was trying to work
up to is that it is truly wonderful that Wikimedia makes an effort to make
its data assets available on the Web, both via an API and as bulk downloads.
It is a great role model for other organizations and institutions.

Thanks!
//Ed

[1] http://dumps.wikimedia.org/other/pagecounts-raw/
[2] http://inkdroid.org/edsu/wikitrends/
[3] http://inkdroid.org/journal/2012/02/21/nodb/






Re: [Wiki-research-l] edit counts for specific users

2011-03-23 Thread Erik Zachte
> If you're producing analyses that call out individual editors, then yes,
> it would be wise to make such tools opt-in.

 

That makes all the difference. I’d also love to see such viz. for my own
edits and probably wouldn’t mind sharing it.

 

And I’m not arguing against mining these data for research. I trust that
research will focus on generalized findings, 

and in an article will provide an example for which consent had been given.

 

My point is rather that if we provide generic tools as a service to the
research community the issue of opt-in will sooner or later become moot.

Someone will take the tool, add the category cloud, and start wikigossip.com
(just checked: domain is reserved)

I know this is a general trend anyway, lots of tools already exist that help
you analyze someone’s presence on the web.

 

> But for every Wikipedian who would rather not, there are ten more (like
> me) that really want more insight into the rich data set of our editing
> histories.

 

On an aggregate level or secure access level, yes. Not to feed our
interpersonal curiosity. 

I’m sure no-one here has that in mind and of course I wasn’t implying
such.

Just raising awareness of what it could lead to.

 

Erik Zachte

 

 

From: Steven Walling [mailto:steven.wall...@gmail.com] 
Sent: Wednesday, March 23, 2011 18:30
To: Research into Wikimedia content and communities
Cc: Erik Zachte; afo...@gatech.edu
Subject: Re: [Wiki-research-l] edit counts for specific users

 

On Wed, Mar 23, 2011 at 5:46 AM, Erik Zachte 
wrote:

At Wikimania Boston, 2006, visualization experts [1] Fernanda Viégas and
Martin Wattenberg presented a tool which could produce a tag cloud from a
person's edit history. Tag clouds were a novelty and very suitable for the
matter at hand. You could see at a glance that editor Johanna Doe was mainly
engaged in articles about say classical music, and Chinese and Iranian politics,
which is OK of course, but maybe better left to the person to disclose at
her own discretion. We discussed implications of the visualization: on one
hand this was all data from the public dumps, and anyone could make such a
script once the idea spread, on the other hand would it be wise to help
facilitate this process. I later found out they decided not to publish the
tool for this very reason.

[1] See first two entries on http://infodisiac.com/Wikimedia/Visualizations/

Erik Zachte


That is really sad.

As a Wikipedian, I would hate to see any researcher shy away from publishing
interesting and insightful visualizations of public data. 

If you're producing analyses that call out individual editors, then yes, it
would be wise to make such tools opt-in. But for every Wikipedian who would
rather not, there are ten more (like me) that really want more insight into
the rich data set of our editing histories.

Steven



Re: [Wiki-research-l] edit counts for specific users

2011-03-23 Thread Erik Zachte
At Wikimania Boston, 2006, visualization experts [1] Fernanda Viégas and
Martin Wattenberg presented a tool which could produce a tag cloud from a
person's edit history. Tag clouds were a novelty and very suitable for the
matter at hand. You could see at a glance that editor Johanna Doe was mainly
engaged in articles about say classical music, and Chinese and Iranian politics,
which is OK of course, but maybe better left to the person to disclose at
her own discretion. We discussed implications of the visualization: on one
hand this was all data from the public dumps, and anyone could make such a
script once the idea spread, on the other hand would it be wise to help
facilitate this process. I later found out they decided not to publish the
tool for this very reason. 

[1] See first two entries on http://infodisiac.com/Wikimedia/Visualizations/

Erik Zachte


From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Fae
Sent: Wednesday, March 23, 2011 10:45
To: afo...@gatech.edu; Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] edit counts for specific users

Hi,

Please take care to stay within the policy stated at
http://meta.wikimedia.org/wiki/Privacy_policy - if you are researching in
general there is no issue but if you are analysing/data mining a specific
editor's contributions it should be for a recognized bureaucratic purpose.

Cheers,
Fæ
--
http://enwp.org/user_talk:fae





Re: [Wiki-research-l] Our mailing list statistics

2009-07-26 Thread Erik Zachte
Maybe Twitter is the reason there are fewer posts recently.
Twitter and mailing lists may be competing channels.
Erik

> -Original Message-
> From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-
> research-l-boun...@lists.wikimedia.org] On Behalf Of Piotr Konieczny
> Sent: Monday, July 27, 2009 01:44
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Our mailing list statistics
> 
> Erik Zachte wrote:
> > Maybe Twitter ?
> 
> Maybe Twitter what? :)
> 
> 
> --
> Piotr Konieczny
> 
> "The problem about Wikipedia is, that it just works in reality, not in
> theory."
> 





Re: [Wiki-research-l] Our mailing list statistics

2009-07-26 Thread Erik Zachte
Maybe Twitter ?

Erik Zachte

> -Original Message-
> From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-
> research-l-boun...@lists.wikimedia.org] On Behalf Of Piotr Konieczny
> Sent: Monday, July 27, 2009 00:04
> To: wiki-research-l@lists.wikimedia.org
> Subject: [Wiki-research-l] [wiki-research-l] Our mailing list
> statistics
> 
> Researching ourselves:
> http://www.infodisiac.com/Wikipedia/ScanMail/Wiki-research-l.html
> http://www.infodisiac.com/Wikipedia/ScanMail/Index.html
> 
> I do wonder why the activity of our list has dropped so much this year?
> 
> --
> Piotr Konieczny
> 
> "The problem about Wikipedia is, that it just works in reality, not in
> theory."
> 





Re: [Wiki-research-l] "Regular contributor"

2008-11-13 Thread Erik Zachte
Felipe, about your second argument, that not all bots are registered as such
(or not anymore, it may change): yes, that is a problem.

I can only hope that really active bots are ‘caught’ and registered on large
wikis.

 

Many bots that are active on many wikis are not registered as such on
smaller wikis.

Therefore I treat any user name that is registered as bot on 10+ wikis as
bot on all wikis.

It is of course again a correction which is not 100% accurate, but close, I
hope.

Single User Logon can help in this respect some day.
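
The "bot on 10+ wikis" rule can be sketched roughly as below. This is only an
illustration, not the wikistats code: the function names, the (wiki, user)
input format and the sample data are all assumptions.

```python
from collections import defaultdict

# Threshold taken from the message: flagged as bot on 10 or more wikis.
BOT_WIKI_THRESHOLD = 10


def infer_global_bots(bot_flags, threshold=BOT_WIKI_THRESHOLD):
    """Return user names flagged as bot on at least `threshold` distinct wikis.

    bot_flags: iterable of (wiki, user_name) pairs, one per local bot flag
    (hypothetical input format).
    """
    wikis_per_user = defaultdict(set)
    for wiki, user in bot_flags:
        wikis_per_user[user].add(wiki)
    return {u for u, wikis in wikis_per_user.items() if len(wikis) >= threshold}


def is_bot(user, wiki, local_flags, global_bots):
    """A user counts as a bot if flagged locally or inferred as a global bot."""
    return (wiki, user) in local_flags or user in global_bots


# Hypothetical sample: one bot flagged on 12 wikis, one flagged on a single wiki.
flags = {(f"wiki{i}", "InterwikiBot") for i in range(12)} | {("dewiki", "LocalBot")}
global_bots = infer_global_bots(flags)
```

With this sample, InterwikiBot is treated as a bot on every wiki, including
ones where it holds no flag, while LocalBot is only a bot on dewiki.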

 

In theory we could spot some bots by their behavior, say a user that edits
24 hours per day, or manages 5 updates per second for a long time, or adds
thousands of articles in a short period.

But I’m not sure it would be worth the effort, and it would be low priority
in any case.
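
For illustration only, two of these behavioral signals could be checked along
the following lines. The function names, the input (a list of edit timestamps
in Unix seconds) and the 60-second meaning of "a long time" are assumptions;
the third signal (mass article creation) is left out.

```python
from collections import Counter, defaultdict


def active_all_hours(timestamps):
    """True if, on at least one UTC day, the user edited in all 24 hours."""
    hours_by_day = defaultdict(set)
    for ts in timestamps:
        day, second_of_day = divmod(int(ts), 86400)
        hours_by_day[day].add(second_of_day // 3600)
    return any(len(hours) == 24 for hours in hours_by_day.values())


def sustained_burst(timestamps, per_second=5, min_seconds=60):
    """True if `min_seconds` consecutive seconds each hold >= `per_second` edits."""
    counts = Counter(int(ts) for ts in timestamps)
    run, prev = 0, None
    for sec in sorted(counts):
        if counts[sec] < per_second:
            run, prev = 0, None  # too few edits in this second: streak broken
            continue
        # Extend the streak only if this second directly follows the previous one.
        run = run + 1 if prev == sec - 1 else 1
        prev = sec
        if run >= min_seconds:
            return True
    return False


def looks_like_bot(timestamps):
    """Hypothetical combined check; either signal is enough."""
    return active_all_hours(timestamps) or sustained_burst(timestamps)
```

Any such thresholds would produce both false positives and false negatives, so
the result could only supplement the registered bot flags.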

 

Erik 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ziko van
Dijk
Sent: Thursday, November 13, 2008 23:37
To: [EMAIL PROTECTED]; Research into Wikimedia content and
communities
Subject: Re: [Wiki-research-l] "Regular contributor"

 

Hello Felipe,

Maybe we speak about different things now. At
http://stats.wikimedia.org/EN/BotActivityMatrix.htm


Wikipedia  Bot share of edits  Statistics page
de          8%                 http://stats.wikimedia.org/EN/TablesWikipediaDE.htm
ja          6%                 http://stats.wikimedia.org/EN/TablesWikipediaJA.htm
fr         22%                 http://stats.wikimedia.org/EN/TablesWikipediaFR.htm
it         25%                 http://stats.wikimedia.org/EN/TablesWikipediaIT.htm
pl         26%                 http://stats.wikimedia.org/EN/TablesWikipediaPL.htm
es         15%                 http://stats.wikimedia.org/EN/TablesWikipediaES.htm
nl         29%                 http://stats.wikimedia.org/EN/TablesWikipediaNL.htm
pt         30%                 http://stats.wikimedia.org/EN/TablesWikipediaPT.htm
ru         26%                 http://stats.wikimedia.org/EN/TablesWikipediaRU.htm
zh         15%                 http://stats.wikimedia.org/EN/TablesWikipediaZH.htm
sv         23%                 http://stats.wikimedia.org/EN/TablesWikipediaSV.htm
fi         22%                 http://stats.wikimedia.org/EN/TablesWikipediaFI.htm


The bot share of all edits is not that insignificant.

Ziko



2008/11/13 Felipe Ortega <[EMAIL PROTECTED]>

Hi, Erik, and all.

IMHO, it would be a good idea... but definitely not an urgent one. In our
analyses of the top-ten Wikipedias, we found that bot contributions
introduced very little noise in the data (to be statistically precise, it
was not significant at all).

You also have the additional problem that some bots are not identified in
the user_groups table.

My "practical impression" is that when you deal with overall figures,
bots are irrelevant. However, if you want to focus on special metrics like
concentration indexes then their contribution DOES MATTER, since a very
active bot in one month may ruin your measurements.
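
As a rough illustration of the last point, the sketch below uses the Gini
coefficient as one possible concentration index (the message does not say
which index was used) and made-up monthly edit counts.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula, with ranks 1..n over the ascending values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical edits per contributor in one month.
human_edits = [5, 8, 12, 20, 40, 75]
one_busy_bot = 20000

gini_humans = gini(human_edits)
gini_with_bot = gini(human_edits + [one_busy_bot])
```

With these invented numbers, adding the single bot pushes the index from about
0.47 to about 0.85, even though only one account was added.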

Regards,

Felipe.


--- On Wed, 22/10/08, Erik Zachte <[EMAIL PROTECTED]> wrote:

> From: Erik Zachte <[EMAIL PROTECTED]>
> Subject: [Wiki-research-l] "Regular contributor"
> To: wiki-research-l@lists.wikimedia.org
> Date: Wednesday, 22 October 2008 9:55

> > Statistics, with "Wikipedians", "active" and "very active users";
> > like often, Zachte's Statistics are great, but easily misleading.
>
> Also keep in mind that most figures in wikistats still include bot edits.
> IMO it becomes more and more urgent to present separate counts for humans
> and bots.
>
> For instance in eo: 54% of total edits for all time were bot edits, but most
> of these will be from recent years, so the percentage will be even higher
> for recent years.
>
> http://stats.wikimedia.org/EN/BotActivityMatrix.htm
>
> Erik Zachte









-- 
Ziko van Dijk
NL-Silvolde



Re: [Wiki-research-l] "Regular contributor"

2008-11-13 Thread Erik Zachte
Hi Felipe, 

 

I can’t follow your reasoning that bots are insignificant.

Just as Ziko pointed out, the matrix of bot contributions (and our general
experience) tells otherwise.

On larger Wikipedias bots account for 5-30% of edits; on smaller wikis
anything up to 50-70%, or even more in rare cases.

Think of the bots that add interwiki links as a primary example of activities
that account for a massive number of edits.

These may be insignificant on popular articles with 1000s of edits, but
most articles have very few edits (‘the long tail’, one might call it), and
there it adds up.

 

Cheers, Erik 

 

 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ziko van
Dijk
Sent: Thursday, November 13, 2008 23:37
To: [EMAIL PROTECTED]; Research into Wikimedia content and
communities
Subject: Re: [Wiki-research-l] "Regular contributor"

 

Hello Felipe,

Maybe we speak about different things now. At
http://stats.wikimedia.org/EN/BotActivityMatrix.htm


Wikipedia  Bot share of edits  Statistics page
de          8%                 http://stats.wikimedia.org/EN/TablesWikipediaDE.htm
ja          6%                 http://stats.wikimedia.org/EN/TablesWikipediaJA.htm
fr         22%                 http://stats.wikimedia.org/EN/TablesWikipediaFR.htm
it         25%                 http://stats.wikimedia.org/EN/TablesWikipediaIT.htm
pl         26%                 http://stats.wikimedia.org/EN/TablesWikipediaPL.htm
es         15%                 http://stats.wikimedia.org/EN/TablesWikipediaES.htm
nl         29%                 http://stats.wikimedia.org/EN/TablesWikipediaNL.htm
pt         30%                 http://stats.wikimedia.org/EN/TablesWikipediaPT.htm
ru         26%                 http://stats.wikimedia.org/EN/TablesWikipediaRU.htm
zh         15%                 http://stats.wikimedia.org/EN/TablesWikipediaZH.htm
sv         23%                 http://stats.wikimedia.org/EN/TablesWikipediaSV.htm
fi         22%                 http://stats.wikimedia.org/EN/TablesWikipediaFI.htm


The bot share of all edits is not that insignificant.

Ziko



2008/11/13 Felipe Ortega <[EMAIL PROTECTED]>

Hi, Erik, and all.

IMHO, it would be a good idea... but definitely not an urgent one. In our
analyses of the top-ten Wikipedias, we found that bot contributions
introduced very little noise in the data (to be statistically precise, it
was not significant at all).

You also have the additional problem that some bots are not identified in
the user_groups table.

My "practical impression" is that when you deal with overall figures,
bots are irrelevant. However, if you want to focus on special metrics like
concentration indexes then their contribution DOES MATTER, since a very
active bot in one month may ruin your measurements.

Regards,

Felipe.


--- On Wed, 22/10/08, Erik Zachte <[EMAIL PROTECTED]> wrote:

> From: Erik Zachte <[EMAIL PROTECTED]>
> Subject: [Wiki-research-l] "Regular contributor"
> To: wiki-research-l@lists.wikimedia.org
> Date: Wednesday, 22 October 2008 9:55

> > Statistics, with "Wikipedians", "active" and "very active users";
> > like often, Zachte's Statistics are great, but easily misleading.
>
> Also keep in mind that most figures in wikistats still include bot edits.
> IMO it becomes more and more urgent to present separate counts for humans
> and bots.
>
> For instance in eo: 54% of total edits for all time were bot edits, but most
> of these will be from recent years, so the percentage will be even higher
> for recent years.
>
> http://stats.wikimedia.org/EN/BotActivityMatrix.htm
>
> Erik Zachte









-- 
Ziko van Dijk
NL-Silvolde



Re: [Wiki-research-l] "Regular contributor"

2008-10-23 Thread Erik Zachte
Finn, thanks for your attentiveness.

Figure 'Sigma total edits' (top left cell) was copied from an earlier
calculation, unlike the other totals, which were calculated while building
this table. That earlier calculation, however, did not compute monthly
totals for months where a major language (in this case English) was not yet
processed.
See http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm and you get my
point.

So to be precise: 'Sigma total edits' is actually 'Sigma total edits for all
languages for which counts are available'.
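
To make the discrepancy concrete, here is a minimal sketch with invented
numbers. The function names and data layout are assumptions, not the actual
wikistats code: one total skips months in which the major language is
missing, the other sums whatever counts are available.

```python
def total_skipping_incomplete_months(monthly, major="en"):
    """Sum edits only over months in which the major language was processed."""
    return sum(
        sum(per_language.values())
        for per_language in monthly.values()
        if major in per_language
    )


def total_of_available_counts(monthly):
    """Sum edits over every month and every language that has a count."""
    return sum(sum(per_language.values()) for per_language in monthly.values())


# Hypothetical monthly edit counts; "en" is missing for the two latest months.
monthly_edits = {
    "2008-07": {"en": 100, "de": 40, "eo": 5},
    "2008-08": {"de": 42, "eo": 6},
    "2008-09": {"de": 45, "eo": 7},
}

strict_total = total_skipping_incomplete_months(monthly_edits)  # 145
available_total = total_of_available_counts(monthly_edits)      # 245
```

A subtotal such as manual edits computed the second way can therefore exceed a
grand total computed the first way, which is the pattern Finn noticed.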

Fixed report is online. Someday we will have figures for the English
Wikipedia, fingers crossed :)

Cheers, Erik

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:wiki-
> [EMAIL PROTECTED] On Behalf Of Finn Aarup Nielsen
> Sent: Thursday, October 23, 2008 13:12
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] "Regular contributor"
> 
> 
> 
> Dear Erik,
> 
> 
> On Wed, 22 Oct 2008, Erik Zachte wrote:
> 
> > [...]
> >
> > For instance in eo: 54% of total edits for all time were bot edits, but
> > most of these will be from recent years, so the percentage will be even
> > higher for recent years.
> >
> > http://stats.wikimedia.org/EN/BotActivityMatrix.htm
> 
> Interesting!
> 
> I wonder why there is a discrepancy between the summary for the total
> number. "Sigma total edits" are 119M but "Sigma manual edits" are higher:
> 193M. As far as I skimmed the figures are ok for the individual languages.
> 
> 
> best regards
> Finn
> 
> ___
> 
>   Finn Aarup Nielsen, DTU Informatics, Denmark
>   Lundbeck Foundation Center for Integrated Molecular Brain Imaging
> http://www.imm.dtu.dk/~fn/  http://nru.dk/staff/fnielsen/
> ___
> 
> 





[Wiki-research-l] "Regular contributor"

2008-10-22 Thread Erik Zachte
> Statistics, with "Wikipedians", "active" and "very active users"; 

> like often, Zachte's Statistics are great, but easily misleading.

 

Also keep in mind that most figures in wikistats still include bot edits.

IMO it becomes more and more urgent to present separate counts for humans
and bots.

 

For instance in eo: 54% of total edits for all time were bot edits, but most
of these will be from recent years, so the percentage will be even higher
for recent years.
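
The arithmetic behind this can be sketched as follows. Only the 54% overall
share comes from the figures above; the split into an early and a recent
period and all edit counts are invented for illustration.

```python
def recent_bot_share(total_edits, bot_edits, early_edits, early_bot_edits):
    """Bot share in the recent period, given all-time and early-period counts."""
    recent_edits = total_edits - early_edits
    recent_bot_edits = bot_edits - early_bot_edits
    return recent_bot_edits / recent_edits


# Hypothetical: 1,000,000 edits in total, 54% of them by bots (the eo figure),
# of which an early period holds 400,000 edits with only 20% by bots.
share = recent_bot_share(
    total_edits=1_000_000,
    bot_edits=540_000,
    early_edits=400_000,
    early_bot_edits=80_000,
)
```

Under these assumptions the recent bot share comes out at roughly 77%, well
above the all-time 54%.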

 

http://stats.wikimedia.org/EN/BotActivityMatrix.htm

 

Erik Zachte

 
