Re: [Wiki-research-l] 2012 top pageview list

2012-12-28 Thread John Vandenberg
Is Favicon only in the Chinese Wikipedia top 100?

It seems so, which is odd if the problem is a web browser bug.

John Vandenberg.
sent from Galaxy Note
On Dec 28, 2012 4:07 PM, Johan Gunnarsson johan.gunnars...@gmail.com
wrote:

 On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg jay...@gmail.com wrote:
  Hi Johan,
 
  Thank you for the lovely data at
 
  https://toolserver.org/~johang/2012.html
 
  I posted that link to my Facebook (below, if you want to join in
  there) and to a few language-specific Facebook groups, and there have
  been some concerns raised about the results, which I'll list below.
 
  These lists are getting some traction in the press, so it would be good
  to understand them better.
 
  http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed

 Cool, cool.

 
  Why is [[zh:Favicon]] #2?
 
  The data doesn't appear to support that
 
  http://stats.grok.se/zh/201201/Favicon
  http://stats.grok.se/zh/latest90/Favicon

 My post-processing filtering follows redirects to find the true
 title. In this case the page Favicon.ico redirects to Favicon. This is
 probably due to broken browsers trying to load the icon.
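
 As an illustration only, that redirect-resolution pass could look
 roughly like this in Python (a sketch; the redirect map and raw view
 counts here are hypothetical inputs, not the real pipeline):

     def resolve(title, redirects, max_hops=5):
         # Follow redirects (e.g. 'Favicon.ico' -> 'Favicon') to the true title.
         seen = set()
         while title in redirects and title not in seen and len(seen) < max_hops:
             seen.add(title)
             title = redirects[title]
         return title

     def aggregate(view_counts, redirects):
         # Merge raw per-title view counts onto their redirect targets.
         totals = {}
         for title, views in view_counts.items():
             target = resolve(title, redirects)
             totals[target] = totals.get(target, 0) + views
         return totals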

 
  Number 1 in French is a plant native to Asia.  The stats for December disagree
  https://en.wikipedia.org/wiki/Ilex_crenata
  http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9

 On the French Wikipedia, Ilex_crenata redirects to Houx_crénelé.

 Ilex_crenata had huge traffic in April:
 http://stats.grok.se/fr/201204/Ilex_crenata

 There are a bunch of spikes like this. I can't really explain it. I
 talked to Domas Mituzas (the maintainer of the original dumps I use)
 yesterday and he suggested it might be bots going crazy for whatever
 reason. I'd love to filter all these false positives, but haven't been
 able to come up with an easy way to do it.

 It might be possible with access to logs that include the user-agent
 string, but that would probably inflate the dataset size even more.
 It's already past a terabyte. However, that could probably be solved
 by sampling (for example) 1/100 of the entries.
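
 For instance, a deterministic 1-in-100 sample could be taken by
 hashing each log line (a sketch only, assuming plain-text request
 log lines arriving on stdin):

     import hashlib
     import sys

     def keep(line, rate=100):
         # Deterministically keep roughly 1 in `rate` lines, keyed on the line itself.
         digest = int(hashlib.md5(line.encode("utf-8")).hexdigest(), 16)
         return digest % rate == 0

     for line in sys.stdin:
         if keep(line):
             sys.stdout.write(line)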

 Comments and ideas are welcome!

 
  Number 1 in German is Cul de sac. This is odd, but matches the stats
  http://stats.grok.se/de/201207/Sackgasse

 Right. This one is funny. It has huge traffic on weekdays only,
 and is deserted on weekends.

 
  Number 1 in Dutch is a Chinese mountain.  The stats for December disagree
  http://stats.grok.se/nl/201212/Hua_Shan

 July/August agree: http://stats.grok.se/nl/201208/Hua_Shan

 
  Number 4 in Hebrew is zipper.  The stats for December disagree
  http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F

 April agrees:
 http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F

 
  Number 2 in Spanish is '@'.  This is odd, but matches the stats
  http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
 
  --
  John Vandenberg
  https://www.facebook.com/johnmark.vandenberg



Re: [Wiki-research-l] 2012 top pageview list

2012-12-28 Thread John Vandenberg
There is a steady stream of blogs and 'news' about these lists

https://encrypted.google.com/search?client=ubuntu&channel=fs&q=%22Sean+hoyland%22&ie=utf-8&oe=utf-8#q=wikipedia+top+2012&hl=en&safe=off&client=ubuntu&tbo=d&channel=fs&tbm=nws&source=lnt&tbs=qdr:w&sa=X&psj=1&ei=GzjeUOPpAsfnrAeQk4DgCg&ved=0CB4QpwUoAw&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.1355534169,d.aWM&fp=4e60e761ee133369&bpcl=40096503&biw=1024&bih=539

How does a researcher go about obtaining access logs with user agents
in order to answer some of these questions?

-- 
John Vandenberg



[Wiki-research-l] 2012 top pageview list

2012-12-27 Thread John Vandenberg
Hi Johan,

Thank you for the lovely data at

https://toolserver.org/~johang/2012.html

I posted that link to my Facebook (below, if you want to join in
there) and to a few language-specific Facebook groups, and there have
been some concerns raised about the results, which I'll list below.

These lists are getting some traction in the press, so it would be good
to understand them better.

http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed

Why is [[zh:Favicon]] #2?

The data doesn't appear to support that

http://stats.grok.se/zh/201201/Favicon
http://stats.grok.se/zh/latest90/Favicon

Number 1 in French is a plant native to Asia.  The stats for December disagree
https://en.wikipedia.org/wiki/Ilex_crenata
http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9

Number 1 in German is Cul de sac. This is odd, but matches the stats
http://stats.grok.se/de/201207/Sackgasse

Number 1 in Dutch is a Chinese mountain.  The stats for December disagree
http://stats.grok.se/nl/201212/Hua_Shan

Number 4 in Hebrew is zipper.  The stats for December disagree
http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F

Number 2 in Spanish is '@'.  This is odd, but matches the stats
http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29

-- 
John Vandenberg
https://www.facebook.com/johnmark.vandenberg



Re: [Wiki-research-l] Minor stats on Wikipedia

2012-11-01 Thread John Vandenberg
On Nov 1, 2012 9:28 AM, Piotr Konieczny pio...@post.pl wrote:


 On 10/31/2012 6:34 PM, Federico Leva (Nemo) wrote:

 Piotr Konieczny, 31/10/2012 23:08:

 Would anyone have/know where to find any of the following estimates for


 * of Wikipedians with a userpage


 http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#namespaces gives you
the number of pages in User: namespace.

 Thanks, I thought I knew this page, but I guess I didn't know it well
enough.

 Incidentally, here's a chilling number: the average number of new editors
per month in 2011 was 7,700; in 2012 it is shaping up to be about 6,500. I
don't like this trend at all; I thought the number of new editors was ...

Is that new _editors_ or new users?

If new users, does it include SUL creations?  We should expect SUL account
creations to have a peak and then drop sharply once most non-English
editors have visited enwp while logged in.

--
John


Re: [Wiki-research-l] War of 1812 and all that

2012-10-30 Thread John Vandenberg
It would be good to extend the War of 1812 research to non-English
Wikipedias.

I've had a quick look and it is surprising how many of the articles are 'pretty
good', but none are very good. I think that there is a depth at which
non-English writers say 'I could easily add more, but the [non-English]
article is good enough; if you want more detail you almost certainly
know English and should go read the English article. My time is
better spent expanding another [non-English] article that isn't yet good
enough.'

John Vandenberg.
sent from Galaxy Note
On Oct 29, 2012 3:28 AM, Steven Walling swall...@wikimedia.org wrote:

 On Sun, Oct 28, 2012 at 6:19 AM, Richard Jensen rjen...@uic.edu wrote:

 Look at it demographically: apart from teenage boys coming of age, the
 population of computer-literate people who are ignorant of Wikipedia is
 very small indeed in 2012.  That was not true in 2005 when lots of editors
 joined up and did a lot of work on important articles.


 You seem to be disregarding the entirety of the developing world and
 non-English speakers in that statement.

 --
 Steven Walling
 https://wikimediafoundation.org/






Re: [Wiki-research-l] [Wikimedia Education] [WikiEN-l] [Wikimedia-l] [Wikimedia Announcements] 2012-13 Annual Plan of the Wikimedia Foundation

2012-07-30 Thread John Vandenberg
On Jul 31, 2012 1:43 AM, LiAnna Davis lda...@wikimedia.org wrote:

 Hi John,

 On Sun, Jul 29, 2012 at 2:39 PM, John Vandenberg jay...@gmail.com wrote:
  I've asked for more info at
 
 
http://meta.wikimedia.org/wiki/Research_talk:Wikipedia_Education_Program_evaluation#random_sample

 I did my best to answer your question there.

I've replied with more specific questions.

This research was mentioned because of bold statements in the annual plan,
and Tilman Bayer mentioned this blog post:

https://blog.wikimedia.org/2012/04/19/wikipedia-education-program-stats-fall-2011/

It says that U.S. Education Program users are three times better than other
users.

--
JV


Re: [Wiki-research-l] Access2research petition = bad idea

2012-05-21 Thread John Vandenberg
A good example is the Queensland University of Technology Library
paying the open access journal article publishing fees for its
academics, because it's good business.  They would rather push their
researchers towards OA journals, thereby building the impact of OA
journals, which means they can drop non-OA journals from their
subscriptions.

http://www.mendeley.com/research/support-gold-open-access-publishing-strategies-qut/

A practical experiment: ask your Office of VC-Research how many
journal articles your university produced in 2011.  Multiply that by
USD 5,000.  Compare the result with your library's journal subscription
fees for 2012.
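
To make the arithmetic concrete with purely hypothetical figures: a
university that produced 2,000 articles in 2011 would be looking at
2,000 x USD 5,000 = USD 10 million in article publishing fees, to be
set against whatever its library paid in subscription fees for 2012.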

The UIC library doesn't give exact numbers online, but here they give
the aggregate costs of the 126 ARL libraries:

http://library.uic.edu/home/services/publishing-and-scholarly-communication/the-cost-of-journals

If every university did that maths and reached the same conclusion, they
would agree that there is an enormous saving to be had if all
universities used open access.

Governments and funding bodies are doing the maths, and the smart ones
are forcing everyone's hand by mandating OA as a condition of funding.

On Mon, May 21, 2012 at 6:30 PM, James Salsman jsals...@gmail.com wrote:
 Dr. Jensen,

 You ask who will pay for publication of journals under the open access model.

 Closed access journals are supported primarily by university libraries
 which pay subscription fees to publishers.  Very rarely do the
 publishers pay anything to the editors and reviewers who produce the
 journals, but they pocket a continuously increasing profit margin,
 which has been increasing at about 1% per year, and currently stands
 at about 27%, per
 http://www.reedelsevier.com/mediacentre/pressreleases/2012/Pages/reed-elsevier-2011-results-announcement.aspx
 In order to achieve such continually increasing profit margins,
 publishers have been forcing price increases through bundling, which
 is an abuse of their monopolistic market power which lack of
 competition from alternative publishing models has allowed them to
 attain.

 Under the open access model, universities pay to support the
 publication and printing of the journals, but do not pay subscription
 fees.  Because there is no profit margin charged, these costs are less
 to the university than commercial subscription fees, and the resulting
 readership is not limited to a tiny fraction of the population.
 (Because costs to the universities are less, they can keep more of the
 money for university official perks and salaries, tax deductible
 junkets for the faculty, and athletic salaries.  Sadly, universities
 hardly ever pass any savings on to tuition payers.  Every subsidy and
 loan guarantee supporting tuition in the postwar era has been matched
 by tuition increases above the cost of living, sadly, while university
 administrative official salaries have kept pace with CEO salaries
 generally, exacerbating income inequality, and increases in faculty
 salaries, perks, and expenses have also exceeded the inflation rate.)
 As you point out, this situation often results in greater charges to
 graduate students, unless their sponsors and grant investigators are
 kind enough to include the journal production fees in their department
 budget.  How often does that happen?

 Your example of journals charging per-paper open access fees is an
 example of subtle extortion in order to cause professors such as
 yourself and other authors to take the position that you have, opposed
 to open access.  Are there any reasons to the contrary?  Are there any
 reasons that participation in such market manipulation schemes could
 be seen as ethical?

 Sincere regards,
 James Salsman




-- 
John Vandenberg



Re: [Wiki-research-l] # of citations on Wikipedia?

2012-04-23 Thread John Vandenberg
Phoebe,

Stats about {{cite journal .. }} citations can be found at

http://enwp.org/wp:jcw

I don't know if the parser/bot is 'free'.  The bot approval is at

https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/JL-Bot_7

On Sat, Apr 21, 2012 at 3:31 AM, phoebe ayers phoebe.w...@gmail.com wrote:
 Hi all,

 Has there been any research done into: the number of citations (e.g.
 to books, journal articles, online sources, everything together) on
 Wikipedia (any language, or all)? The distribution of citations over
 different kinds or qualities of articles? # of uses of citation
 templates? Anything like this?

 I realize this is hard to count, averages are meaningless in this
 context, and any number will no doubt be imprecise! But anything would
 be helpful. I have vague memories of seeing some citation studies like
 this but don't remember the details.

 Thanks,
 -- phoebe

 --
 * I use this address for lists; send personal messages to phoebe.ayers
 at gmail.com *




Re: [Wiki-research-l] [Xmldatadumps-l] [Wikitech-l] Fwd: Old English Wikipedia image dump from 2005

2011-11-18 Thread John Vandenberg
On Thu, Nov 17, 2011 at 6:40 AM, Ariel T. Glenn ar...@wikimedia.org wrote:
 On 12-11-2011 (Sat), at 00:31 +1100, John Vandenberg wrote:
 On Fri, Nov 11, 2011 at 11:18 PM, emijrp emi...@gmail.com wrote:
  Forwarding...
 
  -- Forwarded message --
  From: emijrp emi...@gmail.com
  Date: 2011/11/11
  Subject: Old English Wikipedia image dump from 2005
  To: wikiteam-disc...@googlegroups.com
 
 
  Hi all;
 
  I want to share with you this Archive Team link[1]. It is an old English
  Wikipedia image dump from 2005. One of the last ones, probably, before
  Wikimedia Foundation stopped publishing image dumps. Enjoy.
 
  Regards,
  emijrp
 
  [1] http://www.archive.org/details/wikimedia-image-dump-2005-11

 People interested in image dumps may be also interested in my post
 relating to the GFDL requirements, which I think mean images need to
 be included in the dumps.

 https://meta.wikimedia.org/w/index.php?title=Talk:Terms_of_use&diff=prev&oldid=3002611

 excerpt:

 ..the [GFDL] license requires that someone can download a
 ''complete'' Transparent copy for one year after the last Opaque copy
 is distributed. As a result, I believe the BoT needs to ensure that
 the dumps are available ''and'' that they can be available for one
 year after WMF turns off the lights on the core servers (it allows
 'agents' to provide this service). As Wikipedia contains images, the
 images are required to be included. ..

 discussion continues ..

 https://meta.wikimedia.org/wiki/Talk:Terms_of_use#Right_to_Fork


 I would read this as requiring access to the images to remain available,
 not necessarily in dump form.

I don't believe that is the case.  The GFDL, like the GPL, requires
that it be possible to rebuild the product from the distributed
source, minus any separately distributed dependencies.

It is necessary to provide a simple mechanism for reliably downloading
the used images on each project and incorporating all of the dumps
needed to regenerate a replica of each project.

The 'source' can be broken into chunks, but it would obviously be
contrary to the spirit of the license to require that each and every
image be downloaded individually.

_and_ it needs to be possible for any consumer to perform the task of
obtaining the source.  Does the WMF block people who attempt to mirror
the project content one item at a time?  IMO blocking them is very
sane, but if that is the only way to obtain the source then it would
again be breaking the licence.

InstantCommons means that those images don't need to be redistributed
in order for the projects to be compliant with the GFDL.

--
John Vandenberg



Re: [Wiki-research-l] [Foundation-l] Summary of findings from WMF Summer of Research program now available

2011-09-06 Thread John Vandenberg
Thanks Steven, and the Community Department.

I am instantly drawn to the analysis of redlinks.
Can we please have this data!
Article writers are standing by, ready to kill red links ;-)

The special page for this is dead.

http://en.wikipedia.org/wiki/Special:WantedPages

--
John Vandenberg



Re: [Wiki-research-l] [foundation-l] Personal Image Filter results announced

2011-09-05 Thread John Vandenberg
The image filter survey has been covered in the latest Signpost

http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2011-09-05/News_and_notes

The Wikipedia editor satisfaction index may also be of interest to
folk on this list.

On Tue, Sep 6, 2011 at 10:18 AM, John Vandenberg jay...@gmail.com wrote:
 I'd love to see some expert opinion on the recent survey into Image filter.

 Researchers might be able to get their hands on the raw data to make
 sense of it all.

 http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en

 -- Forwarded message --
 From: John Vandenberg jay...@gmail.com
 Date: Mon, Sep 5, 2011 at 9:21 AM
 Subject: Re: [Wikiquote-l] Personal Image Filter results announced
 To: foundatio...@lists.wikimedia.org


 On Sun, Sep 4, 2011 at 2:33 PM, Philippe Beaudette
 pbeaude...@wikimedia.org wrote:


 Ladies and Gentlemen,

 The committee running the vote on the features for the Personal Image Filter
 have released their interim report and vote count.  You may see the results
 at http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en.
 Please note that the results are not final: although the vote count is, and
 has been finalized, the analysis of comments is ongoing.

 Was this survey approved by the Research Committee?
 If so, can they give us an opinion on the survey instrument used,
 whether the survey population obtained is suitable, etc?

 --
 John Vandenberg




Re: [Wiki-research-l] [Foundation-l] Fwd: Wikis around Europe!

2011-06-29 Thread John Vandenberg
On Sun, Jun 12, 2011 at 3:59 AM, emijrp emi...@gmail.com wrote:
 Hi. I forward this e-mail, I hope there are people interested on this map.

 -- Forwarded message --
 From: emijrp emi...@gmail.com
 Date: 2011/6/11
 Subject: Wikis around Europe!
 To: wikiteam-disc...@googlegroups.com


 Hi all;

 A friend of mine has sent me this link about wikis (locapedias) around
 Europe.[1] I'm very surprised about the huge amount of wikis available.

 Time to archive all of them.[2] I have been working on Spanish ones. If you
 want to help archiving one country, please, reply to this message to
 coordinate. If not, I will try to archive entire Europe!

 Regards,
 emijrp

 [1]
 http://maps.google.com/maps/ms?ie=UTF8&t=h&msa=0&msid=115570622864617231547.00044e461c185a89b6d71&ll=49.095452,14.677734&spn=39.93254,79.013672&z=4
 [2] http://code.google.com/p/wikiteam/

Very nice map.

It would be nice to have these all listed on http://wikiindex.org, and
WikiIndex could add geo information so that this map could be
maintained by WikiIndex. It would also be neat for WikiIndex to list
the date of the last WikiTeam archive of each wiki, so that we can
automatically work out which wikis need to be archived next.

--
John Vandenberg



Re: [Wiki-research-l] Summaries of recent Wikipedia research

2011-06-10 Thread John Vandenberg
Thank you, HaeB, for this valuable addition to the Signpost.

-- 
John Vandenberg



Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles

2011-03-21 Thread John Vandenberg
On Thu, Mar 17, 2011 at 8:41 AM, Chitu Okoli chitu.ok...@concordia.ca wrote:
 ...

 * A-ranked conferences in Information and Computing Sciences from
 http://lamp.infosys.deakin.edu.au/era/?page=cforsel10: This is the most
 exhaustive journal ranking exercise I have ever found anywhere.

With regard to John Lamp's journal list, it is a copy of the *first*
ERA journal list.

http://en.wikipedia.org/wiki/Excellence_in_Research_for_Australia

There is a second ERA journal list being compiled for 2012.
Submissions closed yesterday, and review of ranking is now underway.

The journal list can be browsed via the website.

https://roci.arc.gov.au/

However, there is no publicly downloadable dataset available yet.

If anyone wants a copy of the second ERA journal list in xml or csv, I
can provide it offlist.

Public consultation about the ranking is open until April 4.

 Unfortunately, like you, I have serious questions about the face validity of
 these rankings; I think they heavily overrate many conferences in my own
 field of information systems; I assume the same is true with other fields
 that I don't know so well. (My primary reservation with conference or
 journal rankings by professors is that I strongly suspect that one of the
 main criteria for their rankings is whether or not they have published in
 that outlet before.) Unfortunately, I don't know of anything that approaches
 this ranking in comprehensiveness.

One important point to note regarding conferences in that journal
list is that conferences are only ranked for the following disciplines:

* 08 Information and computer science
http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/4C3249439D3285D6CA257418000470E3?opendocument

* 09 Engineering
http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/050A7395E86A9719CA257418000477A2?opendocument

* 12 Built environment and design
http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/B20002D4CAD6966DCA257418000498EA?opendocument

IMO the ranked conference list was useless in the 2010 ERA process and results.
I've yet to see any improvement in this area for the 2012 ERA.

--
John Vandenberg



[Wiki-research-l] baseline requirements for researcher permission

2010-10-08 Thread John Vandenberg
Hi,

I've started a discussion about baseline requirements for the
'researcher' permission on English Wikipedia.

http://en.wikipedia.org/wiki/Wikipedia_talk:Research#baseline_requirements_for_researcher_permission

--
John Vandenberg



Re: [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-19 Thread John Vandenberg
On Tue, Jul 20, 2010 at 8:06 AM, Finn Aarup Nielsen f...@imm.dtu.dk wrote:
..
 It's not 'necessarily necessary' to make a new Wikimedia project. There has
 been a suggestion (in the meta or strategy wiki) just to use a namespace in
 Wikipedia. You could then have a page called
 http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning

 I would say that a page called:

 http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning

 would be the way to do it. But that would never pass the deletionists. :-)

French Wikipedia already has a namespace dedicated to pages about references.

http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Index

There is quite a bit of activity in this namespace:

http://fr.wikipedia.org/w/index.php?namespace=104&tagfilter=&title=Sp%E9cial%3AModifications+r%E9centes

English Wikipedia has a few groups of citation pages with bots that
fill in the details.

http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_doi
http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_pmid

--
John Vandenberg



Re: [Wiki-research-l] research on watchlist behaviors?

2010-07-02 Thread John Vandenberg
On 7/2/10, James Howison ja...@howison.name wrote:
 Hi all,

 I'm working on a study for which I'd like to know more about editors'
 watchlisting practices.  Of course what I'd really like is to know who had
 what page on their watchlist when, but I understand the obvious privacy
 issues there.  I assume those issues explain why that information is not
 (AFAIK) available in dumps etc.

 I have read some great qualitative pieces which discuss watchlisting [e.g.
 1], which are very helpful (please don't hesitate to suggest others), but
 haven't seen quantitative data, which our study calls for.

 Failing exact data, what do we know about the distribution of practices of
 watchlisting?

 Currently my plan is to assume that anyone who has edited an article in the
 past 6 months has it on their watchlist.  Obviously a very coarse assumption.

A better assumption is that a page is on user A's watchlist if A
edits the page within 10 minutes of another user editing it.
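
A minimal sketch of that heuristic, assuming you already have each
page's revision history as (timestamp, user) pairs (purely
illustrative):

    from datetime import timedelta

    def likely_watchers(revisions, window=timedelta(minutes=10)):
        # revisions: iterable of (timestamp, user) tuples for one page.
        # A user is assumed to watch the page if they edit it within
        # `window` of an edit by a different user.
        watchers = set()
        revs = sorted(revisions, key=lambda rev: rev[0])
        for (t_prev, u_prev), (t_next, u_next) in zip(revs, revs[1:]):
            if u_prev != u_next and t_next - t_prev <= window:
                watchers.add(u_next)
        return watchers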

Also worth considering are the public watchlists which are created
using the related changes feature, e.g. I have a separate watchlist
for pages I create, as this is public information anyway:

https://secure.wikimedia.org/wikipedia/en/wiki/Special:RecentChangesLinked/User:John_Vandenberg/New_pages

With regard to the watchlist itself, it is only possible to know which
pages are on a watchlist as of _now_, so the data would need to be
snapshotted periodically in order to analyse how an individual manages
their watchlist, etc.  I would love to know when I added a page to my
watchlist, but the schema doesn't record this information.

http://www.mediawiki.org/wiki/Manual:Watchlist_table

There are quite a few watchlist-related bugs, which may also give you
some useful information about how users want to use their watchlists,
and hints at how they are currently using them. ;-)

https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=watchlist

--
John Vandenberg



[Wiki-research-l] Features that correlate with pageviews? (Was: Features that correlate with quality)

2010-06-08 Thread John Vandenberg
On Sat, Jun 5, 2010 at 12:16 AM, Brian J Mingus
brian.min...@colorado.edu wrote:
...

 That is an interesting negative finding as well. Just so this thread doesn't
 go without some positive results, here is a table from one of my technical
 reports on some features that do correlate with quality. If the number is
 greater than zero it correlates with quality, if it is 0 it does not
 correlate, and if it is less than 0 it is negatively correlated with
 quality. The scale of the numbers is meaningless and not interpretable,
 although the relative magnitude is important. These are just the relative
 performance of each feature for each class, as extracted from the weights of
 a random forests classifier.

 http://grey.colorado.edu/mediawiki/sites/mingus/images/1/1e/DeHoustMangalathMingus08_feature_table.png

Any chance you can run a similar analysis to look for correlations
with page-views?

I think Liam was originally looking for justification to improve
article content in order for the article to attain higher page-views,
as he has his own private scientific evidence that higher page-views
result in a higher click-through rate (hopefully not with a sample
size of one museum?).

--
John Vandenberg



Re: [Wiki-research-l] Features that correlate with quality (Was: Quality and pageviews)

2010-06-08 Thread John Vandenberg
On Sat, Jun 5, 2010 at 12:16 AM, Brian J Mingus
brian.min...@colorado.edu wrote:


 http://grey.colorado.edu/mediawiki/sites/mingus/images/1/1e/DeHoustMangalathMingus08_feature_table.png

Are you able to add 'no. of incoming internal links'?

--
John Vandenberg



[Wiki-research-l] Fwd: [Foundation-l] Wikipedia meets git

2009-10-17 Thread John Vandenberg
-- Forwarded message --
From: jamesmikedup...@googlemail.com jamesmikedup...@googlemail.com
Date: Sun, Oct 18, 2009 at 3:39 AM
Subject: Re: [Foundation-l] Wikipedia meets git
To: Wikimedia Foundation Mailing List foundatio...@lists.wikimedia.org


See my new blog post on word-level blaming for Wikipedia via git and Perl:
http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html


The next step is ready:

1. I have a single script that will pull a given article and check the
revisions into git; it is not perfect, but it works.

http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8
You run it like this, from inside a git repo:

perl GetRevisions.pl Article_Name

git blame Article_Name/Article.xml
git push origin master

The code that splits up the line is in Process File; this splits all
spaces into newlines. That way we get a word-level blame.

    if ($insidetext)
    {
        ## split each line on spaces so that every word ends up on its own line
        s/(\ )/$1\n/g;

        print OUT $_;
    }


The Article is here:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/article.xml


Here are the blame results:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/wordblame.txt


The problem is that GitHub does not like this amount of processor power
being used and kills the process; you can do a local git blame instead.

Now we have the tool to easily create a repository from Wikipedia, or
any other export-enabled MediaWiki.

mike

