Re: [Wiki-research-l] 2012 top pageview list
Is favicon only in the Chinese Wikipedia top 100? It seems so, which is odd if the problem is a web browser bug. John Vandenberg, sent from Galaxy Note

On Dec 28, 2012 4:07 PM, Johan Gunnarsson johan.gunnars...@gmail.com wrote:

On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg jay...@gmail.com wrote: Hi Johan, Thank you for the lovely data at https://toolserver.org/~johang/2012.html I posted that link to my facebook (below if you want to join in there), and a few language-specific facebook groups, and some concerns have been raised about the results, which I'll list below. These lists are getting some traction in the press, so it would be good to understand them better. http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed

Cool, cool.

Why is [[zh:Favicon]] #2? The data doesn't appear to support that http://stats.grok.se/zh/201201/Favicon http://stats.grok.se/zh/latest90/Favicon

My post-processing filtering follows redirects to find the true title. In this case the page Favicon.ico redirects to Favicon. This is probably due to broken browsers trying to load the icon.

Number 1 in French is a plant native to Asia. The stats for December disagree https://en.wikipedia.org/wiki/Ilex_crenata http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9

French's Ilex_crenata redirects to Houx_crénelé. Ilex_crenata had huge traffic in April: http://stats.grok.se/fr/201204/Ilex_crenata There are a bunch of spikes like this. I can't really explain them. I talked to Domas Mituzas (the maintainer of the original dumps I use) yesterday, and he suggested it might be bots going crazy for whatever reason. I'd love to filter out all these false positives, but haven't been able to come up with an easy way to do it. It might be possible with access to logs that include the user-agent string, but that would probably inflate the dataset size even more; it's already past a terabyte. However, that could probably be solved by sampling (for example) 1/100 of the entries.
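The redirect-folding step Johan describes can be sketched in a few lines; a minimal sketch, assuming a `counts` map of raw title to yearly views and a `redirects` map extracted from the dump (both names are illustrative, not his actual code):

```python
def fold_redirects(counts, redirects):
    """Fold view counts for redirect titles into their target titles."""
    folded = {}
    for title, views in counts.items():
        # Resolve chains of redirects, guarding against loops.
        seen = set()
        while title in redirects and title not in seen:
            seen.add(title)
            title = redirects[title]
        folded[title] = folded.get(title, 0) + views
    return folded

counts = {"Favicon": 1000, "Favicon.ico": 90000}
redirects = {"Favicon.ico": "Favicon"}
print(fold_redirects(counts, redirects))  # {'Favicon': 91000}
```

This is why a title like Favicon can rank high even though its own stats.grok.se page shows modest traffic: the views arrived under the redirect title.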
Comments and ideas are welcome!

Number 1 in German is Cul de sac. This is odd, but matches the stats http://stats.grok.se/de/201207/Sackgasse

Right. This one is funny. It has huge traffic on weekdays only, and is deserted on weekends.

Number 1 in Dutch is a Chinese mountain. The stats for December disagree http://stats.grok.se/nl/201212/Hua_Shan July/August agree: http://stats.grok.se/nl/201208/Hua_Shan

Number 4 in Hebrew is zipper. The stats for December disagree http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F April agrees: http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F

Number 2 in Spanish is '@'. This is odd, but matches the stats http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29

-- John Vandenberg https://www.facebook.com/johnmark.vandenberg

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] 2012 top pageview list
There is a steady stream of blogs and 'news' about these lists https://encrypted.google.com/search?client=ubuntuchannel=fsq=%22Sean+hoyland%22ie=utf-8oe=utf-8#q=wikipedia+top+2012hl=ensafe=offclient=ubuntutbo=dchannel=fstbm=nwssource=lnttbs=qdr:wsa=Xpsj=1ei=GzjeUOPpAsfnrAeQk4DgCgved=0CB4QpwUoAwbav=on.2,or.r_gc.r_pw.r_cp.r_qf.bvm=bv.1355534169,d.aWMfp=4e60e761ee133369bpcl=40096503biw=1024bih=539

How does a researcher go about obtaining access logs with user-agent strings in order to answer some of these questions? -- John Vandenberg
[Wiki-research-l] 2012 top pageview list
Hi Johan, Thank you for the lovely data at https://toolserver.org/~johang/2012.html I posted that link to my facebook (below if you want to join in there), and a few language-specific facebook groups, and some concerns have been raised about the results, which I'll list below. These lists are getting some traction in the press, so it would be good to understand them better. http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed

Why is [[zh:Favicon]] #2? The data doesn't appear to support that http://stats.grok.se/zh/201201/Favicon http://stats.grok.se/zh/latest90/Favicon

Number 1 in French is a plant native to Asia. The stats for December disagree https://en.wikipedia.org/wiki/Ilex_crenata http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9

Number 1 in German is Cul de sac. This is odd, but matches the stats http://stats.grok.se/de/201207/Sackgasse

Number 1 in Dutch is a Chinese mountain. The stats for December disagree http://stats.grok.se/nl/201212/Hua_Shan

Number 4 in Hebrew is zipper. The stats for December disagree http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F

Number 2 in Spanish is '@'. This is odd, but matches the stats http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29

-- John Vandenberg https://www.facebook.com/johnmark.vandenberg
Re: [Wiki-research-l] Minor stats on Wikipedia
On Nov 1, 2012 9:28 AM, Piotr Konieczny pio...@post.pl wrote: On 10/31/2012 6:34 PM, Federico Leva (Nemo) wrote: Piotr Konieczny, 31/10/2012 23:08: Would anyone have/know where to find any of the following estimates: # of Wikipedians with a userpage

http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#namespaces gives you the number of pages in the User: namespace.

Thanks, I thought I knew this page, but I guess I didn't know it well enough. Incidentally, here's a chilling number: the average number of new editors per month in 2011 was 7,700; in 2012 it is shaping up to be about 6,500. I don't like this trend at all; I thought the number of new editors was...

Is that new _editors_ or new users? If new users, does it include SUL creations? We should expect SUL account creations to have a peak and then drop sharply once most non-English editors have visited enwp while logged in. -- John
Re: [Wiki-research-l] War of 1812 and all that
It would be good to extend the research on the War of 1812 to non-English Wikipedias. I've had a quick look, and it is surprising how many of the articles are 'pretty good', but none are very good. I think that there is a depth level at which non-English writers say 'I could easily add more, but the [non-English] article is good enough; if you want more detail you almost certainly know English and should go read the English article. My time is better spent expanding another [non-English] article that isn't yet good enough.' John Vandenberg, sent from Galaxy Note

On Oct 29, 2012 3:28 AM, Steven Walling swall...@wikimedia.org wrote: On Sun, Oct 28, 2012 at 6:19 AM, Richard Jensen rjen...@uic.edu wrote: Look at it demographically: apart from teenage boys coming of age, the population of computer-literate people who are ignorant of Wikipedia is very small indeed in 2012. That was not true in 2005, when lots of editors joined up and did a lot of work on important articles.

You seem to be disregarding the entirety of the developing world and non-English speakers in that statement. -- Steven Walling https://wikimediafoundation.org/
Re: [Wiki-research-l] [Wikimedia Education] [WikiEN-l] [Wikimedia-l] [Wikimedia Announcements] 2012-13 Annual Plan of the Wikimedia Foundation
On Jul 31, 2012 1:43 AM, LiAnna Davis lda...@wikimedia.org wrote: Hi John, On Sun, Jul 29, 2012 at 2:39 PM, John Vandenberg jay...@gmail.com wrote: I've asked for more info at http://meta.wikimedia.org/wiki/Research_talk:Wikipedia_Education_Program_evaluation#random_sample I did my best to answer your question there.

I've replied with more specific questions. This research was mentioned because of bold statements in the annual plan, and Tilman Bayer mentioned this blog post: https://blog.wikimedia.org/2012/04/19/wikipedia-education-program-stats-fall-2011/ which says U.S. Education Program users are three times better than other users. -- JV
Re: [Wiki-research-l] Access2research petition = bad idea
A good example is the Queensland University of Technology Library paying the open access journal article publishing fees for their academics, because it's good business: they would rather push their researchers towards OA journals, thereby building the impact of OA journals and eventually allowing them to drop non-OA journals from their subscriptions. http://www.mendeley.com/research/support-gold-open-access-publishing-strategies-qut/

A practical experiment: ask your Office of VC-Research how many journal articles your university produced in 2011. Multiply it by USD 5,000. Compare the result with your library's journal subscription fees for 2012. The UIC library doesn't give exact numbers online, but here they give the aggregate costs of the 126 ARL libraries. http://library.uic.edu/home/services/publishing-and-scholarly-communication/the-cost-of-journals

If every university did that maths and reached the same conclusion, they would agree that there is an enormous saving to be had if all universities use open access. Governments and funding bodies are doing the maths, and the smart ones are forcing everyone's hand by mandating OA in order to obtain funding.

On Mon, May 21, 2012 at 6:30 PM, James Salsman jsals...@gmail.com wrote: Dr. Jensen, You ask who will pay for publication of journals under the open access model. Closed access journals are supported primarily by university libraries, which pay subscription fees to publishers.
Very rarely do the publishers pay anything to the editors and reviewers who produce the journals, but they pocket a continuously increasing profit margin, which has been growing at about 1% per year and currently stands at about 27%, per http://www.reedelsevier.com/mediacentre/pressreleases/2012/Pages/reed-elsevier-2011-results-announcement.aspx In order to achieve such continually increasing profit margins, publishers have been forcing price increases through bundling, an abuse of the monopolistic market power that a lack of competition from alternative publishing models has allowed them to attain.

Under the open access model, universities pay to support the publication and printing of the journals, but do not pay subscription fees. Because there is no profit margin charged, these costs are less to the university than commercial subscription fees, and the resulting readership is not limited to a tiny fraction of the population. (Because costs to the universities are less, they can keep more of the money for university official perks and salaries, tax-deductible junkets for the faculty, and athletic salaries. Sadly, universities hardly ever pass any savings on to tuition payers. Every subsidy and loan guarantee supporting tuition in the postwar era has been matched by tuition increases above the cost of living, while university administrative salaries have kept pace with CEO salaries generally, exacerbating income inequality, and increases in faculty salaries, perks, and expenses have also exceeded the inflation rate.)

As you point out, this situation often results in greater charges to graduate students, unless their sponsors and grant investigators are kind enough to include the journal production fees in their department budget. How often does that happen?
Your example of journals charging per-paper open access fees is an example of subtle extortion intended to cause professors such as yourself and other authors to take the position that you have, opposed to open access. Are there any reasons to the contrary? Are there any reasons that participation in such market manipulation schemes could be seen as ethical? Sincere regards, James Salsman

-- John Vandenberg
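The back-of-envelope comparison John proposes can be written out directly; the article count and subscription spend below are hypothetical placeholders, since both vary widely by institution (only the USD 5,000 per-article fee comes from the thread):

```python
# Hypothetical figures for one university; substitute real numbers from
# your Office of VC-Research and library budget.
articles_per_year = 2500          # hypothetical annual journal output
oa_fee_per_article = 5000         # USD, the figure used in the thread
subscription_spend = 15_000_000   # hypothetical annual subscription fees, USD

oa_cost = articles_per_year * oa_fee_per_article
print(f"OA publishing cost: ${oa_cost:,}")
print(f"Subscription cost:  ${subscription_spend:,}")
print(f"Saving if fully OA: ${subscription_spend - oa_cost:,}")
```

With these made-up inputs the OA route costs $12,500,000 against $15,000,000 in subscriptions; the argument in the thread is that the comparison only works if every university switches, since subscriptions cannot be dropped while most journals remain closed.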
Re: [Wiki-research-l] # of citations on Wikipedia?
Phoebe, Stats about {{cite journal .. }} citations can be found at http://enwp.org/wp:jcw I don't know if the parser/bot is 'free'. The bot approval is https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/JL-Bot_7

On Sat, Apr 21, 2012 at 3:31 AM, phoebe ayers phoebe.w...@gmail.com wrote: Hi all, Has there been any research done into: the number of citations (e.g. to books, journal articles, online sources, everything together) on Wikipedia (any language, or all)? The distribution of citations over different kinds or qualities of articles? # of uses of citation templates? Anything like this? I realize this is hard to count, averages are meaningless in this context, and any number will no doubt be imprecise! But anything would be helpful. I have vague memories of seeing some citation studies like this but don't remember the details. Thanks, -- phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com *
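A crude first pass at counting citation-template uses can be done with a regular expression over a page's wikitext; a sketch, noting that a real count would parse the dump with a proper wikitext parser, since this regex only catches simple, non-nested `{{cite ...}}` invocations:

```python
import re
from collections import Counter

# Matches the template name at the start of a simple citation template,
# e.g. "{{cite journal |title=...}}" -> "cite journal".
CITE_RE = re.compile(r"\{\{\s*(cite \w+)", re.IGNORECASE)

def count_citation_templates(wikitext):
    """Tally citation-template names (case-folded) found in wikitext."""
    return Counter(m.lower() for m in CITE_RE.findall(wikitext))

text = "{{cite journal |title=A}} {{Cite book |title=B}} {{cite journal |title=C}}"
counts = count_citation_templates(text)
print(counts["cite journal"], counts["cite book"])  # 2 1
```

Run over all pages in a dump, the tallies give the distribution of template use phoebe asks about, though bare references (no template) would still be missed.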
Re: [Wiki-research-l] [Xmldatadumps-l] [Wikitech-l] Fwd: Old English Wikipedia image dump from 2005
On Thu, Nov 17, 2011 at 6:40 AM, Ariel T. Glenn ar...@wikimedia.org wrote: On Sat, 12 Nov 2011 at 00:31 +1100, John Vandenberg wrote: On Fri, Nov 11, 2011 at 11:18 PM, emijrp emi...@gmail.com wrote: Forwarding... -- Forwarded message -- From: emijrp emi...@gmail.com Date: 2011/11/11 Subject: Old English Wikipedia image dump from 2005 To: wikiteam-disc...@googlegroups.com

Hi all; I want to share with you this Archive Team link[1]. It is an old English Wikipedia image dump from 2005. One of the last ones, probably, before the Wikimedia Foundation stopped publishing image dumps. Enjoy. Regards, emijrp [1] http://www.archive.org/details/wikimedia-image-dump-2005-11

People interested in image dumps may also be interested in my post relating to the GFDL requirements, which I think mean images need to be included in the dumps. https://meta.wikimedia.org/w/index.php?title=Talk:Terms_of_use&diff=prev&oldid=3002611 excerpt: ..the [GFDL] license requires that someone can download a ''complete'' Transparent copy for one year after the last Opaque copy is distributed. As a result, I believe the BoT needs to ensure that the dumps are available ''and'' that they remain available for one year after the WMF turns off the lights on the core servers (it allows 'agents' to provide this service). As Wikipedia contains images, the images are required to be included. .. discussion continues .. https://meta.wikimedia.org/wiki/Talk:Terms_of_use#Right_to_Fork

I would read this as requiring access to the images to remain available, not necessarily in dump form.

I don't believe that is the case. The GFDL, like the GPL, requires that it is possible to rebuild the product from the distributed source, minus any separately distributed dependencies. It is necessary to provide a simple mechanism for reliably downloading the images used on each project and incorporating all of the dumps needed to regenerate a replica of each project.
The 'source' can be broken into chunks, but it would be obviously contrary to the spirit of the license to require that each and every image be downloaded individually, _and_ it needs to be possible for any consumer to perform the task of obtaining the source. Does the WMF block people who attempt to mirror the project content one item at a time? IMO blocking them is very sane, but if that is the only way to obtain the source then it would again be breaking the license. InstantCommons means that those images don't need to be redistributed in order for the projects to be compliant with the GFDL. -- John Vandenberg
Re: [Wiki-research-l] [Foundation-l] Summary of findings from WMF Summer of Research program now available
Thanks Steven, and the Community Department. I am instantly drawn to the analysis of redlinks. Can we please have this data! Article writers are on standby, ready to kill red links ;-) The special page for this is dead. http://en.wikipedia.org/wiki/Special:WantedPages -- John Vandenberg
Re: [Wiki-research-l] [foundation-l] Personal Image Filter results announced
The image filter survey has been covered in the latest Signpost http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2011-09-05/News_and_notes The Wikipedia editor satisfaction index may also be of interest to folk on this list.

On Tue, Sep 6, 2011 at 10:18 AM, John Vandenberg jay...@gmail.com wrote: I'd love to see some expert opinion on the recent survey into the image filter. Researchers might be able to get their hands on the raw data to make sense of it all. http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en -- Forwarded message -- From: John Vandenberg jay...@gmail.com Date: Mon, Sep 5, 2011 at 9:21 AM Subject: Re: [Wikiquote-l] Personal Image Filter results announced To: foundatio...@lists.wikimedia.org

On Sun, Sep 4, 2011 at 2:33 PM, Philippe Beaudette pbeaude...@wikimedia.org wrote: Ladies and Gentlemen, The committee running the vote on the features for the Personal Image Filter have released their interim report and vote count. You may see the results at http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en. Please note that the results are not final: although the vote count is, and has been finalized, the analysis of comments is ongoing.

Was this survey approved by the Research Committee? If so, can they give us an opinion on the survey instrument used, whether the survey population obtained is suitable, etc? -- John Vandenberg
Re: [Wiki-research-l] [Foundation-l] Fwd: Wikis around Europe!
On Sun, Jun 12, 2011 at 3:59 AM, emijrp emi...@gmail.com wrote: Hi. I forward this e-mail; I hope there are people interested in this map. -- Forwarded message -- From: emijrp emi...@gmail.com Date: 2011/6/11 Subject: Wikis around Europe! To: wikiteam-disc...@googlegroups.com Hi all; A friend of mine has sent me this link about wikis (locapedias) around Europe.[1] I'm very surprised by the huge number of wikis available. Time to archive all of them.[2] I have been working on the Spanish ones. If you want to help archive one country, please reply to this message to coordinate. If not, I will try to archive all of Europe! Regards, emijrp [1] http://maps.google.com/maps/ms?ie=UTF8&t=h&msa=0&msid=115570622864617231547.00044e461c185a89b6d71&ll=49.095452,14.677734&spn=39.93254,79.013672&z=4 [2] http://code.google.com/p/wikiteam/

Very nice map. It would be nice to have these all listed on http://wikiindex.org, and wikiindex could add geo information so that this map can be maintained by wikiindex. It would also be neat for wikiindex to list the date of the last wikiteam archive of each wiki, so that we can automatically work out which wikis need to be archived next. -- John Vandenberg
Re: [Wiki-research-l] Summaries of recent Wikipedia research
Thank you HaeB for this valuable addition to the Signpost. -- John Vandenberg
Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles
On Thu, Mar 17, 2011 at 8:41 AM, Chitu Okoli chitu.ok...@concordia.ca wrote: ... * A-ranked conferences in Information and Computing Sciences from http://lamp.infosys.deakin.edu.au/era/?page=cforsel10: This is the most exhaustive journal ranking exercise I have ever found anywhere.

With regard to John Lamp's journal list, it is a copy of the *first* ERA journal list. http://en.wikipedia.org/wiki/Excellence_in_Research_for_Australia There is a second ERA journal list being compiled for 2012. Submissions closed yesterday, and review of the rankings is now underway. The journal list can be browsed via the website. https://roci.arc.gov.au/ However, there is no publicly downloadable dataset available yet. If anyone wants a copy of the second ERA journal list in XML or CSV, I can provide it offlist. Public consultation about the rankings is open until April 4.

Unfortunately, I, like you, have serious questions about the face validity of these rankings; I think they heavily overrate many conferences in my own field of information systems, and I assume the same is true of other fields that I don't know so well. (My primary reservation with conference or journal rankings by professors is that I strongly suspect that one of the main criteria for their rankings is whether or not they have published in that outlet before.) Unfortunately, I don't know of anything that approaches this ranking in comprehensiveness.
One important point to note with regard to conferences in that journal list is that conferences are only ranked for the disciplines of * 08 Information and computer science http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/4C3249439D3285D6CA257418000470E3?opendocument * 09 Engineering http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/050A7395E86A9719CA257418000477A2?opendocument * 12 Built environment and design http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/B20002D4CAD6966DCA257418000498EA?opendocument IMO the ranked conference list was useless in the 2010 ERA process and results. I've yet to see any improvement in this area for the 2012 ERA. -- John Vandenberg
[Wiki-research-l] baseline requirements for researcher permission
Hi, I've started a discussion about baseline requirements for the 'researcher' permission on English Wikipedia. http://en.wikipedia.org/wiki/Wikipedia_talk:Research#baseline_requirements_for_researcher_permission -- John Vandenberg
Re: [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index
On Tue, Jul 20, 2010 at 8:06 AM, Finn Aarup Nielsen f...@imm.dtu.dk wrote: .. It's not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (on the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning I would say that a page called: http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning would be the way to do it. But that would never pass the deletionists. :-)

French Wikipedia already has a namespace dedicated to pages about references. http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Index There is quite a bit of activity in this namespace: http://fr.wikipedia.org/w/index.php?namespace=104&tagfilter=&title=Sp%E9cial%3AModifications+r%E9centes English Wikipedia has a few groups of citation pages with bots that fill in the details. http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_doi http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_pmid -- John Vandenberg
Re: [Wiki-research-l] research on watchlist behaviors?
On 7/2/10, James Howison ja...@howison.name wrote: Hi all, I'm working on a study for which I'd like to know more about editors' watchlisting practices. Of course what I'd really like is to know who had what page on their watchlist when, but I understand the obvious privacy issues there. I assume those issues explain why that information is not (AFAIK) available in dumps etc. I have read some great qualitative pieces which discuss watchlisting [e.g. 1], which are very helpful (please don't hesitate to suggest others), but haven't seen quantitative data, which our study calls for. Failing exact data, what do we know about the distribution of watchlisting practices? Currently my plan is to assume that anyone who has edited an article in the past 6 months has it on their watchlist. Obviously a very coarse assumption.

A better assumption is that a page is on user A's watchlist if they edit the page within 10 minutes of another user editing the page.

Also worth considering are the public watchlists which are created using the related changes feature. e.g. I have a separate watchlist for pages I create, as this is public information anyway: https://secure.wikimedia.org/wikipedia/en/wiki/Special:RecentChangesLinked/User:John_Vandenberg/New_pages

With regard to the watchlist table, it is only possible to know which pages are on a watchlist as of _now_, so the data would need to be snapshotted periodically in order to analyse how an individual manages their watchlist, etc. I would love to know when I added a page to my watchlist, but the schema doesn't record this information. http://www.mediawiki.org/wiki/Manual:Watchlist_table

There are quite a few watchlist-related bugs, which may also give you some useful information about how users want to use their watchlist, and hints into how they are currently using it.
;-) https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=watchlist -- John Vandenberg
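The quick-response heuristic suggested above can be sketched against revision history data; a minimal sketch, where `revisions_by_page` (an illustrative name) maps each page to its revisions as `(timestamp_seconds, user)` tuples sorted by time, as they appear in a dump:

```python
from collections import defaultdict

def infer_watchers(revisions_by_page, window=600):
    """Guess watchers: a user who edits within `window` seconds of a
    different user's edit is assumed to have the page watchlisted."""
    watchers = defaultdict(set)
    for page, revs in revisions_by_page.items():
        for (t_prev, u_prev), (t_next, u_next) in zip(revs, revs[1:]):
            if u_next != u_prev and t_next - t_prev <= window:
                watchers[page].add(u_next)
    return watchers

history = {"Example": [(0, "alice"), (300, "bob"), (5000, "carol")]}
print(dict(infer_watchers(history)))  # {'Example': {'bob'}}
```

This still undercounts (watchers who never react quickly are invisible) and overcounts users who simply happened upon recent changes, so it is a lower-bound-ish proxy rather than real watchlist data.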
[Wiki-research-l] Features that correlate with pageviews? (Was: Features that correlate with quality)
On Sat, Jun 5, 2010 at 12:16 AM, Brian J Mingus brian.min...@colorado.edu wrote: ... That is an interesting negative finding as well. Just so this thread doesn't go without some positive results, here is a table from one of my technical reports on some features that do correlate with quality. If the number is greater than zero it correlates with quality, if it is 0 it does not correlate, and if it is less than 0 it is negatively correlated with quality. The scale of the numbers is meaningless and not interpretable, although the relative magnitude is important. These are just the relative performance of each feature for each class, as extracted from the weights of a random forests classifier. http://grey.colorado.edu/mediawiki/sites/mingus/images/1/1e/DeHoustMangalathMingus08_feature_table.png

Any chance you can run a similar analysis to look for correlations with page-views? I think Liam was originally looking for justification to improve article content in order for the article to attain higher page-views, as he has his own private scientific evidence that higher page-views result in a higher click-through rate (hopefully not with a sample size of one museum?). -- John Vandenberg
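Brian's table came from random-forest classifier weights; as a rough stand-in for the page-view analysis John requests, a plain Pearson correlation of candidate features against view counts can be sketched. All feature names and data below are synthetic placeholders, not Brian's dataset:

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)
n = 500
num_inlinks = [random.random() for _ in range(n)]
article_length = [random.random() for _ in range(n)]
# Synthetic page-views, driven mostly by incoming links.
pageviews = [5 * a + 2 * b + random.gauss(0, 0.1)
             for a, b in zip(num_inlinks, article_length)]

for name, feat in [("num_inlinks", num_inlinks),
                   ("article_length", article_length)]:
    print(f"{name:15s} r = {pearson(feat, pageviews):.2f}")
```

A correlation says nothing about direction of causation, which matters for Liam's question: popular topics may attract both views and editing effort, rather than content improvements driving views.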
Re: [Wiki-research-l] Features that correlate with quality (Was: Quality and pageviews)
On Sat, Jun 5, 2010 at 12:16 AM, Brian J Mingus brian.min...@colorado.edu wrote: http://grey.colorado.edu/mediawiki/sites/mingus/images/1/1e/DeHoustMangalathMingus08_feature_table.png Are you able to add 'no. of incoming internal links'? -- John Vandenberg
[Wiki-research-l] Fwd: [Foundation-l] Wikipedia meets git
-- Forwarded message -- From: jamesmikedup...@googlemail.com jamesmikedup...@googlemail.com Date: Sun, Oct 18, 2009 at 3:39 AM Subject: Re: [Foundation-l] Wikipedia meets git To: Wikimedia Foundation Mailing List foundatio...@lists.wikimedia.org

See my new blog post, word level blaming for wikipedia via git and perl: http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html

The next step is ready: 1. I have a single script that will pull a given article and check the revisions into git. It is not perfect, but it works. http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8 You run it like this, from inside a git repo:

perl GetRevisions.pl Article_Name
git blame Article_Name/Article.xml
git push origin master

The code that splits up the line is in Process File; this splits all spaces into newlines, so that we get a word-level blame.

if ($insidetext) {
    ## split all lines on the space
    s/(\ )/\\\n/g;
    print OUT $_;
}

The article is here: http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/article.xml Here are the blame results: http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/wordblame.txt The problem is that github does not like this amount of processor power being used and kills the process; you can do a local git blame instead. Now we have the tool to easily create a repository from wikipedia, or any other export-enabled mediawiki. mike

___ foundation-l mailing list foundatio...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
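The word-splitting trick in Mike's Perl snippet can be expressed in Python for readers who don't use Perl; a sketch of the same idea, where each revision's text is written with one word per line (each followed by a backslash, mirroring the `s/(\ )/\\\n/g` substitution) so that `git blame` attributes changes at the word level:

```python
def explode_words(text):
    """Replace each space with backslash + newline, putting one word per
    line, as in the Perl substitution s/(\\ )/\\\\\\n/g in Mike's script."""
    return text.replace(" ", "\\\n")

rev = "Kosovo declared independence in 2008"
print(explode_words(rev))
```

Committing each revision in this exploded form means a later `git blame` on the file reports, per word-line, the revision that introduced that word.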