Re: [Wiki-research-l] Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices)
Dear Pine,

Thank you for sharing these links. I cannot read everything now, but one of these articles was also recommended by a friend: "Sure, Big Data Is Great. But So Is Intuition." by Steve Lohr (http://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-forget-intuition.html?_r=1). It reminded me of a case in Brazil that avoided previous mistakes: the collaborative process of hiring a consultant for the WMF programs in Brazil (http://blog.wikimedia.org/2012/01/11/brazil-recruiting-and-partnership-with-the-community-moves-forward/), i.e., the community was listened to. (See the full discussion that resulted in a better process here: http://comments.gmane.org/gmane.org.wikimedia.brazil/161, although it can still improve.)

"It’s encouraging that thoughtful data scientists like Ms. Perlich and Ms. Schutt recognize the limits and shortcomings of the Big Data technology that they are building. Listening to the data is important, they say, but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?"

As Alexandre Abdo pointed out (http://permalink.gmane.org/gmane.org.wikimedia.brazil/358) in this not-so-old discussion, we, the Brazilian community, were being presented with accomplished facts, and the community's experience and intuition were not being taken into account as they could have been, although I must say a lot of effort was made in this direction. I hope a lesson was learned and that this can help the direction the organization is taking with its grantmaking and learnings. :)

This also reminds me that there is currently no mathematical model (maybe there never will be...) that explains the kind of system Wikimedia projects deal with, and sometimes lovely graphics and data interpretations are taken as scientific statements, regardless of their scientific underpinnings.
Have a good year,
Tom

On Sun, Dec 30, 2012 at 1:26 AM, ENWP Pine deyntest...@hotmail.com wrote:

I'm sending this to Wikimedia-l, Wikitech-l, and Research-l in case other people in the Wikimedia movement or staff are interested in big data as it relates to Wikimedia. I hope that those who are interested in discussions about WMF editor engagement efforts, WMF fundraising, or WMF HR practices will also find that this email interests them.

Feel free to skip straight to the links in the latter portion of this email if you're already familiar with big data and its analysis and if you just want to see what other people are writing about the subject.

* Introductory comments / my personal opinion

Big data refers to quantities of information so large that they are difficult to analyze and may not be related internally in an obvious way. See https://en.wikipedia.org/wiki/Big_data

I think that most of us would agree that moving much of an organization's information into the Cloud, and/or directing people to analyze massive quantities of information, will not automatically result in better, or even good, decisions based on that information. Also, I think that most of us would agree that bigger and/or more accessible quantities of data do not necessarily imply that the data are more accurate or more relevant for a particular purpose. Another concern is the possibility of unwelcome intrusions into sensitive information, including the possibility of data breaches; imagine the possible consequences if a hacker broke into supposedly secure databases held by Facebook or the Securities and Exchange Commission.

We have an enormous quantity of data on Wikimedia projects, and many ways that we can examine those data. As this Dilbert strip points out, context is important, and looking at statistics devoid of their larger contexts can be problematic.
http://dilbert.com/strips/comic/1993-02-07/

Since data analysis is also something that Wikipedia does in the areas I mentioned previously, I'm passing along a few links for those who may be interested in the benefits and limitations of big data.

* Links:

From the Harvard Business Review
http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1

From the New York Times
https://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-forget-intuition.html
and
https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html

From the Wall Street Journal. This may be especially interesting to those who are participating in the discussions on Wikimedia-l regarding how Wikimedia selects, pays, and manages its staff.
http://online.wsj.com/article/SB1872396390443890304578006252019616768.html

And from English Wikipedia (:
https://en.wikipedia.org/wiki/Big_data
and
https://en.wikipedia.org/wiki/Data_mining
and
https://en.wikipedia.org/wiki/Business_intelligence

Cheers,
Pine

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org
Re: [Wiki-research-l] 2012 top pageview list
The problem (as always) is that there is a difference between pages served (by the web server) and pages actually wanted and read by the user. It would be interesting to have referrer statistics. I'm guessing that many Wikipedia pages are being referred by Google (and other general search engines). If so, people may just be clicking through a list of search results, which causes them to download a WP page but then immediately move on to the next search result because it isn't what they are looking for. I rather suspect the prominence of Facebook in the English Wikipedia results is due to this effect, as I often find myself on the Wikipedia page for Facebook instead of Facebook itself following a Google search. I think the use of mobile devices (with small screens) probably encourages this sort of behaviour.

Kerry

-----Original Message-----
From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Andrew G. West
Sent: Sunday, 30 December 2012 2:06 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] 2012 top pageview list

The WMF aggregates them as (page, views) pairs on an hourly basis:
http://dumps.wikimedia.org/other/pagecounts-raw/

I've been parsing these and storing them in a queryable DB format (for en.wp exclusively, though I think the files are available for all projects) for about two years. If you want to maintain such a fine granularity, it can quickly become a terabyte-scale task that eats up a lot of processing time. If you're looking for coarser-granularity reports (like top views per day, week, or month), a lot of efficient aggregation can be done.

See also: http://en.wikipedia.org/wiki/Wikipedia:5000

Thanks, -AW

On 12/28/2012 07:28 PM, John Vandenberg wrote:

There is a steady stream of blogs and 'news' about these lists
https://encrypted.google.com/search?client=ubuntuchannel=fsq=%22Sean+hoyland%22ie=utf-8oe=utf-8#q=wikipedia+top+2012hl=ensafe=offclient=ubuntutbo=dchannel=fstbm=nwssource=lnttbs=qdr:wsa=Xpsj=1ei=GzjeUOPpAsfnrAeQk4DgCgved=0CB4QpwUoAwbav=on.2,or.r_gc.r_pw.r_cp.r_qf.bvm=bv.1355534169,d.aWMfp=4e60e761ee133369bpcl=40096503biw=1024bih=539

How does a researcher go about obtaining access logs with useragents in order to answer some of these questions?

--
Andrew G. West, Doctoral Candidate
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.andrew-g-west.com

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
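For readers who want to try the coarse aggregation Andrew West mentions in this thread, here is a minimal sketch. It is not his pipeline. It assumes each line of an hourly pagecounts-raw file has four space-separated fields (project code, page title, view count, bytes transferred); the function names `parse_line`, `aggregate`, and `top_pages` are hypothetical, and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse one line, ASSUMED to look like 'project title views bytes'.

    Returns (project, title, views), or None if the line does not fit.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def aggregate(hourly_files: Iterable[Iterable[str]], project: str = "en") -> Counter:
    """Sum hourly view counts per page title for a single project."""
    totals: Counter = Counter()
    for lines in hourly_files:
        for line in lines:
            parsed = parse_line(line)
            if parsed is None:
                continue  # skip malformed lines rather than failing
            proj, title, views = parsed
            if proj == project:
                totals[title] += views
    return totals


def top_pages(totals: Counter, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most viewed titles with their summed counts."""
    return totals.most_common(n)


# Made-up sample data standing in for two hourly files.
hour_1 = ["en Facebook 120 345678", "en Main_Page 900 1234567", "de Facebook 40 9999"]
hour_2 = ["en Facebook 80 234567", "en Main_Page 1000 1345678", "malformed line"]

daily = aggregate([hour_1, hour_2])
print(top_pages(daily, 2))  # [('Main_Page', 1900), ('Facebook', 200)]
```

Because `aggregate` accepts any iterable of lines, compressed hourly files could be streamed with something like `gzip.open(path, "rt", encoding="utf-8", errors="replace")` instead of in-memory lists. Keeping only per-title totals for the period of interest is what keeps the task small compared with storing every hourly pair.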
Re: [Wiki-research-l] looking for paper on the new editor experience
I think you're talking about a paper that I just finished editing for American Behavioral Scientist:

The Rise and Decline of an Open Collaboration Community: How Wikipedia's reaction to sudden popularity is causing its decline

Summary of findings (and free to download pre-print): http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/
Official listing: http://abs.sagepub.com/content/early/2012/12/26/0002764212469365

For quick reference, here's my TL;DR:

To deal with the massive influx of new editors between 2004 and 2007, Wikipedians built automated quality control tools and solidified their rules of governance. These reasonable and effective strategies for maintaining the quality of the encyclopedia have come at the cost of decreased retention of desirable newcomers.

1. The decline represents a change in the rate of retention of desirable, *good-faith* newcomers.
   - The proportion of newcomers that edit in good-faith has not changed since 2006.
   - These desirable newcomers are more likely to have their work rejected since 2007.
   - This increased rejection predicts the observed decline in retention.
2. Semi-autonomous vandal fighting tools (like Huggle, http://en.wikipedia.org/wiki/Wikipedia:Huggle) are partially at fault.
   - An increasing proportion of desirable newcomers are having their work rejected by automated tools.
   - These automated reverts exacerbate the predicted negative effects of rejection on retention.
   - Users of Huggle tend to not engage in the best practices (http://en.wikipedia.org/wiki/WP:BRD) for discussing the reverts they perform.
3. New users are being pushed out of policy articulation.
   - The formalized process for vetting new policies and changes to policies ensures that newcomers' edits do not survive.
   - Both newcomers and experienced editors are moving increasingly toward less formal spaces (http://en.wikipedia.org/wiki/WP:ESSAYS).

-Aaron

On Wed, Jan 2, 2013 at 8:47 PM, Everton Zanella Alvarenga ezalvare...@wikimedia.org wrote:

Hi, I don't know the article, but check if searching here helps:
http://www.mail-archive.com/wiki-research-l@lists.wikimedia.org/
http://wikimedia.7.n6.nabble.com/WikiMedia-Research-f1477409.html
I don't know why I cannot use google.com with the parameter site: for this mailing list archive https://lists.wikimedia.org/mailman/listinfo/wiki-research-l.

Tom

On Wed, Jan 2, 2013 at 10:48 PM, Kerry Raymond kerry.raym...@gmail.com wrote:

In the past few months, I read a paper (or a draft paper?) that I think was shared on this mailing list. Unfortunately I seem to have lost both the paper and the email (job change), so I would be grateful if anyone could send me the paper or a link or whatever.

IIRC, the paper was looking at editor retention, particularly the retention of new editors. I think there were about 8 hypotheses given and some experiments conducted to test these. The one I remember most clearly was the finding that new good-faith editors were highly likely to see their contributions deleted, by either bots or more experienced editors, and this was likely to be de-motivating for them.

If anyone can help with this, it would be much appreciated.

Kerry

--
Everton Zanella Alvarenga (also Tom)
A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing.
[Wiki-research-l] CFP: TAEECE2013 - Turkey -IEEE
2013-01-02
The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE2013)
Konya, Turkey - May 9-11, 2013 | Mevlana University

All papers will be submitted to IEEE for potential inclusion in IEEE Xplore.
http://sdiwc.net/conferences/2013/taeece2013/

The conference welcomes papers on the following (but not limited to) research topics:

- 3D Semiconductor Device Technology
- Advanced Electromagnetics
- Component Technology of MEMS
- Computer Engineering
- Electronics System-Level Based Design
- Electronics-Medical Electronics
- Fiber Optics and Fiber Devices
- Intelligent Transportation Systems
- Software Assurance
- Assembly and Packaging
- Antenna and Propagation
- Circuits and Electronics
- Electric Energy Processing
- Electro-optical Phenomena of Semiconductors
- Microwave Theory and Techniques
- Algorithms
- Automated Software Engineering
- Computer-aided Design
- Computer Architecture
- Computer Modeling
- Computer Security
- Database and Data Mining

Researchers are encouraged to submit their work electronically. All papers will be fully refereed by a minimum of two specialized referees. Before final acceptance, all referees' comments must be considered. Accepted papers will be published in the conference proceedings. Best paper awards will be distributed during the conference.

If you have any problems with the above web submission form, you can send the manuscript by email to t...@sdiwc.net (subject: TAEECE2013 Paper Submission).

Submission Deadline: March 20, 2013
Notification of Acceptance: April 20, 2013 or Before
Camera Ready Submission: April 30, 2013
Registration: April 30, 2013
Re: [Wiki-research-l] looking for paper on the new editor experience
Found it via Google Scholar:
http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/

Sent from my iPad

On 03/01/2013, at 12:47 PM, Everton Zanella Alvarenga ezalvare...@wikimedia.org wrote:

Hi, I don't know the article, but check if searching here helps:
http://www.mail-archive.com/wiki-research-l@lists.wikimedia.org/
http://wikimedia.7.n6.nabble.com/WikiMedia-Research-f1477409.html
I don't know why I cannot use google.com with the parameter site: for this mailing list archive https://lists.wikimedia.org/mailman/listinfo/wiki-research-l.

Tom

On Wed, Jan 2, 2013 at 10:48 PM, Kerry Raymond kerry.raym...@gmail.com wrote:

In the past few months, I read a paper (or a draft paper?) that I think was shared on this mailing list. Unfortunately I seem to have lost both the paper and the email (job change), so I would be grateful if anyone could send me the paper or a link or whatever.

IIRC, the paper was looking at editor retention, particularly the retention of new editors. I think there were about 8 hypotheses given and some experiments conducted to test these. The one I remember most clearly was the finding that new good-faith editors were highly likely to see their contributions deleted, by either bots or more experienced editors, and this was likely to be de-motivating for them.

If anyone can help with this, it would be much appreciated.

Kerry

--
Everton Zanella Alvarenga (also Tom)
A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing.
Re: [Wiki-research-l] looking for paper on the new editor experience
Thanks to all of you who replied to me on and off-list. This is indeed the paper I was looking for!

Aaron, your research looked at new editors. Have you thought about undertaking a similar study about the loss of seasoned editors (the beyond-newbie phase)? I note that there are a number of hypotheses on this topic

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Editor_Retention#Reasons_editors_leave

but not much evidence to provide any guidance. There was a survey sent to formerly active editors (can't find the URL) which gathered some data indicating that, apart from personal reasons, seasoned editors primarily left because of community issues, e.g. the behaviour of other editors.

So I was wondering whether edit logs, user contributions, etc. could be used (quantitatively or qualitatively) to provide insights into patterns of behaviour (of the editor, or of others interacting with that editor via contributions, talk pages, etc.) that might provide clues to the departure of seasoned editors and/or early warning signs of someone about to walk.
Kerry

From: wiki-research-l-boun...@lists.wikimedia.org [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Thursday, 3 January 2013 1:36 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] looking for paper on the new editor experience

I think you're talking about a paper that I just finished editing for American Behavioral Scientist:

The Rise and Decline of an Open Collaboration Community: How Wikipedia's reaction to sudden popularity is causing its decline

Summary of findings (and free to download pre-print): http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/
Official listing: http://abs.sagepub.com/content/early/2012/12/26/0002764212469365

For quick reference, here's my TL;DR:

To deal with the massive influx of new editors between 2004 and 2007, Wikipedians built automated quality control tools and solidified their rules of governance. These reasonable and effective strategies for maintaining the quality of the encyclopedia have come at the cost of decreased retention of desirable newcomers.

1. The decline represents a change in the rate of retention of desirable, good-faith newcomers.
   * The proportion of newcomers that edit in good-faith has not changed since 2006.
   * These desirable newcomers are more likely to have their work rejected since 2007.
   * This increased rejection predicts the observed decline in retention.
2. Semi-autonomous vandal fighting tools (like Huggle, http://en.wikipedia.org/wiki/Wikipedia:Huggle) are partially at fault.
   * An increasing proportion of desirable newcomers are having their work rejected by automated tools.
   * These automated reverts exacerbate the predicted negative effects of rejection on retention.
   * Users of Huggle tend to not engage in the best practices (http://en.wikipedia.org/wiki/WP:BRD) for discussing the reverts they perform.
3. New users are being pushed out of policy articulation.
   * The formalized process for vetting new policies and changes to policies ensures that newcomers' edits do not survive.
   * Both newcomers and experienced editors are moving increasingly toward less formal spaces (http://en.wikipedia.org/wiki/WP:ESSAYS).

-Aaron

On Wed, Jan 2, 2013 at 8:47 PM, Everton Zanella Alvarenga ezalvare...@wikimedia.org wrote:

Hi, I don't know the article, but check if searching here helps:
http://www.mail-archive.com/wiki-research-l@lists.wikimedia.org/
http://wikimedia.7.n6.nabble.com/WikiMedia-Research-f1477409.html
I don't know why I cannot use google.com with the parameter site: for this mailing list archive https://lists.wikimedia.org/mailman/listinfo/wiki-research-l.

Tom

On Wed, Jan 2, 2013 at 10:48 PM, Kerry Raymond kerry.raym...@gmail.com wrote:

In the past few months, I read a paper (or a draft paper?) that I think was shared on this mailing list. Unfortunately I seem to have lost both the paper and the email (job change), so I would be grateful if anyone could send me the paper or a link or whatever.

IIRC, the paper was looking at editor retention, particularly the retention of new editors. I think there were about 8 hypotheses given and some experiments conducted to test these. The one I remember most clearly was the finding that new good-faith editors were highly likely to see their contributions deleted, by either bots or more experienced editors, and this was likely to be de-motivating for them.

If anyone can help with this, it would be much appreciated.

Kerry

--
Everton Zanella Alvarenga (also Tom)
A life spent making mistakes is not only more honorable, but more useful