[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Krinkle posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20656 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: Breaks the unit tests for upgrades under SQLite:
<pre>
DatabaseSqliteTest::testUpgrades
Mismatching columns for table archive upgrading from 1.15 to 1.19alpha
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 [8] => ar_rev_id
-[9] => ar_sha1
-[10] => ar_text
-[11] => ar_text_id
-[12] => ar_timestamp
-[13] => ar_title
-[14] => ar_user
-[15] => ar_user_text
+[9] => ar_text
+[10] => ar_text_id
+[11] => ar_timestamp
+[12] => ar_title
+[13] => ar_user
+[14] => ar_user_text
)

/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/includes/db/DatabaseSqliteTest.php:218
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/MediaWikiTestCase.php:64
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/MediaWikiPHPUnitCommand.php:20
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/phpunit.php:60
</pre>
From your previous comment I get that they're not added yet, so this may be because of that; reporting here just in case. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94291]: Revision status changed
User Hashar changed the status of MediaWiki.r94291. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94291#c0 Commit summary: Renamed image sha1 population script to be more concise ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94328]: New comment added, and revision status changed
User Aaron Schulz changed the status of MediaWiki.r94328. Old Status: new New Status: fixme User Aaron Schulz also posted a comment on MediaWiki.r94328. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94328#c20657 Commit summary: RN populateImageSha1.php renamed (r94291) Comment: Comment is backwards. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94269]: Revision status changed
User Hashar changed the status of MediaWiki.r94269. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94269#c0 Commit summary: I think 3 and a half years is long enough for the redirection to be left around ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94247]: Revision status changed
User Hashar changed the status of MediaWiki.r94247. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94247#c0 Commit summary: Fix a few comment typos noticed when doing JS review ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94236]: Revision status changed
User Hashar changed the status of MediaWiki.r94236. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94236#c0 Commit summary: Follow-up r93383: api param is 'namespace', not 'namespaces'. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[Wikitech-l] SMWCon 2011 registration now open
[All apologies for cross-posting] We are happy to announce that you can now register for SMWCon Fall 2011 Berlin, September 21–23, 2011 http://semantic-mediawiki.org/wiki/SMWCon_Fall_2011 Registration is at http://de.amiando.com/SMWCon_Fall_2011 SMWCon brings together developers, users, and organizations from the Semantic MediaWiki community in particular and everyone interested in managing data in wikis in general. The Fall 2011 event runs for three days September 21–23, 2011: * Sept 21: practical tutorials about using SMW (learn about essential aspects of using SMW) + developer consultation (meet with all developers and discuss technical questions) * Sept 22–23: community conference with talks and discussions The detailed program is about to take shape [1]. Contributions are still possible. Please note that the event takes place at the time of the famous Berlin Marathon and a visit by Pope Benedict XVI. Booking hotels soon is recommended. You can register for the whole event or for the conference days only. Registration includes lunch and coffee on all days + a conference dinner on Sept 22nd. Special subsidised rates are available for students. Moreover, MediaWiki developers are invited to join the first day (in particular the developer consultations) at a reduced rate. We are stretching ourselves to keep rates as low as possible in spite of additional costs incurred by the rooms this time. We are therefore welcoming sponsors to help back up the finances of the meeting, now and in the future. If your organisation would be interested in becoming an official supporter of the event, please contact the Open Semantic Data Association o...@semantic-mediawiki.org. SMWCon Fall 2011 is organised by the Web-Based Systems Group at Free University Berlin [2] and by MediaEvent Services [3]. Looking forward to seeing you in Berlin! Markus [1] http://semantic-mediawiki.org/wiki/SMWCon_Fall_2011 [2] http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/index.html [3] http://mediaeventservices.com/ ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r93751]: Revision status changed
User Krinkle changed the status of MediaWiki.r93751. Old Status: new New Status: reverted Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/93751#c0 Commit summary: Fixes Bug #29311 - [OutputPage] Create a method to remove items from mModules Patch from John Du Hart, reviewed by Roan, Applying at Roan's request. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94171]: New comment added
User Hashar posted a comment on MediaWiki.r94171. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94171#c20658 Commit summary: (bug 30219) NoLocalSettings.php broken on Windows servers. Per Tim on r70711, can't use pathinfo() on URLs since the slashes don't match. Comment: You might want to add a comment above the code for later reference. Looks like it needs a backport to both 1.17 and 1.18. Tagging accordingly. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93400]: Revision status changed
User Hashar changed the status of MediaWiki.r93400. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/93400#c0 Commit summary: array of objects tostring conversion works correctly in php 5.2.3+ ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93813]: New comment added
User Krinkle posted a comment on MediaWiki.r93813. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/93813#c20659 Commit summary: Don't show AFT if user is both logged out and on action=purge, because in that scenario there is no article being shown (instead, in such a scenario the user sees a form with a button to clear the cache, which then redirects back to the article (action=view)). This bug was fairly rare though, since the MediaWiki interface doesn't contain any links to action=purge for logged-out users (or even logged-in users for that matter), but some gadgets and templates do link to it. Resolves bug 30100 - Hide AFT for anonymous users on purge action. Comment: It can't be replicated to ApiArticleFeedback.php as the action is not a page or revision property; it's simply the current view of the article. Even if the API request would run in the same request context, it's still trivial to circumvent it by changing wgAction from the console or by going to a different URL (e.g. reading the article and rating the article there), so it's not like someone is able to rate an article that was otherwise not ratable (which is the purpose of the check in ApiArticleFeedback.php). For the same reason the original wgAction check here wasn't in ApiArticleFeedback.php either. Thanks for the typo-catch, fixed in r94330. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
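For illustration, a minimal sketch of the client-side gate being discussed (the config names come from MediaWiki's JS config; the function and its name are hypothetical and not the ArticleFeedback extension's actual code):

```javascript
// Hypothetical sketch only: decide whether to show the feedback widget.
function shouldShowAft() {
	// Anonymous users on action=purge see a "clear the cache" confirmation
	// form rather than the article, so there is nothing to rate there.
	var isAnon = mw.config.get( 'wgUserName' ) === null,
		isPurge = mw.config.get( 'wgAction' ) === 'purge';
	return !( isAnon && isPurge );
}

// As the comment notes, this is trivially circumvented from the console,
// e.g. mw.config.set( 'wgAction', 'view' ), which is why the check gates
// only the UI and is not a substitute for the check in ApiArticleFeedback.php.
```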
[MediaWiki-CodeReview] [MediaWiki r94310]: Revision status changed
User Catrope changed the status of MediaWiki.r94310. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94310#c0 Commit summary: re r93565 — move unset() before count() as suggested by Roan ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93565]: Revision status changed
User Catrope changed the status of MediaWiki.r93565. Old Status: new New Status: resolved Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/93565#c0 Commit summary: Add back one insertion to the templatelinks table that was removed in the IWTransclusion merge and ended up breaking ArticleTablesTest::testbug14404 ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94329]: Revision status changed
User Catrope changed the status of MediaWiki.r94329. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94329#c0 Commit summary: Reverting r93751 per r93751 CR. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94331]: Revision status changed
User Krinkle changed the status of MediaWiki.r94331. Old Status: new New Status: fixme Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94331#c0 Commit summary: Use jQuery's $.isArray, not instanceof Array. The latter has trouble with cross-frame Array instances, and doesn't use the ES5 native method. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
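For context, a small illustration of the cross-frame problem the commit summary refers to (a sketch only, not code from r94331):

```javascript
// An array created in another frame is an instance of that frame's Array
// constructor, so `instanceof Array` in the parent frame reports false.
var iframe = document.createElement( 'iframe' );
document.body.appendChild( iframe );
var foreignArray = iframe.contentWindow.Array( 1, 2, 3 );

console.log( foreignArray instanceof Array ); // false in the parent frame
console.log( $.isArray( foreignArray ) );     // true

// Consistent with the summary, $.isArray delegates to the native ES5
// Array.isArray when available, which also handles the cross-frame case.
console.log( Array.isArray && Array.isArray( foreignArray ) ); // true
```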
Re: [Wikitech-l] State of page view stats
Hi! Currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to Henrik's tool. At one per second, you're looking at over a month's worth of continuous fetching. This is obviously not practical. Or you can download raw data. A lot of people were waiting on Wikimedia's Open Web Analytics work to come to fruition, but it seems that has been indefinitely put on hold. (Is that right?) That project was pulsing with naiveness, if it ever had to be applied to wide scope of all projects ;-) Is it worth a Toolserver user's time to try to create a database of per-project, per-page page view statistics? Creating such database is easy, making it efficient is a bit different :-) And, of course, it wouldn't be a bad idea if Domas' first-pass implementation was improved on Wikimedia's side, regardless. My implementation is for obtaining raw data from our squid tier, what is wrong with it? Generally I had ideas of making query-able data source - it isn't impossible given a decent mix of data structures ;-) Thoughts and comments welcome on this. There's a lot of desire to have a usable system. Sure, interesting what people think could be useful with the dataset - we may facilitate it. But short of believing that in December 2010 User Datagram Protocol was more interesting to people than Julian Assange you would need some other data source to make good statistics. Yeah, lies, damn lies and statistics. We need better statistics (adjusted by wikipedian geekiness) than full page sample because you don't believe general purpose wiki articles that people can use in their work can be more popular than some random guy on the internet and trivia about him. Dracula is also more popular than Julian Assange, so is Jenna Jameson ;-) http://stats.grok.se/de/201009/Ngai.cc would be another example. Unfortunately every time you add ability to spam something, people will spam. There's also unintentional crap that ends up in HTTP requests because of broken clients. It is easy to filter that out in postprocessing, if you want, by applying article-exists bloom filter ;-) If the stats.grok.se data actually captures nearly all requests, then I am not sure you realize how low the figures are. Low they are, Wikipedia's content is all about very long tail of data, besides some heavily accessed head. Just graph top-100 or top-1000 and you will see the shape of the curve: https://docs.google.com/spreadsheet/pub?hl=en_USkey=0AtHDNfVx0WNhdGhWVlQzRXZuU2podzR2YzdCMk04MlEhl=en_USgid=1 As someone with most of the skills and resources (with the exception of time, possibly) to create a page view stats database, reading something like this makes me think... Wow. Yes, the data is susceptible to manipulation, both intentional and unintentional. I wonder how someone with most of skills and resources wants to solve this problem (besides the aforementioned article-exists filter, which could reduce dataset quite a lot ;) ... you can begin doing real analysis work. Currently, this really isn't possible, and that's a Bad Thing. Raw data allows you to do whatever analysis you want. Shove it into SPSS/R/.. ;-) Statistics much? The main bottleneck has been that, like MZMcBride mentions, an underlying database of page view data is unavailable. Underlying database is available, just not in easily queryable format. There's a distinction there, unless you all imagine database as something you send SQL to and it gives you data. 
Sorted files are databases too ;-) Anyway, I don't say that the project is impossible or unnecessary, but there're lots of tradeoffs to be made - what kind of real time querying workloads are to be expected, what kind of pre-filtering do people expect, etc. Of course, we could always use OWA. Domas ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
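To make the post-processing idea concrete, here is a minimal Node.js sketch of filtering raw pageview lines against a list of known titles (the file names and line format are assumptions based on the public pagecounts dumps, and a plain Set stands in for the article-exists Bloom filter mentioned above):

```javascript
// Keep only pageview lines whose title appears in a pre-built title list.
// Raw lines are assumed to look like: "en Some_article 42 123456"
// (project, title, request count, bytes).
var fs = require( 'fs' ),
	readline = require( 'readline' );

// Hypothetical title list, one title per line.
var existing = new Set(
	fs.readFileSync( 'enwiki-all-titles.txt', 'utf8' ).split( '\n' )
);

var rl = readline.createInterface( {
	input: fs.createReadStream( 'pagecounts-20110812-000000' )
} );

rl.on( 'line', function ( line ) {
	var parts = line.split( ' ' );
	if ( parts[ 0 ] === 'en' && existing.has( parts[ 1 ] ) ) {
		process.stdout.write( line + '\n' );
	}
} );
```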
[MediaWiki-CodeReview] [MediaWiki r94268]: New comment added, and revision status changed
User Catrope changed the status of MediaWiki.r94268. Old Status: new New Status: fixme User Catrope also posted a comment on MediaWiki.r94268. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94268#c20661 Commit summary: ajaxCategories fixes based on review in r93351 CR: * Using typeof check in clean() * Use mw.Title to get page title from fullpagename instead of split(':') * replaceNowikis() and restoreNowikis() - Improve documentation - Moved dash in the UNIQUEKEY to between the id and the incrementing integer, and made it start with an empty string (so that all following concatenations are toString'ed). * makeCaseInsensitive(): Moved the wgCaseSensitiveNamespaces-check out and wrapped it around the caller instead. Also cached the outcome of Is category namespace sensitive ?. * createButton(): text-argument is indeed text, not html. Applying html-escaping. * resolveRedirects(): - Replace access to private property _name of mw.Title with function getMainText(). * handleCategoryAdd() and handleEditLink(): - Restructure title-handling (no local replace() calls and clean(), let mw.Title handle it) - Renaming arguments and documenting them better - Renaming local variables and removing redundant parts - Preserving sortkey as sortkey as long as possible without the pipe - Calling the combination of sortkey and leading pipe 'suffix' instead of, also, sortkey. * createCatLink(): - Remove the sanitizing here, the string passed is already clean as it comes from mw.Title now - Using .text() instead of .append( which is .html-like), category names can contain special characters. * containsCat(): - Using $.each instead of [].filter. Stopping after first match. * buildRegex(): Allow whitespace before namespace colon, and allow whitespace after category name (but before ]] and |..]]) Additional changes not for any function in particular: * Literally return null in $.map callbacks. * Using the existence-system of mw.Title instead of passing around booleans everywhere ** Removed 'exists' argument from the resolveRedirects() and handleCategoryAdd() functions, instead checking .exists() of the mw.Title object. * Passing and using mw.Title objects where possible instead of converting back and forth between strings and objects etc. * Using TitleObj.getUrl() instead of catUrl( titleString ). Removed now unused catUrl() function. * To improve readability, renamed local uses of 'var that = this' to 'var ajaxcat = this'. * Syntax error fixes (.parent - .parent()) * Merging var statements * Renamed generic members of 'stash' from 'stash.summaries' to 'stash.dialogDescriptions' and 'stash.shortSum' to 'stash.editSummaries'. dialogDescription is always HTML (input should be escaped before hand) Comment: pre + // Redirect existence as well (non-existant pages can't be redirects) + mw.Title.exist.set( catTitle.toString(), true ); /pre That's wrong. You're setting the existence of the redirect '''target''', not the redirect '''itself'''. Redirects themselves can't be nonexistent, but redirect targets sure can. The logic in the entire function is backwards anyway: codeexists/code will reflect whether the redirect target exists, in case of redirect resolution. pre - // Readd static. + // Read static. /pre I'm pretty sure that wasn't a typo and was supposed to say 'readd' (as in 'add again') or maybe 're-add', but not 'read'. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94270]: Revision status changed
User Catrope changed the status of MediaWiki.r94270. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94270#c0 Commit summary: Solve undefined-message problem by removing it all together. I've moved the .containsCat() check to before the $link.length/createCatLink code, now it's always defined. (Follows-up r93351, r94268) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93351]: New comment added, and revision status changed
User Catrope changed the status of MediaWiki.r93351. Old Status: new New Status: fixme User Catrope also posted a comment on MediaWiki.r93351. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/93351#c20663 Commit summary: AjaxCategories rewrite: Solving syntax problems, performance improvements and applying code conventions: * Replaced sprite image with separate images and letting ResourceLoader embed them with @embed (@embed means 0 http requests, less maintenance, none of the known limitations with sprites, and more readable code (named files rather than pixel offsets) * Many functions were floating in the global namespace (like window.makeCaseInsensitive). A statement ends after a semi-colon(;). All functions declared after catUrl were assigned to the window object. I've instead turned the semi-colons back into comma's, merged some other var statements and moved them to the top of the closure. Changed local function declarations into function expressions for clarity. * fetchSuggestions is called by $.fn.suggestions like .call( $textbox, $textbox.val() ). So the context (this) isn't the raw element but the jQuery object, no need to re-construct with $(this) or $(that) which is slow and shouldn't even work. jQuery methods can be called on it directly. I've also replaced $(this).val() with the value-argument passed to fetchSuggestions which has this exact value already. * Adding more function documentation. And changing @since to 1.19 as this was merged from js2-branch into 1.19-trunk and new features aren't backported to 1.18. * Optimizing options/default construction to just options = $.extend( {}, options ). Caching defaultOptions is cool, but doesn't really work if it's in a context/instance local variable. Moved it up to the module closure var statements, now it's static across all instances. * In makeSuggestionBox(): Fixing invalid html fragments passed to jQuery that fail in IE. Shortcuts (like 'foo' and 'foo/') are only allowed for createElement triggers, not when creating longer fragments with content and/or attributes which are created through innerHTML, in the latter case the HTML must be completely valid and is not auto-corrected by IE. * Using more jQuery chaining where possible. * In buildRegex(): Using $.map with join( '|' ), (rather than $.each with += '|'; and substr). * Storing the init instance of mw.ajaxCategories in mw.page for reference (rather than local/anonymous). * Applied some best practices and write testable code ** Moved some of the functions created on the fly and assigned to 'this' into prototype (reference is cheaper) ** Making sure at least all 'do', 'set' and/or 'prototype' functions have a return value. Even if it's just a simple boolean true or context/this for chain-ability. ** Rewrote confirmEdit( .., .., .., ) as a prototyped method named doConfirmEdit which takes a single props-object with named valuas as argument, instead of list with 8 arguments. * Removed trailing whitespace and other minor fixes to comply with the code conventions. ** Removed space between function name and caller: foo () = foo()) ** Changing someArray.indexOf() + 1 into someArr.indexOf() !== -1. We want a Boolean here, not a Number. ** Renamed all underscore-variables to non-underscore variants. == Bug fixes == * When adding a category that is not already on the page as-is but of which the clean() version is already on the page, the script would fail. 
Fixed it by moving the checks up in handleCategoryAdd() and making sure that createCatLink() actually returned something. * confirmEdit() wasn't working properly and had unused code (such as submitButton), removed hidden prepending to #catlinks, no need to, it can be dialog'ed directly from the jQuery object without being somewhere in the document. * in doConfirmEdit() in submitFunction() and multiEdit: Clearing the input field after adding a category, so that when another category is being added it doesn't start with the previous value which is not allowed to be added again... Comment: <blockquote><pre>
if ( matchLineBreak ) {
	categoryRegex += '[ \\t\\r]*\\n?';
</pre> So this could make the regex potentially match a bunch of spaces after the link, but not a line break? That's confusing. Document this if it's intended, or fix it if it's not.</blockquote> This one wasn't addressed. <blockquote><pre>
 /**
-* Execute or queue an category add
+* Execute or queue an category add.
+* @param $link {jQuery}
+* @param category
+* @param noAppend
+* @param exists
+* @return {mw.ajaxCategories}
</pre> The behavior of this function is barely documented, specifically what it does with $link. In fact, a lot of the internal functions are undocumented, and that's only OK if they're doing trivial things.</blockquote> This was addressed somewhat, but the magic that
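For reference, a minimal sketch of what the quoted matchLineBreak fragment matches (all surrounding names are assumed; this is not the extension's actual regex, only an illustration of the trailing-whitespace behaviour being questioned):

```javascript
// Toy category-link matcher with the questioned fragment appended:
// optional spaces/tabs/CRs, then at most one newline after the link.
// (In real code catName would need regex-escaping.)
var catName = 'Foo',
	categoryRegex = new RegExp(
		'\\[\\[\\s*[Cc]ategory\\s*:\\s*' + catName + '\\s*(?:\\|[^\\]]*)?\\]\\]'
	),
	matchLineBreak = true;

if ( matchLineBreak ) {
	categoryRegex = new RegExp( categoryRegex.source + '[ \\t\\r]*\\n?' );
}

var oldText = 'Intro text.\n[[Category:Foo|sortkey]]\nMore text.';
console.log( oldText.replace( categoryRegex, '' ) );
// → "Intro text.\nMore text."  (the link and its own line break are removed)
```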
[MediaWiki-CodeReview] [MediaWiki r94331]: New comment added
User Dantman posted a comment on MediaWiki.r94331. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94331#c20664 Commit summary: Use jQuery's $.isArray, not instanceof Array. The latter has trouble with cross-frame Array instances, and doesn't use the ES5 native method. Comment: As you know, $ was already used on the line below so I just continued what was already there. Fixed this in r94332, along with a few other files that used the global jQuery instead of a locally scoped $. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94332]: New comment added
User Krinkle posted a comment on MediaWiki.r94332. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94332#c20665 Commit summary: Fix usage of the jQuery global in a few spots. - jQuery changed to $ in some files because there is a closure that creates a locally scoped $, but the jQuery var is globally scoped, meaning using jQuery instead of $ inside that closure could result in interacting with a different instance of jQuery than the uses of $ in that same closure. - In mwExtension wrap the code inside a closure which it is missing. Also take this chance to fix the whitespace style `fn( arg )` instead of `fn(arg)` on the isArray I added. This is partially a followup to r94331. Note: The jquery plugins inside the jquery/ folder look fine for use of jQuery within closures, except for mockjax. Comment: Indentation in jquery.mwExtension.js is a bit nasty in review, but viewvc has the whitespace flag on by default: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/resources/jquery/jquery.mwExtension.js?pathrev=94332&r1=94331&r2=94332 (<sup>yeah, that took a bunch of {{tag|nowiki|o}}'s to not make CodeReview choke on it</sup>). Looks good, marking ok. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94332]: Revision status changed
User Krinkle changed the status of MediaWiki.r94332. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94332#c0 Commit summary: Fix usage of the jQuery global in a few spots. - jQuery changed to $ in some files because there is a closure that creates a locally scoped $, but the jQuery var is globally scoped, meaning using jQuery instead of $ inside that closure could result in interacting with a different instance of jQuery than the uses of $ in that same closure. - In mwExtension wrap the code inside a closure which it is missing. Also take this chance to fix the whitespace style `fn( arg )` instead of `fn(arg)` on the isArray I added. This is partially a followup to r94331. Note: The jquery plugins inside the jquery/ folder look fine for use of jQuery within closures, except for mockjax. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94332]: New comment added
User Krinkle posted a comment on MediaWiki.r94332. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94332#c20666 Commit summary: Fix usage of the jQuery global in a few spots. - jQuery changed to $ in some files because there is a closure that creates a locally scoped $, but the jQuery var is globally scoped, meaning using jQuery instead of $ inside that closure could result in interacting with a different instance of jQuery than the uses of $ in that same closure. - In mwExtension wrap the code inside a closure which it is missing. Also take this chance to fix the whitespace style `fn( arg )` instead of `fn(arg)` on the isArray I added. This is partially a followup to r94331. Note: The jquery plugins inside the jquery/ folder look fine for use of jQuery within closures, except for mockjax. Comment: We may wanna poke mockjax upstream though. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
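A small sketch of the closure pattern the r94332 summary describes (illustrative only, not the actual mwExtension code):

```javascript
// Passing jQuery into an immediately-invoked function gives the module a
// locally scoped `$`. Inside the closure, always use `$`: the global
// `jQuery` variable could later point at a different instance, e.g. after
// jQuery.noConflict( true ) or a second copy of the library being loaded.
( function ( $ ) {

	$.fn.highlightFirst = function () {
		// `$` here is guaranteed to be the instance passed in above.
		return this.first().css( 'background-color', 'yellow' );
	};

}( jQuery ) );

// Usage:
// $( 'li' ).highlightFirst();
```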
[MediaWiki-CodeReview] [MediaWiki r94331]: Revision status changed
User Krinkle changed the status of MediaWiki.r94331. Old Status: fixme New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94331#c0 Commit summary: Use jQuery's $.isArray, not instanceof Array. The latter has trouble with cross-frame Array instances, and doesn't use the ES5 native method. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94331]: Revision status changed
User Krinkle changed the status of MediaWiki.r94331. Old Status: ok New Status: resolved Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94331#c0 Commit summary: Use jQuery's $.isArray, not instanceof Array. The latter has trouble with cross-frame Array instances, and doesn't use the ES5 native method. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94206]: Revision status changed
User Catrope changed the status of MediaWiki.r94206. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94206#c0 Commit summary: Add Math to make-wmf-branch Followup r85706 Relieves part of the scaptrap as the 1.18 branch will have it included. Just needs enabling! ;) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94226]: Revision status changed
User Catrope changed the status of MediaWiki.r94226. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94226#c0 Commit summary: * Removed bogus parenthesis and fixed script type header * Run the l10n MW scripts on home/ as needed * Removed useless $PATH var setting * Improved reporting messages ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94245]: Revision status changed
User Catrope changed the status of MediaWiki.r94245. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94245#c0 Commit summary: Add module definition for jquery.qunit.completenessTest and set position to top for jquery.qunit (right now it's manually loaded with a script tag, but once it's loaded dynamically it should be loaded from top because of the styling and the hooks that it makes available). ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94330]: Revision status changed
User Catrope changed the status of MediaWiki.r94330. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94330#c0 Commit summary: Fix typo from r93813 ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct.
http://www.mediawiki.org/wiki/NOLA_Hackathon MediaWiki developers are going to meet in New Orleans, Louisiana, USA, October 14-16, 2011. Ryan Lane is putting this together and I'm helping a bit. If you're intending to come, please add your name here, just so we can start getting an idea of how many people are coming: http://www.mediawiki.org/wiki/NOLA_Hackathon#Attendees I'll add more details to the wiki page next week. -- Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94333]: Revision status changed
User Krinkle changed the status of MediaWiki.r94333. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94333#c0 Commit summary: Fix copy-paste mistake in r94289 ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94334]: Revision status changed
User Krinkle changed the status of MediaWiki.r94334. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94334#c0 Commit summary: Use a regex when checking for external URLs. It's concise and DRY, less prone to bugs like "Whoops, I got that hardcoded length int wrong and created a condition that'll never be true"..., and it's 4 times faster ;) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
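A sketch of the contrast the r94334 summary is drawing (both functions are hypothetical illustrations, not the committed code):

```javascript
// Fragile version: if the hardcoded length doesn't match the prefix,
// the comparison can never be true.
function isExternalSlice( url ) {
	return url.slice( 0, 7 ) === 'http://' || url.slice( 0, 8 ) === 'https://';
}

// Regex version: one pattern, no magic numbers, and it also covers
// protocol-relative links ("//example.org/...").
function isExternalRegex( url ) {
	return /^(?:https?:)?\/\//.test( url );
}

isExternalRegex( 'https://www.mediawiki.org/' ); // true
isExternalRegex( '/wiki/Main_Page' );            // false
```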
[MediaWiki-CodeReview] [MediaWiki r94335]: Revision status changed
User Krinkle changed the status of MediaWiki.r94335. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94335#c0 Commit summary: Use [] instead of new Array. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Catrope posted a comment on MediaWiki.r94289. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94289#c20667 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: <pre>+ AND $idCol IS NOT NULL AND {$prefix}_sha1 IS NOT NULL;</pre> Since rev_sha1 and ar_sha1 are declared as NOT NULL, an IS NOT NULL condition on them is pointless. Looks good to me otherwise, but I want Chad to look at how the updater calls the population script. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[Wikitech-l] We need to make it easy to fork and leave
[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere] THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them. This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again): * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/ * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/ I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable. Yes, we should be making ourselves forkable. That way people don't *have* to trust us. We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around. How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so? And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work. - d. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Hello everyone, I've actually been parsing the raw data from [http://dammit.lt/wikistats/] daily into a MySQL database for over a year now. I also store statistics at hour-granularity, whereas [stats.grok.se] stores them at day granularity, it seems. I only do this for en.wiki, and its certainly not efficient enough to open up for public use. However, I'd be willing to chat and share code with any interested developer. The strategy and schema are a bit awkward, but it works, and requires on average ~2 hours processing to store 24 hours worth of statistics. Thanks, -AW On 08/12/2011 04:49 AM, Domas Mituzas wrote: Hi! Currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to Henrik's tool. At one per second, you're looking at over a month's worth of continuous fetching. This is obviously not practical. Or you can download raw data. A lot of people were waiting on Wikimedia's Open Web Analytics work to come to fruition, but it seems that has been indefinitely put on hold. (Is that right?) That project was pulsing with naiveness, if it ever had to be applied to wide scope of all projects ;-) Is it worth a Toolserver user's time to try to create a database of per-project, per-page page view statistics? Creating such database is easy, making it efficient is a bit different :-) And, of course, it wouldn't be a bad idea if Domas' first-pass implementation was improved on Wikimedia's side, regardless. My implementation is for obtaining raw data from our squid tier, what is wrong with it? Generally I had ideas of making query-able data source - it isn't impossible given a decent mix of data structures ;-) Thoughts and comments welcome on this. There's a lot of desire to have a usable system. Sure, interesting what people think could be useful with the dataset - we may facilitate it. But short of believing that in December 2010 User Datagram Protocol was more interesting to people than Julian Assange you would need some other data source to make good statistics. Yeah, lies, damn lies and statistics. We need better statistics (adjusted by wikipedian geekiness) than full page sample because you don't believe general purpose wiki articles that people can use in their work can be more popular than some random guy on the internet and trivia about him. Dracula is also more popular than Julian Assange, so is Jenna Jameson ;-) http://stats.grok.se/de/201009/Ngai.cc would be another example. Unfortunately every time you add ability to spam something, people will spam. There's also unintentional crap that ends up in HTTP requests because of broken clients. It is easy to filter that out in postprocessing, if you want, by applying article-exists bloom filter ;-) If the stats.grok.se data actually captures nearly all requests, then I am not sure you realize how low the figures are. Low they are, Wikipedia's content is all about very long tail of data, besides some heavily accessed head. Just graph top-100 or top-1000 and you will see the shape of the curve: https://docs.google.com/spreadsheet/pub?hl=en_USkey=0AtHDNfVx0WNhdGhWVlQzRXZuU2podzR2YzdCMk04MlEhl=en_USgid=1 As someone with most of the skills and resources (with the exception of time, possibly) to create a page view stats database, reading something like this makes me think... Wow. Yes, the data is susceptible to manipulation, both intentional and unintentional. 
I wonder how someone with most of skills and resources wants to solve this problem (besides the aforementioned article-exists filter, which could reduce dataset quite a lot ;) ... you can begin doing real analysis work. Currently, this really isn't possible, and that's a Bad Thing. Raw data allows you to do whatever analysis you want. Shove it into SPSS/R/.. ;-) Statistics much? The main bottleneck has been that, like MZMcBride mentions, an underlying database of page view data is unavailable. Underlying database is available, just not in easily queryable format. There's a distinction there, unless you all imagine database as something you send SQL to and it gives you data. Sorted files are databases too ;-) Anyway, I don't say that the project is impossible or unnecessary, but there're lots of tradeoffs to be made - what kind of real time querying workloads are to be expected, what kind of pre-filtering do people expect, etc. Of course, we could always use OWA. Domas ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Andrew G. West, Doctoral Student Dept. of Computer and Information Science University of Pennsylvania, Philadelphia PA Email: west...@cis.upenn.edu Website: http://www.cis.upenn.edu/~westand ___ Wikitech-l mailing list
[MediaWiki-CodeReview] [MediaWiki r94268]: New comment added, and revision status changed
User Krinkle changed the status of MediaWiki.r94268. Old Status: fixme New Status: new User Krinkle also posted a comment on MediaWiki.r94268. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94268#c20668 Commit summary: ajaxCategories fixes based on review in r93351 CR: * Using typeof check in clean() * Use mw.Title to get page title from fullpagename instead of split(':') * replaceNowikis() and restoreNowikis() - Improve documentation - Moved dash in the UNIQUEKEY to between the id and the incrementing integer, and made it start with an empty string (so that all following concatenations are toString'ed). * makeCaseInsensitive(): Moved the wgCaseSensitiveNamespaces-check out and wrapped it around the caller instead. Also cached the outcome of Is category namespace sensitive ?. * createButton(): text-argument is indeed text, not html. Applying html-escaping. * resolveRedirects(): - Replace access to private property _name of mw.Title with function getMainText(). * handleCategoryAdd() and handleEditLink(): - Restructure title-handling (no local replace() calls and clean(), let mw.Title handle it) - Renaming arguments and documenting them better - Renaming local variables and removing redundant parts - Preserving sortkey as sortkey as long as possible without the pipe - Calling the combination of sortkey and leading pipe 'suffix' instead of, also, sortkey. * createCatLink(): - Remove the sanitizing here, the string passed is already clean as it comes from mw.Title now - Using .text() instead of .append( which is .html-like), category names can contain special characters. * containsCat(): - Using $.each instead of [].filter. Stopping after first match. * buildRegex(): Allow whitespace before namespace colon, and allow whitespace after category name (but before ]] and |..]]) Additional changes not for any function in particular: * Literally return null in $.map callbacks. * Using the existence-system of mw.Title instead of passing around booleans everywhere ** Removed 'exists' argument from the resolveRedirects() and handleCategoryAdd() functions, instead checking .exists() of the mw.Title object. * Passing and using mw.Title objects where possible instead of converting back and forth between strings and objects etc. * Using TitleObj.getUrl() instead of catUrl( titleString ). Removed now unused catUrl() function. * To improve readability, renamed local uses of 'var that = this' to 'var ajaxcat = this'. * Syntax error fixes (.parent - .parent()) * Merging var statements * Renamed generic members of 'stash' from 'stash.summaries' to 'stash.dialogDescriptions' and 'stash.shortSum' to 'stash.editSummaries'. dialogDescription is always HTML (input should be escaped before hand) Comment: Fixed in r94338. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93351]: New comment added
User Krinkle posted a comment on MediaWiki.r93351. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/93351#c20669 Commit summary: AjaxCategories rewrite: Solving syntax problems, performance improvements and applying code conventions: * Replaced sprite image with separate images and letting ResourceLoader embed them with @embed (@embed means 0 http requests, less maintenance, none of the known limitations with sprites, and more readable code (named files rather than pixel offsets) * Many functions were floating in the global namespace (like window.makeCaseInsensitive). A statement ends after a semi-colon(;). All functions declared after catUrl were assigned to the window object. I've instead turned the semi-colons back into comma's, merged some other var statements and moved them to the top of the closure. Changed local function declarations into function expressions for clarity. * fetchSuggestions is called by $.fn.suggestions like .call( $textbox, $textbox.val() ). So the context (this) isn't the raw element but the jQuery object, no need to re-construct with $(this) or $(that) which is slow and shouldn't even work. jQuery methods can be called on it directly. I've also replaced $(this).val() with the value-argument passed to fetchSuggestions which has this exact value already. * Adding more function documentation. And changing @since to 1.19 as this was merged from js2-branch into 1.19-trunk and new features aren't backported to 1.18. * Optimizing options/default construction to just options = $.extend( {}, options ). Caching defaultOptions is cool, but doesn't really work if it's in a context/instance local variable. Moved it up to the module closure var statements, now it's static across all instances. * In makeSuggestionBox(): Fixing invalid html fragments passed to jQuery that fail in IE. Shortcuts (like 'foo' and 'foo/') are only allowed for createElement triggers, not when creating longer fragments with content and/or attributes which are created through innerHTML, in the latter case the HTML must be completely valid and is not auto-corrected by IE. * Using more jQuery chaining where possible. * In buildRegex(): Using $.map with join( '|' ), (rather than $.each with += '|'; and substr). * Storing the init instance of mw.ajaxCategories in mw.page for reference (rather than local/anonymous). * Applied some best practices and write testable code ** Moved some of the functions created on the fly and assigned to 'this' into prototype (reference is cheaper) ** Making sure at least all 'do', 'set' and/or 'prototype' functions have a return value. Even if it's just a simple boolean true or context/this for chain-ability. ** Rewrote confirmEdit( .., .., .., ) as a prototyped method named doConfirmEdit which takes a single props-object with named valuas as argument, instead of list with 8 arguments. * Removed trailing whitespace and other minor fixes to comply with the code conventions. ** Removed space between function name and caller: foo () = foo()) ** Changing someArray.indexOf() + 1 into someArr.indexOf() !== -1. We want a Boolean here, not a Number. ** Renamed all underscore-variables to non-underscore variants. == Bug fixes == * When adding a category that is not already on the page as-is but of which the clean() version is already on the page, the script would fail. Fixed it by moving the checks up in handleCategoryAdd() and making sure that createCatLink() actually returned something. 
* confirmEdit() wasn't working properly and had unused code (such as submitButton), removed hidden prepending to #catlinks, no need to, it can be dialog'ed directly from the jQuery object without being somewhere in the document. * in doConfirmEdit() in submitFunction() and multiEdit: Clearing the input field after adding a category, so that when another category is being added it doesn't start with the previous value which is not allowed to be added again... Comment: The categoryRegex in the <code>matchLineBreak</code> condition remained unchanged since I'm terrible at regexes and don't know a lot about how the Parser handles category links – I left the regex the way it was introduced by mdale. I understand the regex enough to recognize the issue, but don't know how to fix it. Same goes for <code>newText = oldText.replace( categoryRegex, newCategoryString );</code>. I didn't change the event handler a lot because (together with the culture of triggering events to perform actions and using DOM-inspection to get information), I'd like to get rid of that altogether and instead store it in JavaScript (perhaps wgCategories or a local copy of it). That would also make containsCat() a lot nicer. Events would then just call the API rather than the events ''being'' the API. This will likely require another major refactor of this module, which I don't have time for right now but would love to do later (I don't think it's a requirement for the module though,
Re: [Wikitech-l] We need to make it easy to fork and leave
On 12/08/2011 8:55 PM, David Gerard wrote: THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them. I have an idea that might be practical and go some way toward solving your problem. Wikipedia is an impressive undertaking, and as you mentioned on your blog it has become part of the background as a venerable institution; however, it is still dwarfed by the institution that is the World Wide Web (which, by the way, runs on web-standards like HTML5 :). To give a little context concerning the state of the art, a bit over a week ago I decided to start a club. Within a matter of days I had a fully functioning web-site for my club, with two content management systems (a wiki and a blog), and a number of other administrative facilities, all due to the power and availability of open-source software. As time goes by there are only going to be more, not fewer, people like me. People who have the capacity to run their own content management systems out of their own garages (mine's actually in a slicehost.net datacenter, but it *used* to be in my garage, and by rights it could be, except that I don't actually *have* a garage any more, but that's another story). The thing about me is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to be able to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia: http://www.progclub.org/wiki/Network_administration#Links Links such as: http://en.wikipedia.org/wiki/Subversion Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g. http://www.progclub.org/wiki/Wikipedia:Subversion There could then be a facility in place to periodically update the mirrored copies in my own system. Attribution for these copies would be given to a 'system user', such as the 'Interwiki Update Service'. The edit history for the version on my system would only show versions for each time the update service had updated the content. Links for the 'edit' button could be wired up so that when someone tried to edit http://www.progclub.org/wiki/Wikipedia:Subversion on my server, they were redirected to the Wikipedia edit facility, assuming that such a facility was still available. In the case that Wikipedia was no more, it would be possible to turn off mirroring, and in that case the 'edit' facility would allow for edits of the local content. That's probably a far more practical approach to take than, say, something like distributing the entire English database via BitTorrent. By all means do that too, but I'd suggest that if you're looking for an anarchically-scalable distributed hypermedia solution, you won't have to look much past the web. John. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] We need to make it easy to fork and leave
On Fri, Aug 12, 2011 at 6:55 AM, David Gerard dger...@gmail.com wrote: [posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere] THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them. This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again): * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/ * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/ I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable. Yes, we should be making ourselves forkable. That way people don't *have* to trust us. We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around. How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so? Software and customizations are pretty easy -- that's all in SVN, and most of the config files are also made visible on noc.wikimedia.org. If you're running a large site there'll be more 'tips and tricks' in the actual setup that you may need to learn; most documentation on the setups should be on wikitech.wikimedia.org, and do feel free to ask for details on anything that might seem missing -- it should be reasonably complete. But to just keep a data set, it's mostly a matter of disk space, bandwidth, and getting timely updates. For data there are three parts: * page data -- everything that's not deleted/oversighted is in the public dumps at download.wikimedia.org, but may be a bit slow to build/process due to the dump system's history; it doesn't scale as well as we really want with current data size. More to the point, getting data isn't enough for a working fork - a wiki without a community is an empty thing, so being able to move data around between different sites (merging changes, distributing new articles) would be a big plus. This is a bit awkward with today's MediaWiki (though I think I've seen some extensions aiming to help); DVCSs like git show good ways to do this sort of thing -- forking a project on/from a git hoster like github or gitorious is usually the first step to contributing upstream! This is healthy and should be encouraged for wikis, too. * media files -- these are freely copiable but I'm not sure of the state of easily obtaining them in bulk. As the data set moved into the terabytes it became impractical to just build .tar dumps. There are batch downloader tools available, and the metadata's all in the dumps and the API. * user data -- watchlists, emails, passwords, prefs are not exported in bulk, but you can always obtain your own info so an account migration tool would not be hard to devise. And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work. This is part of WMF's purpose, actually, so I'll disagree on that point.
That's why for instance we insist on using so much open source -- we *want* everything we do to be able to be reused or rebuilt independently of us. -- brion - d. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Anyway, I don't say that the project is impossible or unnecessary, but there're lots of tradeoffs to be made - what kind of real time querying workloads are to be expected, what kind of pre-filtering do people expect, etc. I could be biased here, but I think the canonical use case for someone seeking page view information would be viewing page view counts for a set of articles -- most times a single article, but also multiple articles -- over an arbitrary time range. Narrowing that down, I'm not sure whether the level of demand for real-time data (say, for the previous hour) would be higher than the demand for fast query results for more historical data. Would these two workloads imply the kind of trade-off you were referring to? If not, could you give some examples of what kind of expected workloads/use cases would entail such trade-offs? If ordering pages by page view count for a given time period would imply such a tradeoff, then I think it'd make sense to deprioritize page ordering. I'd be really interested to know your thoughts on an efficient schema for organizing the raw page view data in the archives at http://dammit.lt/wikistats/. Thanks, Eric ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94328]: Revision status changed
User Hashar changed the status of MediaWiki.r94328. Old Status: fixme New Status: resolved Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94328#c0 Commit summary: RN populateImageSha1.php renamed (r94291) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r84278]: New comment added
User Bryan posted a comment on MediaWiki.r84278. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/84278#c20670 Commit summary: Added hook BitmapHandlerTransform to allow extension to transform a file without overriding the entire handler. Comment: mto=MediaTransformOutput. Perhaps better is Hook to BitmapHandlerTransform created a thumbnail ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] Private Mode questions
- Original Message - From: Platonides platoni...@gmail.com People usually set it to 5000 and use the 'search in this page' feature of their browser. Which is far from convenient. For a 2-year-old bug requesting that needed feature, see https://bugzilla.wikimedia.org/show_bug.cgi?id=20858 Noted. Though I tend, myself, not to get the 'suggested implementation of fix' quite so tangled up in the bug report. How are those messages *implemented*, internally? Are they in a page namespace not exposed to the standard system text search? Or are they just hardwired in somehow? Cheers, -- jra -- Jay R. Ashworth Baylink j...@baylink.com Designer The Things I Think RFC 2100 Ashworth Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94309]: Revision status changed
User ^demon changed the status of MediaWiki.r94309. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94309#c0 Commit summary: w/s ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] We need to make it easy to fork and leave
On 12 August 2011 12:44, Brion Vibber br...@pobox.com wrote: On Fri, Aug 12, 2011 at 6:55 AM, David Gerard dger...@gmail.com wrote: And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work. This is part of WMF's purpose, actually, so I'll disagree on that point. That's why for instance we insist on using so much open source -- we *want* everything we do to be able to be reused or rebuilt independently of us. I'm speaking of making it happen, not whether it's an acknowledged need, which I know it is :-) It's an obvious Right Thing. But we have X dollars to do everything with, so more to this means less to somewhere else. And this is a variety of technical debt, and tends to get put in an eternal to-do list with the rest of the technical debt. So it would need someone actively pushing it. I'm not even absolutely sure myself it's a priority item that someone should take up as a cause. I do think the communities need reminding of it from time to time, however. - d. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] We need to make it easy to fork and leave
On 12 August 2011 12:44, Brion Vibber br...@pobox.com wrote: * user data -- watchlists, emails, passwords, prefs are not exported in bulk, but you can always obtain your own info so an account migration tool would not be hard to devise. This one's tricky, because that's not free content, for good reason. It would need to be present for correct attribution at the least. I don't see anything intrinsically hard about that - have I missed anything about it that makes it hard? - d. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] We need to make it easy to fork and leave
On 12/08/2011 10:31 PM, David Gerard wrote: This one's tricky, because that's not free content, for good reason. It would need to be present for correct attribution at the least. I don't see anything intrinsically hard about that - have I missed anything about it that makes it hard? Well you'd need to have namespaces for usernames, and that's about it. Or you could pursue something like OpenID as you mentioned. Of course if you used the user database as is and pursued my proposed model for content mirroring, you could have an 'Attribution' tab for mirrored content up near the 'Page' and 'Discussion' tabs, and in that page show a list of everyone who had contributed to the content. You could update this list from time to time, at the same time as you did your mirroring. You could go as far as mentioning the number of edits particular users had made. It wouldn't be the same type of blow by blow attribution that you get where you can see a log of specifically what contributions particular users had made, but it would be a suitable attribution nonetheless, similar to the attribution at: http://en.wikipedia.org/wiki/Special:Version ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94338]: Revision status changed
User Catrope changed the status of MediaWiki.r94338. Old Status: new New Status: ok Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94338#c0 Commit summary: more ajaxCategories fixes based on review in r93351 CR * Html-escaping unescaped message in summaryHolder * Check for errors in the API response * Pass true for existence of redirect origin and value of 'exists' for target (instead of backwards) * Comment fixes ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] We need to make it easy to fork and leave
On 12/08/2011 10:44 PM, John Elliot wrote: It wouldn't be the same type of blow by blow attribution that you get where you can see a log of specifically what contributions particular users had made Although I guess it would be possible to go all out and support that too. You could leave the local user database as-is, and introduce a remote user database that included a namespace, such as en.wikipedia.org, for usernames. For mirrored content you'd reference the remote user database, and for local content reference the local user database. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94277]: Revision status changed
User Bryan changed the status of MediaWiki.r94277. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94277#c0 Commit summary: Fix Bug #30322 “SVG metadata is read incorrectly” by applying supplied patch ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added, and revision status changed
User Krinkle changed the status of MediaWiki.r94289. Old Status: new New Status: fixme User Krinkle also posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20671 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: Running MySQL locally. Getting this on wiki pages: pre A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function Revision::fetchFromConds. Database returned error 1054: Unknown column 'rev_sha1' in 'field list' (127.0.0.1). /pre And from update.php: pre Populating rev_len column ...doing rev_id from 1 to 200 A database query syntax error has occurred. The last attempted database query was: SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1 FROM `revision` WHERE (rev_id >= 1) AND (rev_id <= 200) AND (rev_len IS NULL) from within function PopulateRevisionLength::execute. Database returned error 1054: Unknown column 'rev_sha1' in 'field list' (127.0.0.1) /pre ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Krinkle posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20672 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: So to populate this hash column, the maintenance script would have to fetch the raw wikitext of all public and deleted revisions of all wikis. That's going to open up a lot of possibilities (not sure how useful it is, but more re-using of text_oldids comes to mind when an edit results in the page text being equal to an earlier revision), but I'm curious how long that would take on WMF. Weeks? ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
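For anyone curious what the stored value looks like: I believe the new columns hold the SHA-1 of the revision text re-encoded in base 36 and left-padded to 31 characters (that reading is mine, not confirmed in this thread). A quick sketch of computing the same value outside PHP:

import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def base36_sha1(text, pad=31):
    # SHA-1 of the UTF-8 text, converted from hex to base 36 and padded,
    # assuming that is the encoding used for rev_sha1/ar_sha1.
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    digits = ""
    while n:
        n, r = divmod(n, 36)
        digits = ALPHABET[r] + digits
    return (digits or "0").rjust(pad, "0")

print(base36_sha1("Example revision text"))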
[MediaWiki-CodeReview] [MediaWiki r94345]: Revision status changed
User ^demon changed the status of MediaWiki.r94345. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94345#c0 Commit summary: Follow-up r94289: SQLite support, unbreaks tests ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94344]: Revision status changed
User ^demon changed the status of MediaWiki.r94344. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94344#c0 Commit summary: Stylize ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94212]: New comment added, and revision status changed
User Bryan changed the status of MediaWiki.r94212. Old Status: new New Status: fixme User Bryan also posted a comment on MediaWiki.r94212. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94212#c20673 Commit summary: (bug 30192) Old thumbnails not properly purged. Unlike the bug suggests, we don't need to also purge from LocalFile::purgeCache(), since that code path ends up calling purgeHistory() anyway. A lot of this could probably be protected...not much calls these outside of FileRepo code other than File::purgeCache() Comment: purgeHistory() is not actually called by File::purgeEverything(). Also the function description of LocalFile::purgeMetadataCache() is not valid anymore. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94339]: Revision status changed
User ^demon changed the status of MediaWiki.r94339. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94339#c0 Commit summary: RN populateSha1.php renamed (r94291) (fix r94328) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User ^demon posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20675 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: Updater is fine with r94345. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Catrope posted a comment on MediaWiki.r94289. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94289#c20676 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: We figured out why on IRC: rev_len isn't populated yet on his wiki, so the rev_len population script runs before rev_sha1 is added, and barfs. Chad says: Really, populateRevLen should be moved to $postUpdateMaintenance. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Krinkle posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20677 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: For what it's worth, that column does exist on my wiki and all rows in the revision table have a value in that column. According to Roan, the reason it's trying to update is because it's not in updatelog. My wiki runs trunk/phase3, revision: 94346 (first installed about 2 weeks ago). ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94303]: New comment added
User Jack Phoenix posted a comment on MediaWiki.r94303. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94303#c20678 Commit summary: fix for Bug 29520 - Ability to turn off images on mobile and wap-mobile page views Comment: pre - $regularSite = self::$messages['mobile-frontend-regular-site']; - $permStopRedirect = self::$messages['mobile-frontend-perm-stop-redirect']; - $copyright = self::$messages['mobile-frontend-copyright']; - $homeButton = self::$messages['mobile-frontend-home-button']; - $randomButton = self::$messages['mobile-frontend-random-button']; - $areYouSure = self::$messages['mobile-frontend-are-you-sure']; - $explainDisable = self::$messages['mobile-frontend-explain-disable']; - $disableButton = self::$messages['mobile-frontend-disable-button']; - $backButton = self::$messages['mobile-frontend-back-button']; + $regularSite= self::$messages['mobile-frontend-regular-site']; + $permStopRedirect = self::$messages['mobile-frontend-perm-stop-redirect']; + $copyright = self::$messages['mobile-frontend-copyright']; + $homeButton = self::$messages['mobile-frontend-home-button']; + $randomButton = self::$messages['mobile-frontend-random-button']; + $areYouSure = self::$messages['mobile-frontend-are-you-sure']; + $explainDisable = self::$messages['mobile-frontend-explain-disable']; + $disableButton = self::$messages['mobile-frontend-disable-button']; + $backButton = self::$messages['mobile-frontend-back-button']; + $disableImages = self::$messages['mobile-frontend-disable-images']; /pre [[Manual:Coding conventions#Vertical alignment]] recommends doing vertical alignment with spaces instead of tabs, but I think that more important is the recommendation that is stated on that manual page, too: '''Avoid vertical alignment'''. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct.
Sumana Harihareswara wrote: http://www.mediawiki.org/wiki/NOLA_Hackathon MediaWiki developers are going to meet in New Orleans, Louisiana, USA, October 14-16, 2011. Ryan Lane is putting this together and I'm helping a bit. If you're intending to come, please add your name here, just so we can start getting an idea of how many people are coming: http://www.mediawiki.org/wiki/NOLA_Hackathon#Attendees I'll add more details to the wiki page next week. It might be nice to couple this with a New Orleans wikimeetup. I have no idea if New Orleans has meetups already, but the D.C. hackathon coupled with a meetup and it seemed to work out pretty well. Social interaction and direct user contact is never a bad thing for developers. ;-) MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Andrew G. West wrote: I've actually been parsing the raw data from [http://dammit.lt/wikistats/] daily into a MySQL database for over a year now. I also store statistics at hour-granularity, whereas [stats.grok.se] stores them at day granularity, it seems. I only do this for en.wiki, and it's certainly not efficient enough to open up for public use. However, I'd be willing to chat and share code with any interested developer. The strategy and schema are a bit awkward, but it works, and requires on average ~2 hours processing to store 24 hours worth of statistics. I'd certainly be interested in seeing the code and database schema you've written, if only as a point of reference and to learn from any bugs/issues/etc. that you've encountered along the way. Is it possible for you to post the code you're using somewhere? MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct.
On Fri, Aug 12, 2011 at 4:25 PM, MZMcBride z...@mzmcbride.com wrote: It might be nice to couple this with a New Orleans wikimeetup. I have no idea if New Orleans has meetups already, but the D.C. hackathon coupled with a meetup and it seemed to work out pretty well. Social interaction and direct user contact is never a bad thing for developers. ;-) Context: at last year's D.C. hackathon, we joined the D.C. meetup on Saturday night. Basically, the meetup was a dinner at a restaurant with ~10 local Wikimedians, and we kind of took over the whole thing with ~25 developers :D I second this notion, meeting up with local Wikimedians in some way, maybe even inviting them to hang around the venue even if they're not coders, sounds like a great idea. Roan Kattouw (Catrope) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Domas Mituzas wrote: Hi! Hi! Currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to Henrik's tool. At one per second, you're looking at over a month's worth of continuous fetching. This is obviously not practical. Or you can download raw data. Downloading gigs and gigs of raw data and then processing it is generally more impractical for end-users. Is it worth a Toolserver user's time to try to create a database of per-project, per-page page view statistics? Creating such a database is easy, making it efficient is a bit different :-) Any tips? :-) My thoughts were that the schema used by the GlobalUsage extension might be reusable here (storing wiki, page namespace ID, page namespace name, and page title). And, of course, it wouldn't be a bad idea if Domas' first-pass implementation was improved on Wikimedia's side, regardless. My implementation is for obtaining raw data from our squid tier, what is wrong with it? Generally I had ideas of making query-able data source - it isn't impossible given a decent mix of data structures ;-) Well, more documentation is always a good thing. I'd start there. As I recall, the system of determining which domain a request went to is a bit esoteric and it might be worth the cost to store the whole domain name in order to cover edge cases (labs wikis, wikimediafoundation.org, *.wikimedia.org, etc.). There's some sort of distinction between projectcounts and pagecounts (again with documentation) that could probably stand to be eliminated or simplified. But the biggest improvement would be post-processing (cleaning up) the source files. Right now if there are anomalies in the data, every re-user is expected to find and fix these on their own. It's _incredibly_ inefficient for everyone to adjust the data (for encoding strangeness, for bad clients, for data manipulation, for page existence possibly, etc.) rather than having the source files come out cleaner. I think your first-pass was great. But I also think it could be improved. :-) As someone with most of the skills and resources (with the exception of time, possibly) to create a page view stats database, reading something like this makes me think... Wow. I meant that it wouldn't be very difficult to write a script to take the raw data and put it into a public database on the Toolserver (which probably has enough hardware resources for this project currently). It's maintainability and sustainability that are the bigger concerns. Once you create a public database for something like this, people will want it to stick around indefinitely. That's quite a load to take on. I'm also likely being incredibly naïve, though I did note somewhere that it wouldn't be a particularly small undertaking to do this project well. Yes, the data is susceptible to manipulation, both intentional and unintentional. I wonder how someone with most of the skills and resources wants to solve this problem (besides the aforementioned article-exists filter, which could reduce the dataset quite a lot ;) I'd actually say that having data for non-existent pages is a feature, not a bug. There's potential there to catch future redirects and new pages, I imagine. ... you can begin doing real analysis work. Currently, this really isn't possible, and that's a Bad Thing. Raw data allows you to do whatever analysis you want. Shove it into SPSS/R/.. ;-) Statistics much? A user wants to analyze a category with 100 members for the page view data of each category member.
You think it's a Good Thing that the user has to first spend countless hours processing gigabytes of raw data in order to do that analysis? It's a Very Bad Thing. And the people who are capable of doing analysis aren't always the ones capable of writing the scripts and the schemas necessary to get the data into a usable form. The main bottleneck has been that, like MZMcBride mentions, an underlying database of page view data is unavailable. Underlying database is available, just not in easily queryable format. There's a distinction there, unless you all imagine database as something you send SQL to and it gives you data. Sorted files are databases too ;-) The reality is that a large pile of data that's not easily queryable is directly equivalent to no data at all, for most users. Echoing what I said earlier, it doesn't make much sense for people to be continually forced to reinvent the wheel (post-processing raw data and putting it into a queryable format). MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
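To make the 100-member-category example concrete, the member list itself is the easy half; a rough sketch follows, with the category name and the downstream count lookup both invented for illustration:

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def category_members(category, limit=500):
    # list=categorymembers is a stock API query module; categories larger
    # than one batch would additionally need cmcontinue handling.
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    })
    req = urllib.request.Request(API + "?" + params,
                                 headers={"User-Agent": "pageview-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [m["title"] for m in data["query"]["categorymembers"]]

members = category_members("Category:Free wiki software")
# The hard half is looking each title up in a per-article view store;
# without one, this is where the gigabytes of raw files come in.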
Re: [Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct.
Roan Kattouw wrote: On Fri, Aug 12, 2011 at 4:25 PM, MZMcBride z...@mzmcbride.com wrote: It might be nice to couple this with a New Orleans wikimeetup. I have no idea if New Orleans has meetups already, but the D.C. hackathon coupled with a meetup and it seemed to work out pretty well. Social interaction and direct user contact is never a bad thing for developers. ;-) Context: at last year's D.C. hackathon, we joined the D.C. meetup on Saturday night. Basically, the meetup was a dinner at a restaurant with ~10 local Wikimedians, and we kind of took over the whole thing with ~25 developers :D I second this notion, meeting up with local Wikimedians in some way, maybe even inviting them to hang around the venue even if they're not coders, sounds like a great idea. http://en.wikipedia.org/wiki/Wikipedia:Meetup/DC_12 was the wikimeetup coordination page, for reference. MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94351]: New comment added
User Krinkle posted a comment on MediaWiki.r94351. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94351#c20679 Commit summary: merge r94350 Comment: Please don't backport before code review. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94351]: New comment added
User SPQRobin posted a comment on MediaWiki.r94351. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94351#c20680 Commit summary: merge r94350 Comment: But I was thinking it would be better to backport now so we don't forget it later on. Or should I instead add a tag to the revision that needs backporting after review? ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] State of page view stats
Note that to avoid too much traffic here, I've responded to MZMcBride privately with my code. I'd be happy to share my code with others, and include others in its discussion -- just contact me/us privately. Thanks, -AW On 08/12/2011 10:30 AM, MZMcBride wrote: Andrew G. West wrote: I've actually been parsing the raw data from [http://dammit.lt/wikistats/] daily into a MySQL database for over a year now. I also store statistics at hour-granularity, whereas [stats.grok.se] stores them at day granularity, it seems. I only do this for en.wiki, and it's certainly not efficient enough to open up for public use. However, I'd be willing to chat and share code with any interested developer. The strategy and schema are a bit awkward, but it works, and requires on average ~2 hours processing to store 24 hours worth of statistics. I'd certainly be interested in seeing the code and database schema you've written, if only as a point of reference and to learn from any bugs/issues/etc. that you've encountered along the way. Is it possible for you to post the code you're using somewhere? MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Andrew G. West, Doctoral Student Dept. of Computer and Information Science University of Pennsylvania, Philadelphia PA Email: west...@cis.upenn.edu Website: http://www.cis.upenn.edu/~westand ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Downloading gigs and gigs of raw data and then processing it is generally more impractical for end-users. You were talking about 3.7M articles. :) It is way more practical than working with pointwise APIs though :-) Any tips? :-) My thoughts were that the schema used by the GlobalUsage extension might be reusable here (storing wiki, page namespace ID, page namespace name, and page title). I don't know what GlobalUsage does, but probably it is all wrong ;-) As I recall, the system of determining which domain a request went to is a bit esoteric and it might be worth the cost to store the whole domain name in order to cover edge cases (labs wikis, wikimediafoundation.org, *.wikimedia.org, etc.). *shrug*, maybe, if I'd run a second pass I'd aim for a cache-oblivious system with compressed data both on-disk and in-cache (currently it is b-tree with standard b-tree costs). Then we could actually store more data ;-) Do note, there're _lots_ of data items, and increasing per-item cost may quadruple resource usage ;-) Otoh, expanding project names is straightforward, if you know how. There's some sort of distinction between projectcounts and pagecounts (again with documentation) that could probably stand to be eliminated or simplified. projectcounts are aggregated by project, pagecounts are aggregated by page. If you looked at data it should be obvious ;-) And yes, probably best documentation was in some email somewhere. I should've started a decent project with descriptions and support and whatever. Maybe once we move data distribution back into WMF proper, there's no need for it to live nowadays somewhere in Germany. But the biggest improvement would be post-processing (cleaning up) the source files. Right now if there are anomalies in the data, every re-user is expected to find and fix these on their own. It's _incredibly_ inefficient for everyone to adjust the data (for encoding strangeness, for bad clients, for data manipulation, for page existence possibly, etc.) rather than having the source files come out cleaner. Raw data is fascinating in that regard though - one can see what are bad clients, what are anomalies, how they encode titles, what are erroneous titles, etc. There're zillions of ways to do post-processing, and none of these will match all needs of every user. I think your first-pass was great. But I also think it could be improved. :-) Sure, it can be improved in many ways, including more data (some people ask (page,geography) aggregations, though with our long tail that is huge dataset growth ;-) I meant that it wouldn't be very difficult to write a script to take the raw data and put it into a public database on the Toolserver (which probably has enough hardware resources for this project currently). I doubt Toolserver has enough resources to have this data thrown at it and queried more, unless you simplify needs a lot. There's 5G raw uncompressed data per day in text form, and long tail makes caching quite painful, unless you go for cache-oblivious methods. It's maintainability and sustainability that are the bigger concerns. Once you create a public database for something like this, people will want it to stick around indefinitely. That's quite a load to take on. I'd love to see that all the data is preserved infinitely. It is one of most interesting datasets around, and its value for the future is quite incredible. I'm also likely being incredibly naïve, though I did note somewhere that it wouldn't be a particularly small undertaking to do this project well.
Well, initial work took a few hours ;-) I guess by spending a few more hours we could improve that, if we really knew what we want. I'd actually say that having data for non-existent pages is a feature, not a bug. There's potential there to catch future redirects and new pages, I imagine. That is one of the reasons we don't eliminate that data now from the raw dataset. I don't see it as a bug, I just see that for long-term aggregations that data could be omitted. A user wants to analyze a category with 100 members for the page view data of each category member. You think it's a Good Thing that the user has to first spend countless hours processing gigabytes of raw data in order to do that analysis? It's a Very Bad Thing. And the people who are capable of doing analysis aren't always the ones capable of writing the scripts and the schemas necessary to get the data into a usable form. No, I think we should have an API to that data to fetch small sets of data without much pain. The reality is that a large pile of data that's not easily queryable is directly equivalent to no data at all, for most users. Echoing what I said earlier, it doesn't make much sense for people to be continually forced to reinvent the wheel (post-processing raw data and putting it into a queryable format). I agree. By opening up the dataset I expected others to build upon that and create services.
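For readers who have not opened one of the hourly files: each pagecounts line appears to carry four space-separated fields (project code, percent-encoded title, request count, bytes transferred), and projectcounts is the per-project aggregate described above. A small parsing sketch, with the field interpretation and the file name being my assumptions:

import gzip
from urllib.parse import unquote

def parse_pagecounts(path):
    # Yields (project, title, views, bytes) per line; malformed lines --
    # which do occur in the raw data, as discussed above -- are skipped.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4:
                continue
            project, title, views, size = parts
            try:
                yield project, unquote(title), int(views), int(size)
            except ValueError:
                continue

# Example: total views for one title in one hour.
total = sum(v for p, t, v, _ in parse_pagecounts("pagecounts-20110812-120000.gz")
            if p == "en" and t == "Main_Page")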
[MediaWiki-CodeReview] [MediaWiki r94351]: New comment added
User Catrope posted a comment on MediaWiki.r94351. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94351#c20681 Commit summary: merge r94350 Comment: Yes, tag it with '1.18' and someone will merge it after it's been reviewed. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94333]: New comment added
User Aaron Schulz posted a comment on MediaWiki.r94333. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94333#c20682 Commit summary: Fix copy-paste mistake in r94289 Comment: Odd that I changed 'rev_' to 'ar_' but didn't notice the other stuff :) ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r93063]: Revision status changed
User Krinkle changed the status of MediaWiki.r93063. Old Status: new New Status: fixme Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/93063#c0 Commit summary: mw.user.js: Make sessionId public ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94289]: New comment added
User Aaron Schulz posted a comment on MediaWiki.r94289. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289#c20683 Commit summary: * Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312) * Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though) Comment: NULL check fixed in r94362. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94303]: New comment added
User Preilly posted a comment on MediaWiki.r94303. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94303#c20684 Commit summary: fix for Bug 29520 - Ability to turn off images on mobile and wap-mobile page views Comment: The vertical alignment issue has been resolved in r94365. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94365]: Revision status changed
User Jack Phoenix changed the status of MediaWiki.r94365. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94365#c0 Commit summary: fix for vertical alignment issue in r94303 ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94361]: New comment added, and revision status changed
User Raymond changed the status of MediaWiki.r94361. Old Status: new New Status: fixme User Raymond also posted a comment on MediaWiki.r94361. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94361#c20685 Commit summary: some i18n msgs. and adding ps- in names. Comment: This is now double-prefixed. Either use ps or pageschema. I suggest ps-desc to be consistent. Please change the $egExtensionCredits[] entry accordingly. -'pageschemas-desc' => 'Supports templates defining their data structure via XML markup', +'ps-pageschemas-desc' => 'Supports templates defining their data structure via XML markup', This looks overdone: - 'pageschemas-desc' => '{{desc}}', + 'ps-ps-pageschemas-desc' => '{{desc}}', ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94367]: Revision status changed
User ^demon changed the status of MediaWiki.r94367. Old Status: new New Status: deferred Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94367#c0 Commit summary: Removed explicit inheritance of es.EventEmitter by es.ListBlockList because it already inherits es.Container which inherits es.EventEmitter ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94366]: Revision status changed
User ^demon changed the status of MediaWiki.r94366. Old Status: new New Status: deferred Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94366#c0 Commit summary: Fixed styles and levels for list items ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave
Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there are no successful forks. Forking Wikipedia is _hard_. People need a button to create a branch of an article or sets of articles, and be allowed to re-write and work in the way they want. Of course, the resulting articles can't be saved/shown close to the Wikipedia articles, but on a new platform. It would be an interesting experiment. 2011/8/12 David Gerard dger...@gmail.com [posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere] THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them. This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again): * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/ * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/ I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable. Yes, we should be making ourselves forkable. That way people don't *have* to trust us. We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around. How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so? And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work. - d. ___ foundation-l mailing list foundatio...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of page view stats
Hey, Domas! Firstly, sorry to confuse you with Dario earlier. I am so very bad with names. :) Secondly, thank you for putting together the data we have today. I'm not sure if anyone's mentioned it lately, but it's clearly a really useful thing. I think that's why we're having this conversation now: what's been learned about potential use cases, and how can we make this excellent resource even more valuable? Any tips? :-) My thoughts were that the schema used by the GlobalUsage extension might be reusable here (storing wiki, page namespace ID, page namespace name, and page title). I don't know what GlobalUsage does, but probably it is all wrong ;-) Here's an excerpt from the readme: When using a shared image repository, it is impossible to see within MediaWiki whether a file is used on one of the slave wikis. On Wikimedia this is handled by the CheckUsage tool on the toolserver, but it is merely a hack of function that should be built in. GlobalUsage creates a new table globalimagelinks, which is basically the same as imagelinks, but includes the usage of all images on all associated wikis. The database table itself is about what you'd imagine. It's approximately the metadata we'd need to uniquely identify an article, but it seems to be solving a rather different problem. Uniquely identifying an article is certainly necessary, but I don't think it's the hard part. I'm not sure that MySQL is the place to store this data--it's big and has few dimensions. Since we'd have to make external queries available through an API anyway, why not back it with the right storage engine? [...] projectcounts are aggregated by project, pagecounts are aggregated by page. If you looked at data it should be obvious ;-) And yes, probably best documentation was in some email somewhere. I should've started a decent project with descriptions and support and whatever. Maybe once we move data distribution back into WMF proper, there's no need for it to live nowadays somewhere in Germany. The documentation needed here seems pretty straightforward. Like, a file at http://dammit.lt/wikistats/README that just explains the format of the data, what's included, and what's not. We've covered most of it in this thread already. All that's left is a basic explanation of what each field means in pagecounts/projectcounts. If you tell me these things, I'll even write it. :) But the biggest improvement would be post-processing (cleaning up) the source files. Right now if there are anomalies in the data, every re-user is expected to find and fix these on their own. It's _incredibly_ inefficient for everyone to adjust the data (for encoding strangeness, for bad clients, for data manipulation, for page existence possibly, etc.) rather than having the source files come out cleaner. Raw data is fascinating in that regard though - one can see what are bad clients, what are anomalies, how they encode titles, what are erroneous titles, etc. There're zillions of ways to do post-processing, and none of these will match all needs of every user. Oh, totally! However, I think some uses are more common than others. I bet this covers them: 1. View counts for a subset of existing articles over a range of dates. 2. Sorted/limited aggregate stats (top 100, bottom 50, etc) for a subset of articles and date range. 3. Most popular non-existing (missing) articles for a project. I feel like making those things easier would be awesome, and raw data would still be available for anyone who wants to build something else.
I think Domas's dataset is great, and the above should be based on it. Sure, it can be improved in many ways, including more data (some people ask (page,geography) aggregations, though with our long tail that is huge dataset growth ;-) Absolutely. I think it makes sense to start by making the existing data more usable, and then potentially add more to it in the future. I meant that it wouldn't be very difficult to write a script to take the raw data and put it into a public database on the Toolserver (which probably has enough hardware resources for this project currently). I doubt Toolserver has enough resources to have this data thrown at it and queried more, unless you simplify needs a lot. There's 5G raw uncompressed data per day in text form, and long tail makes caching quite painful, unless you go for cache oblivious methods. Yeah. The folks at trendingtopics.org are processing it all on an EC2 Hadoop cluster, and throwing the results in a SQL database. They have a very specific focus, though, so their methods might not be appropriate here. They're an excellent example of someone using the existing dataset in an interesting way, but the fact that they're using EC2 is telling: many people do not have the expertise to handle that sort of thing. I think building an efficiently queryable set of all historic data is unrealistic without a separate cluster.
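A naive sketch of use case 2 from the list above (sorted/limited aggregates), just to show the shape of the work every re-user currently repeats; the file names and project code are placeholders:

import gzip
from collections import Counter

def top_articles(paths, project="en", n=100):
    # Sum a batch of hourly pagecounts files and keep the n busiest titles
    # for one project; a real service would precompute this, not rescan.
    totals = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split(" ")
                if len(parts) == 4 and parts[0] == project:
                    try:
                        totals[parts[1]] += int(parts[2])
                    except ValueError:
                        pass  # ignore malformed counts
    return totals.most_common(n)

Rescanning a month of hourly files this way is exactly the kind of job that pushed trendingtopics.org onto a Hadoop cluster, as noted above.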
Re: [Wikitech-l] State of page view stats
I think building an efficiently queryable set of all historic data is unrealistic without a separate cluster. We're talking 100GB/year, before indexing, which is about 400GB if we go back to 2008. [etc] So, these numbers were based on my incorrect assumption that the data I was looking at was daily, but it's actually hourly. So, I guess, multiply everything by 24, and then disregard some of what I said there? -Ian ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94362]: New comment added
User Aaron Schulz posted a comment on MediaWiki.r94362. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94362#c20686 Commit summary: Fix for r94289: we want to skip rows with non-empty sha1, not non-NULL (which is impossible) Comment: This is backwards, fixing in the next commit. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94370]: New comment added, and revision status changed
User Krinkle changed the status of MediaWiki.r94370. Old Status: new New Status: fixme User Krinkle also posted a comment on MediaWiki.r94370. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94370#c20687 Commit summary: * Added LoggedUpdateMaintenance subclass * Moved PopulateRevisionLength/PopulateRevisionSha1 scripts to $postDatabaseUpdateMaintenance * Fixed bogus {$prefix}_sha1 != '' comparison (r94362) * Removed unneeded NOT NULL check (speeds up script a bit) from populateRevisionSha1 script * Various code cleanups Comment: Thanks, finally I can update my wiki again :-) pre Populating ar_sha1 column ...revision table seems to be empty. rev_sha1 and ar_sha1 population complete [28 revision rows, 1 archive rows]. /pre The comment seems wrong though, it hardcodes revision. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] We need to make it easy to fork and leave
John Elliot (2011-08-12 13:36): [...] The thing about me, is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to be able to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia: http://www.progclub.org/wiki/Network_administration#Links Links such as: http://en.wikipedia.org/wiki/Subversion Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g. http://www.progclub.org/wiki/Wikepedia:Subversion ... That's a very interesting idea... And it should be really hard to do. Let's say you linked the Subversion article and you've set it up so that the address http://en.wikipedia.org/wiki/$1 is to be hosted as http://www.progclub.org/wiki/en-wiki:... Now each time your user clicks on a link the target gets registered in your installation as something to be downloaded, and upon a given number of clicks and/or a given number of resources and/or at a given time it gets downloaded to your site. The tricky part would be that you not only need the article itself, but also its templates, and that can be quite a lot with the first articles you get. Furthermore this extension would probably need to allow users to opt out of downloading images and maybe, instead of getting wikicode, just host rendered HTML so that you don't really need to host templates. And speaking of images - the problem with any of the solutions is - who would really want to spend money to host all this data? There were times when Wikipedia had many hold-ups, but now I feel there are more chances that your own server would choke on the data rather than Wikipedia servers. Maybe ads added to self-hosted articles would be worth it, but I kinda doubt anyone would want to host images unless they had to. BTW. I think a dynamic fork was already made by France Telecom. They fork Polish Wikipedia and update articles in a matter of minutes (or at least they did last time I checked - they even hosted talk pages so it was easy to test). You can see the fork here: http://wikipedia.wp.pl/ Note that they don't host images though they host image pages. Regards, Nux. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
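A toy illustration of the link-rewriting step described above; the local prefix mirrors the $1 pattern from the message, while the function name and the queueing behaviour are invented for the example:

import re

REMOTE = re.compile(r"^http://en\.wikipedia\.org/wiki/(.+)$")
LOCAL_PREFIX = "http://www.progclub.org/wiki/en-wiki:"

def localize_link(url):
    # Rewrite a remote Wikipedia link to the local mirror namespace and
    # report which title still needs to be fetched (None if not a match).
    m = REMOTE.match(url)
    if not m:
        return url, None
    title = m.group(1)
    return LOCAL_PREFIX + title, title

local_url, pending_title = localize_link("http://en.wikipedia.org/wiki/Subversion")
# pending_title would go into a download queue, to be mirrored after enough
# clicks or at the next scheduled run, as sketched in the message.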
Re: [Wikitech-l] State of page view stats
Domas Mituzas wrote: Any tips? :-) My thoughts were that the schema used by the GlobalUsage extension might be reusable here (storing wiki, page namespace ID, page namespace name, and page title). I don't know what GlobalUsage does, but probably it is all wrong ;-) GlobalUsage tracks file uses across a wiki family. Its schema is available here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/GlobalUsage/GlobalUsage.sql?view=log. But the biggest improvement would be post-processing (cleaning up) the source files. Right now if there are anomalies in the data, every re-user is expected to find and fix these on their own. It's _incredibly_ inefficient for everyone to adjust the data (for encoding strangeness, for bad clients, for data manipulation, for page existence possibly, etc.) rather than having the source files come out cleaner. Raw data is fascinating in that regard though - one can see what are bad clients, what are anomalies, how they encode titles, what are erroneous titles, etc. There're zillions of ways to do post-processing, and none of these will match all needs of every user. Yes, so providing raw data alongside cleaner data or alongside SQL table dumps (similar to the current dumps for MediaWiki tables) might make more sense here. I'd love to see that all the data is preserved infinitely. It is one of most interesting datasets around, and its value for the future is quite incredible. Nemo has done some work to put the files on Internet Archive, I think. The reality is that a large pile of data that's not easily queryable is directly equivalent to no data at all, for most users. Echoing what I said earlier, it doesn't make much sense for people to be continually forced to reinvent the wheel (post-processing raw data and putting it into a queryable format). I agree. By opening up the dataset I expected others to build upon that and create services. Apparently that doesn't happen. As lots of people use the data, I guess there is need for it, but not enough will to build anything for others to use, so it will end up being created in WMF proper. Building a service where data would be shown on every article is relatively different task from just analytical workload support. For now, building query-able service has been on my todo list, but there were too many initiatives around that suggested that someone else will do that ;-) Yes, beyond Henrik's site, there really isn't much. It would probably help if Wikimedia stopped engaging in so much cookie-licking. That was part of the purpose of this thread: to clarify what Wikimedia is actually planning to invest in this endeavor. Thank you for the detailed replies, Domas. :-) MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94370]: New comment added
User Krinkle posted a comment on MediaWiki.r94370. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94370#c20688 Commit summary: * Added LoggedUpdateMaintenance subclass * Moved PopulateRevisionLength/PopulateRevisionSha1 scripts to $postDatabaseUpdateMaintenance * Fixed bogus {$prefix}_sha1 != '' comparison (r94362) * Removed unneeded NOT NULL check (speeds up script a bit) from populateRevisionSha1 script * Various code cleanups Comment: http://ci.tesla.usability.wikimedia.org/cruisecontrol/buildresults/mw: pre PHP Warning: call_user_func_array() expects parameter 1 to be a valid callback, function 'doPopulateRevSha1' not found or invalid function name in /home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/includes/installer/DatabaseUpdater.php on line 230 /pre ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94376]: New comment added
User Catrope posted a comment on MediaWiki.r94376. Full URL: https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki/94376#c20689 Commit summary: Collection: Use wfExpandUrl() instead of prepending $wgServer everywhere. In one instance, just keep the URL relative. Also clean up global declarations, you can put more than one on the same line. Comment: This is for bug 30184 ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94377]: Revision status changed
User ^demon changed the status of MediaWiki.r94377. Old Status: new New Status: deferred Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94377#c0 Commit summary: * Got transactions and operations working * Added some tests for them ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94368]: Revision status changed
User ^demon changed the status of MediaWiki.r94368. Old Status: new New Status: deferred Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94368#c0 Commit summary: Adjusted lists to be a little more indented ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94381]: Revision status changed
User Aaron Schulz changed the status of MediaWiki.r94381. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94381#c0 Commit summary: Fix bug in jquery.byteLimit.js for Safari 4 * Browsers should ignore the maxLength when the .value is set manually through JavaScript, but for some reason Safari 4 (not 5 and later) is enforcing the limit even when the value property is set from JavaScript. Usually this bug doesn't become visible in this module because the byteLength can't be lower than the number of characters, so we'd never see the bug. However since r94066 we're supporting callbacks, and callbacks could do anything to the calculation, including but not limited to making the string that is being checked shorter (ie. suppose maxLength/byteLimit is 8, value is 'User:Sam', and callback filters like return new mw.Title(val).getName(). If we set it to 'User:Samp' (+p) then Safari 4 would chop the value, because the total string is longer than 8. Whereas all other browsers ignore maxLength (like they should) and let it be and would allow our callback to happen and instead give byteLimit 'Samp' which is length 4 and we still have 4 more characters to go until we reach 8. The fix is easy, simply do not set the maxLength property if there's a callback. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r94372]: New comment added, and revision status changed
User Aaron Schulz changed the status of MediaWiki.r94372. Old Status: new New Status: ok User Aaron Schulz also posted a comment on MediaWiki.r94372. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94372#c20690 Commit summary: Instead of using some hacky regexes, just use wfParseUrl() in WikiMap::getDisplayName(). This should make protocol-relative URLs behave correctly as well, and fix bug 29965 Comment: lol ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r91284]: Revision status changed
User Aaron Schulz changed the status of MediaWiki.r91284. Old Status: new New Status: ok Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/91284#c0 Commit summary: * Changed action=revert to use a subclass of Action * Added WikiPage::getActionOverrides() to be able to execute different actions depending on the namespace (obviously needed for action=revert). This is only used when the value of $wgActions for the corresponding action is true, so extensions can still override this. * Added Action::getDescription() to ease the change of the page header and the title element ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[MediaWiki-CodeReview] [MediaWiki r91284]: New comment added
User Aaron Schulz posted a comment on MediaWiki.r91284. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/91284#c20691 Commit summary: * Changed action=revert to use a subclass of Action * Added WikiPage::getActionOverrides() to be able to execute different actions depending on the namespace (obviously needed for action=revert). This is only used when the value of $wgActions for the corresponding action is true, so extensions can still override this. * Added Action::getDescription() to ease the change of the page header and the title element Comment: It bothers me that action stuff (which is UI level) is in the WikiPage classes, which are supposed to just be DAOs (with a bit of business logic for now). ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
[Wikitech-l] State of page view stats
[Resending as plain text] I maintain a compacted monthly version of the dammit.lt page view stats, starting with Jan 2010 (not an official WMF project). This is to preserve our page view counts for future historians (compare the Twitter archive by the Library of Congress). It could also be used to resurrect http://wikistics.falsikon.de/latest/wikipedia/en/ which was very popular. Alas, the author vanished and does not reply to requests, and we don't have the source code. I just applied for storage on dataset1 or ..2 and will publish the monthly 2 GB files asap. Each day I download the 24 hourly dammit.lt files and compact these into one file. Each month I compact these into one monthly file. Major space saving: a monthly file with all hourly page views is 8 GB (compressed); with only articles that have 5+ page views per month it is even less than 2 GB. This is because each page title occurs once instead of up to 24*31 times, and the 'bytes sent' field is omitted. All hourly counts are preserved, prefixed by day number and hour number. Here are the first lines of one such file, which also describe the format:

Erik Zachte (on wikibreak till Sep 12)

# Wikimedia article requests (aka page views) for year 2010, month 11
#
# Each line contains four fields separated by spaces
# - wiki code (subproject.project, see below)
# - article title (encoding from original hourly files is preserved to maintain proper sort sequence)
# - monthly total (possibly extrapolated from available data when hours/days in input were missing)
# - hourly counts (only for hours where article requests indeed occurred)
#
# Subproject is language code, followed by project code
# Project is b:wikibooks, k:wiktionary, n:wikinews, q:wikiquote, s:wikisource, v:wikiversity, z:wikipedia
# Note: suffix z added by compression script: project wikipedia happens to be sorted last in dammit.lt files, so add this suffix to fix sort order
#
# To keep hourly counts compact and tidy, both day and hour are coded as one character each, as follows:
# Hour 0..23 shown as A..X, convert to number: ordinal (char) - ordinal ('A')
# Day 1..31 shown as A.._ (27=[ 28=\ 29=] 30=^ 31=_), convert to number: ordinal (char) - ordinal ('A') + 1
#
# Original data source: Wikimedia full (=unsampled) squid logs
# These data have been aggregated from hourly pagecount files at http://dammit.lt/wikistats, originally produced by Domas Mituzas
# Daily and monthly aggregator script built by Erik Zachte
# Each day hourly files for the previous day are downloaded and merged into one file per day
# Each month daily files are merged into one file per month
#
# This file contains only lines with a monthly page request total greater/equal 5
#
# Data for all hours of each day were available in input
#
aa.b File:Broom_icon.svg 6 AV1,IQ1,OT1,QB1,YT1,^K1
aa.b File:Wikimedia.png 7 BO1,BW1,CE1,EV1,LA1,TA1,^A1
aa.b File:Wikipedia-logo-de.png 5 BO1,CE1,EV1,LA1,TA1
aa.b File:Wikiversity-logo.png 7 AB1,BO1,CE1,EV1,LA1,TA1,[C1
aa.b File:Wiktionary-logo-de.png 5 CE1,CM1,EV1,TA1,^N1
aa.b File_talk:Commons-logo.svg 9 CE3,UO3,YE3
aa.b File_talk:Incubator-notext.svg 60 CH3,CL3,DB3,DG3,ET3,FH3,GM3,GO3,IA3,JQ3,KT3,LK3,LL3,MH3,OO3,PF3,XO3,[F3,[O3,]P3
aa.b MediaWiki:Ipb_cant_unblock 5 BO1,JL1,XX1,[F2

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
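For anyone who wants to decode the per-hour tokens programmatically, here is a short JavaScript sketch of the day/hour decoding rules quoted in the header above; the function name and the shape of its return value are mine, purely for illustration:

// Decode one hourly-count token such as '^K1' from the monthly files:
// first char = day (A.._ for 1..31), second char = hour (A..X for 0..23),
// remainder = number of requests in that hour.
function decodeToken( token ) {
	var a = 'A'.charCodeAt( 0 );
	return {
		day: token.charCodeAt( 0 ) - a + 1,
		hour: token.charCodeAt( 1 ) - a,
		count: parseInt( token.slice( 2 ), 10 )
	};
}

// Example: in "aa.b File:Broom_icon.svg 6 AV1,IQ1,OT1,QB1,YT1,^K1"
// the token '^K1' decodes to day 30, hour 10, 1 request.
console.log( decodeToken( '^K1' ) ); // { day: 30, hour: 10, count: 1 }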
[MediaWiki-CodeReview] [MediaWiki r93536]: New comment added
User Preilly posted a comment on MediaWiki.r93536. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/93536#c20692 Commit summary: Make Back, Continue translatable. Comment: You can't use wfMsg inside of the output buffer handler. I've fixed this in r94394. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
Re: [Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct.
Hello, unfortunately I understand very little English despite having studied it for several years; the lack of practice has kept me from understanding everything you have sent me, but thank you anyway. As I reported, my interest is in learning about new ideas or techniques in agriculture to support the development of the area where I live. Regards, Ing. Ilenia López Marzán, Project specialist. - Original Message - From: Sumana Harihareswara suma...@wikimedia.org To: Wikimedia developers wikitech-l@lists.wikimedia.org Sent: Friday, August 12, 2011 12:17 PM Subject: [Wikitech-l] Mark your calendar: MediaWiki hackathon, New Orleans, 14-16 Oct. http://www.mediawiki.org/wiki/NOLA_Hackathon MediaWiki developers are going to meet in New Orleans, Louisiana, USA, October 14-16, 2011. Ryan Lane is putting this together and I'm helping a bit. If you're intending to come, please add your name here, just so we can start getting an idea of how many people are coming: http://www.mediawiki.org/wiki/NOLA_Hackathon#Attendees I'll add more details to the wiki page next week. -- Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[MediaWiki-CodeReview] [MediaWiki r94230]: New comment added
User Krinkle posted a comment on MediaWiki.r94230. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94230#c20650 Commit summary: Rename mw.uri to mw.Uri + minor fixes: * Renaming mw.uri to mw.Uri (since it's a constructor) * Leaked global variable 'g' in _parse() fixed * Removing unused local variable '_this' in getQueryString() * Fix documentation (jQuery 'setAttr' should be 'attr') * Making variables marked with an '@private' comment actually private (or local) * Using strict undefined comparison (shorter and faster, [[JSPERF]]) * Moving Resources definition from MediaWiki Page section to MediaWiki main section (to reflect directory structure) * Coding style conventions (mixed spaces and tabs, line wrapping, double/single quotes) * Remove passing mediaWiki to mw argument (mw is a global alias) * Passes JSHint * Removing 404 errors from UploadWizard/test/jasmine/SpecRunner.html (Follows-up r93781's move) Comment: Regarding constructors, right now in core we have mw.Map and mw.Title in public and User, Map and Message locally, so I just made it fit to that. As for filenames I don't have a strong opinion; I think mediawiki.Title is currently the only one in our JavaScript tree that is a constructor and has its own file. I kinda went with the way we did the PHP backend (class name inside file name), but lowerCamelCase makes sense as well. So you mean that mediaWiki.FooBarBaz = function(){} would be defined in mediawiki.fooBarBaz.js? ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
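To illustrate the naming question at the end of that comment with a hypothetical module (mediaWiki.FooBarBaz is not a real core class, it is only an example name): the constructor would be UpperCamelCase while the file name and any shared instance stay lowerCamelCase, the same split as mw.Map versus the mw.config instance:

// mediawiki.fooBarBaz.js -- hypothetical file, used only to show the convention.
( function ( mw ) {
	// Constructors are capitalized, like mw.Title and mw.Map ...
	mw.FooBarBaz = function ( name ) {
		this.name = name;
	};
	mw.FooBarBaz.prototype.getName = function () {
		return this.name;
	};
	// ... while instances stay lowercase, like mw.config (an instance of mw.Map).
	mw.fooBarBaz = new mw.FooBarBaz( 'example' );
}( mediaWiki ) );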
[MediaWiki-CodeReview] [MediaWiki r94230]: New comment added
User Krinkle posted a comment on MediaWiki.r94230. Full URL: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94230#c20651 Commit summary: Rename mw.uri to mw.Uri + minor fixes: * Renaming mw.uri to mw.Uri (since it's a constructor) * Leaked global variable 'g' in _parse() fixed * Removing unused local variable '_this' in getQueryString() * Fix documentation (jQuery 'setAttr' should be 'attr') * Making variables marked with an '@private' comment actually private (or local) * Using strict undefined comparison (shorter and faster, [[JSPERF]]) * Moving Resources definition from MediaWiki Page section to MediaWiki main section (to reflect directory structure) * Coding style conventions (mixed spaces and tabs, line wrapping, double/single quotes) * Remove passing mediaWiki to mw argument (mw is a global alias) * Passes JSHint * Removing 404 errors from UploadWizard/test/jasmine/SpecRunner.html (Follows-up r93781's move) Comment: "We don't have a convention of capitalizing library names" — Thanks, indeed not for filenames. The JavaScript constructor functions themselves are all capitalized, though. I've renamed the files in r94325. Testing undefined: I know undefined can be redefined, and I'm okay with adding to the conventions that modules using undefined checks should be wrapped in such a closure (just like jQuery does all the time). However, since we haven't been doing that until now, it didn't make sense to start doing so in this commit. If we decide to do so, we'll have to change other modules as well. ___ MediaWiki-CodeReview mailing list mediawiki-coderev...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-codereview
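The closure being referred to (the same trick jQuery uses) looks roughly like the sketch below: 'undefined' is declared as a parameter that never receives an argument, so strict comparisons against it stay safe even if some other script assigns to the global undefined. mw.example is a made-up name used only for illustration:

( function ( mw, $, undefined ) {
	// Inside this closure, 'undefined' is a local parameter that was never
	// passed a value, so it is guaranteed to hold the real undefined.
	mw.example = function ( value ) {
		return value === undefined ? 'no value given' : value;
	};
}( mediaWiki, jQuery ) );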