Re: [Wikitech-l] API vs data dumps
On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote:
> 2010/10/13 Paul Houle p...@ontology2.com
> > Don't be intimidated by working with the data dumps. If you've got an XML API that does streaming processing (I used .NET's XmlReader) and use the old Unix trick of piping the output of bunzip2 into your program, it's really pretty easy.
>
> When I worked on it.source (a small dump, something like 300 MB unzipped), I used a simple do-it-yourself Python string search routine and found it much faster than the Python XML routines. I presume that my scripts are really too rough to deserve sharing, but I encourage programmers to write a simple dump reader that exploits the speed of string search.
>
> My personal trick was to build an index, i.e. a list of pointers to articles and article names in the XML file, so that it was simple and fast to retrieve their content. I used it mainly because I didn't understand the API at all. ;-)
>
> Alex

Hi Alex. I have been doing something similar in Perl for a few years for the English Wiktionary. I've never been sure of the best way to store all the index files I create, especially in code that I would like to share with other people. If you, or anyone else for that matter, would like to collaborate, it would be pretty cool. You'll find my stuff on the Toolserver: https://fisheye.toolserver.org/browse/enwikt

Andrew Dunbar (hippietrail)

--
http://wiktionarydev.leuksman.com
http://linguaphile.sf.net
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
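The index trick Alex describes in this thread (a list of pointers to articles plus their titles) might look roughly like the Python sketch below. It is not Alex's or Andrew's actual code. The names build_index and read_page are hypothetical, and it assumes an already decompressed dump in which <page>, <title>...</title> and </page> each sit on their own line, as MediaWiki XML exports normally lay them out.

```python
import html


def build_index(dump_path):
    """Map each page title to the byte offset of the line holding its <page> tag."""
    index = {}
    offset = 0          # byte offset of the line currently being examined
    page_start = None   # offset of the most recent <page> line, if any
    with open(dump_path, "rb") as dump:
        for line in dump:
            stripped = line.strip()
            if stripped == b"<page>":
                page_start = offset
            elif page_start is not None and stripped.startswith(b"<title>"):
                # Plain string slicing instead of an XML parser.
                raw = stripped[len(b"<title>"):-len(b"</title>")]
                index[html.unescape(raw.decode("utf-8"))] = page_start
                page_start = None
            offset += len(line)
    return index


def read_page(dump_path, offset):
    """Return the raw XML of the page that starts at the given byte offset."""
    chunks = []
    with open(dump_path, "rb") as dump:
        dump.seek(offset)
        for line in dump:
            chunks.append(line)
            if line.strip() == b"</page>":
                break
    return b"".join(chunks).decode("utf-8")
```

The index is built in one sequential pass, after which read_page(path, index[title]) seeks straight to an article. Byte offsets like these only make sense against the uncompressed file; reading directly from the .bz2 stream would need a different scheme.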
Re: [Wikitech-l] Backups
Hi everyone,

As far as I can tell from the http://wikitech.wikimedia.org/view/Backup_procedures page, backups are still not sorted out. Image backups are not available, and even the AWS datasets haven't been updated as promised (http://aws.amazon.com/datasets/2506).

Glancing at the Wikimedia Foundation financial report, the money is there, staff are being hired and there are lots of projects going on. I wonder what should be done to give this a higher priority?

Concerned wiki editor,
Eugene

On 4 May 2010 04:35, Ariel T. Glenn ar...@wikimedia.org wrote:
> We should have an offsite (= not in Florida) backup for images shortly. At the moment we have replication running every 10 minutes to another host in Florida and snapshots at least hourly.
>
> Ariel
>
> On 02-05-2010, Sun, at 21:29 +1000, Eugene wrote:
> > Hi,
> >
> > It's been a year since my initial enquiry, and whilst a full enwiki dump has been accomplished, there still doesn't seem to be a comprehensive backup strategy in place and documented. The wiki page at http://wikitech.wikimedia.org/view/Backup_procedures still shows that Commons images are vulnerable.
> >
> > This is basically a reminder for bug https://bugzilla.wikimedia.org/show_bug.cgi?id=18255.
> >
> > Thanks as always for all your work,
> > Eugene
> >
> > On 23 April 2009 02:45, Brion Vibber br...@wikimedia.org wrote:
> > > On 4/19/09 7:10 AM, Eugene wrote:
> > > > Hi everyone,
> > > >
> > > > Are there any updates regarding Wikipedia's backup systems? I've created a bug to track this at https://bugzilla.wikimedia.org/show_bug.cgi?id=18255.
> > >
> > > We'll be doing an audit soon to get our backup list up to date and make sure anything that's been stalled or delayed gets back on track.
> > >
> > > -- brion
[Wikitech-l] HISTORY file and backports
Our release notes process is quite straightforward. New release notes go into the RELEASE-NOTES file, and after a new major version is released, they are moved into HISTORY and a clean RELEASE-NOTES is created. The linear history is easily preserved this way.

However, there is a kind of change that doesn't have a single version in which it was added. Changes that deserve backporting are made in trunk, then backported by merging into the supported releases. The RELEASE-NOTES of those versions are updated under a "Changes since xyz" heading. However, it is not clear where those release notes should go in the HISTORY file, especially since it is no longer linear. Currently, we could have a revision go into 1.15.6, 1.16.1 and 1.17.0. At which point should it be placed?

Old versions (up to 1.5.x) do have entries for updates in the HISTORY file, and so does 1.13.x, but that's it. We don't seem to be maintaining it, as brought up in r72587.

What should be the procedure when backporting?

* When merging to an earlier release, the RELEASE-NOTES entry goes there, and is added to the trunk HISTORY file at the same time.
* When merging to an earlier release, the RELEASE-NOTES entry goes there. On release, the new section is added as a whole to the trunk HISTORY.
* We don't keep HISTORY for release updates. Changes get an entry in the branch RELEASE-NOTES, but also in the trunk one. The change is listed as a fix in the next major version, even if it was also fixed in some minor release in between.

The last one is the laziest, and completely avoids the on-which-version issue. However, it can't cope with problems that are only fixed in the branch (e.g. a minor fix for a feature that was completely rewritten in trunk). It also misleads users by listing fixes that they (should) already have.

I would prefer one of the other two, and perhaps a common entry in HISTORY such as "Bug fixes in 1.15.x and 1.16.y".

Opinions?
Re: [Wikitech-l] API vs data dumps
2010/11/7 Andrew Dunbar hippytr...@gmail.com
> On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote:
>
> Hi Alex. I have been doing something similar in Perl for a few years for the English Wiktionary. I've never been sure of the best way to store all the index files I create, especially in code that I would like to share with other people. If you, or anyone else for that matter, would like to collaborate, it would be pretty cool. You'll find my stuff on the Toolserver: https://fisheye.toolserver.org/browse/enwikt

Thanks Andrew. I just got a Toolserver account, but don't search for any contributions by me... I'm very worried about the whole thing and the skills it needs. :-(

Alex
Re: [Wikitech-l] ResourceLoader, now in trunk!
Hi!

I would like to thank Roan Kattouw for helping me adapt my Extension:WikiSync to use ResourceLoader. Before asking, I spent some hours trying to get it to work, unsuccessfully. One of the key errors was not easy to guess: it turned out that browsers sometimes do not honor self-closing XML tags (a tag with no nested text node, closed by a slash inside the tag itself), and that was the case with iframe. I had probably even heard about this a few years ago, but had since forgotten it.

With Monobook and addScripts() head inclusion this was not critical (the extension worked), but with bottom inclusion it is much more critical. (I have sometimes observed non-closed tags in some old extensions.) The real nastiness is that my extension uses arrays to represent HTML/XML and the entries are auto-closed, so the tree should be valid. I wasn't even considering the possibility of non-closed tags (an invalid tree) in my case.

Dmitriy
Re: [Wikitech-l] HISTORY file and backports
On 08/11/10 04:28, Platonides wrote:
> What should be the procedure when backporting?
>
> * When merging to an earlier release, the RELEASE-NOTES go there, and is added into trunk HISTORY file at the same time.

It's hard enough to get people to update one file.

> * When merging to an earlier release, the RELEASE-NOTES go there. On release, the new section is added as a whole on the trunk HISTORY.

This is more or less what I do at the moment. When I create a new major version, I update the HISTORY file in the new branch, copying in changes from the RELEASE-NOTES in the old branch.

If we have a 1.16.x release sequence and a 1.15.x release sequence, then changes that are backported to 1.15.x are usually also backported to 1.16.x, so the RELEASE-NOTES entry will be in both of them. So the HISTORY file in the 1.16.x branch contains changes made to the 1.15.x branch before the branch point, and the RELEASE-NOTES file contains changes made after it.

-- Tim Starling
Re: [Wikitech-l] ResourceLoader, now in trunk!
On Sun, Nov 7, 2010 at 2:34 PM, Dmitriy Sintsov ques...@rambler.ru wrote:
> I would like to thank Roan Kattouw for helping me adapt my Extension:WikiSync to use ResourceLoader. Before asking, I spent some hours trying to get it to work, unsuccessfully. One of the key errors was not easy to guess: it turned out that browsers sometimes do not honor self-closing XML tags (a tag with no nested text node, closed by a slash inside the tag itself), and that was the case with iframe.

This is why the /> syntax in text/html is stupid and harmful -- it's just ignored by all browsers, but people assume it works. <foo> and <foo /> parse exactly the same way in text/html and always have; the / is ignored entirely. Elements that can never contain anything (like <img>) don't need a closing tag or closing slash or anything, and other elements (including <iframe>) must be explicitly closed with a separate closing tag.

Of course, since HTML5 parsers are still much harder to dig up than XML parsers, we should probably continue outputting well-formed XML, which means including these trailing slashes that are pointless for browsers.

Just a random little rant here. :)
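One way to produce markup that follows the rule described in this message (explicit closing tags for ordinary elements, a trailing slash only on elements that can never contain anything) is sketched below in Python. This is an illustration only, not MediaWiki or WikiSync code. The names render and VOID_ELEMENTS are hypothetical, and the set of void elements shown is deliberately partial.

```python
from html import escape

# Elements that can never have content in HTML (partial list, for illustration).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def render(tag, attrs=None, children=()):
    """Serialize one element; children are strings that are already rendered."""
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in (attrs or {}).items()
    )
    if tag in VOID_ELEMENTS:
        # The trailing slash keeps the output well-formed XML; text/html ignores it.
        return f"<{tag}{attr_text} />"
    # Everything else gets a separate closing tag, even when it is empty.
    return f"<{tag}{attr_text}>{''.join(children)}</{tag}>"
```

With this approach render("iframe", {"src": "about:blank"}) yields an explicit </iframe>, so a text/html parser does not treat the rest of the page as the iframe's content, while render("img", {"src": "a.png"}) keeps the XML-style slash.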
Re: [Wikitech-l] HISTORY file and backports
Tim Starling wrote:
> On 08/11/10 04:28, Platonides wrote:
> > * When merging to an earlier release, the RELEASE-NOTES go there. On release, the new section is added as a whole on the trunk HISTORY.
>
> This is more or less what I do at the moment. When I create a new major version, I update the HISTORY file in the new branch, copying in changes from the RELEASE-NOTES in the old branch.
>
> If we have a 1.16.x release sequence and a 1.15.x release sequence, then changes that are backported to 1.15.x are usually also backported to 1.16.x, so the RELEASE-NOTES entry will be in both of them. So the HISTORY file in the 1.16.x branch contains changes made to the 1.15.x branch before the branch point, and the RELEASE-NOTES file contains changes made after it.
>
> -- Tim Starling

The question was more along the lines of what to do with the *trunk* HISTORY, for which two older releases would include the fix. Hence the idea of a "Bug fixes in 1.15.6 and 1.16.1" headline.
Re: [Wikitech-l] HISTORY file and backports
On 08/11/10 09:58, Platonides wrote:
> The question was in the line of what to do with *trunk* HISTORY, for which two older releases would include the fix. Thus the idea of a Bug fixes in 1.15.6 and 1.16.1 headline.

The RELEASE-NOTES for 1.16.x will be copied into trunk immediately before the branch point for 1.17.x. So after the branch, such a change will have a HISTORY entry for 1.16.1 and not 1.15.6.

I don't think it's worthwhile updating the HISTORY in trunk in between major releases, since the HISTORY file is intended for users of tarball releases.

Sometimes people do try to add things to the HISTORY file in between branch points, resulting in it containing a random selection of backported changes. Rather than pick through it and try to work out what's missing, I found it easier to delete the most recent sections and re-add them by copying from the branch RELEASE-NOTES.

-- Tim Starling