Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Andrew Dunbar
On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote:
 2010/10/13 Paul Houle p...@ontology2.com


     Don't be intimidated by working with the data dumps.  If you've got
 an XML API that does streaming processing (I used .NET's XmlReader) and
 use the old unix trick of piping the output of bunzip2 into your
 program,  it's really pretty easy.
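
 A minimal sketch of that streaming approach in Python rather than .NET,
 assuming the dump wraps each article in a page element that contains
 title and text elements. The function names and the inline sample are
 illustrative, not taken from any existing tool.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) per page while parsing the stream incrementally."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's children to save memory


# Tiny in-memory stand-in for a dump (hypothetical content).
sample = (b"<mediawiki><page><title>Example</title>"
          b"<revision><text>Hello</text></revision></page></mediawiki>")
for page_title, page_text in iter_pages(io.BytesIO(sample)):
    print(page_title, len(page_text))
```

 To use it with the pipe trick, pass sys.stdin.buffer to iter_pages() and
 run something like: bunzip2 -c dump.xml.bz2 | python read_dump.py (both
 file names are placeholders).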


 When I worked on it.source (a small dump, something like 300 MB
 uncompressed), I used a simple do-it-yourself Python string search routine
 and found it much faster than the Python XML routines. I presume that my
 scripts are too rough to deserve sharing, but I encourage programmers to
 write a simple dump reader that exploits the speed of string search. My
 personal trick was to build an index, i.e. a list of article names with
 pointers to the articles in the XML file, so that it was simple and fast to
 recover their content. I used it mainly because I didn't understand the API
 at all. ;-)

 Alex
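
The index trick described above could look roughly like the following
sketch. It is not Alex's script (which was not posted); it assumes that the
page and title tags each sit on their own line, as in the pretty-printed
dumps, and that the uncompressed file is available on disk. The function
names are hypothetical, and XML entities in titles are left unescaped.

```python
def build_index(path):
    """Map each article title to the byte offset of its <page> line."""
    index = {}
    page_start = None
    offset = 0
    with open(path, "rb") as dump:
        for line in dump:
            stripped = line.strip()
            if stripped == b"<page>":
                page_start = offset
            elif page_start is not None and stripped.startswith(b"<title>"):
                title = stripped[len(b"<title>"):].split(b"</title>")[0]
                index[title.decode("utf-8")] = page_start
                page_start = None  # only the first title inside a page counts
            offset += len(line)
    return index


def read_page(path, index, title):
    """Seek straight to a page and return its raw XML up to </page>."""
    chunks = []
    with open(path, "rb") as dump:
        dump.seek(index[title])
        for line in dump:
            chunks.append(line)
            if line.strip() == b"</page>":
                break
    return b"".join(chunks).decode("utf-8")
```

Building the index needs one pass over the file using plain byte-string
comparisons; after that, read_page() fetches any article with a single seek.
The index itself is an ordinary dict, so it could be saved with json or
pickle between runs.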


Hi Alex. I have been doing something similar in Perl for a few years
for the English Wiktionary. I've never been sure of the best way to
store all the index files I create, especially in code that I would
like to share with other people. If you, or anyone else for that
matter, would like to collaborate, that would be pretty cool.

You'll find my stuff on the Toolserver:
https://fisheye.toolserver.org/browse/enwikt

Andrew Dunbar (hippietrail)


-- 
http://wiktionarydev.leuksman.com http://linguaphile.sf.net

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Backups

2010-11-07 Thread Eugene
Hi everyone,

As far as I can tell from the
http://wikitech.wikimedia.org/view/Backup_procedures page, backups are still
not sorted out. Image backups are not available, and even the AWS datasets
haven't been updated as promised (http://aws.amazon.com/datasets/2506).

Glancing at the Wikimedia Foundation financial report, the money is there,
staff are being hired, and there are lots of projects going on. I wonder
what should be done to give this a higher priority?


Concerned wiki editor,
Eugene

On 4 May 2010 04:35, Ariel T. Glenn ar...@wikimedia.org wrote:

 We should have an offsite (= not in Florida) backup for images shortly.
 At the moment we have replication running every 10 minutes to another
 host in Florida and snapshots at least hourly.

 Ariel

 On 02-05-2010, Sun, at 21:29 +1000, Eugene wrote:
  Hi,
 
  It's been a year since my initial enquiry and whilst a full enwiki
  dump has been accomplished, there still doesn't seem to be a
  comprehensive backup strategy in place and documented.
 
  The wiki page at http://wikitech.wikimedia.org/view/Backup_procedures
  still shows that Commons images are vulnerable. This is basically a
  reminder for bug
  https://bugzilla.wikimedia.org/show_bug.cgi?id=18255.
 
  Thanks as always for all your work and effort,
  Eugene
 
 
  On 23 April 2009 02:45, Brion Vibber br...@wikimedia.org wrote:
   We'll be doing an audit soon to get our backup list up to date and make
   sure anything that's been stalled or delayed gets back on track.
 
   On 4/19/09 7:10 AM, Eugene wrote:
   Hi everyone,
  
   Are there any updates regarding Wikipedia's backup systems? I've
   created a bug to track this at
   https://bugzilla.wikimedia.org/show_bug.cgi?id=18255.
  
   We'll be doing an audit soon to get our backup list up to date and make
   sure anything that's been stalled or delayed gets back on track.
  
   -- brion
  

[Wikitech-l] HISTORY file and backports

2010-11-07 Thread Platonides
Our release notes process is quite straightforward.
New release notes go into the RELEASE-NOTES file, and after
releasing a new major version, they get moved into
HISTORY and a clean RELEASE-NOTES is created.
The linear history is easily preserved this way.

However, there is a kind of change that doesn't have
a single version in which it was added. Changes that deserve
backporting are made in trunk, then backported by merging into the
supported releases. The RELEASE-NOTES of those versions are
updated under a "Changes since xyz" heading.

However, it is not clear where those release notes should go
in the HISTORY file, especially since it is no longer linear.
Currently, a revision could go into 1.15.6, 1.16.1
and 1.17.0. Under which version should it be placed?

Old versions (up to 1.5.x) do have entries for updates in the HISTORY
file, and so does 1.13.x, but that's it.

We don't seem to be maintaining it, as brought up in r72587.

What should be the procedure when backporting?
* When merging to an earlier release, the RELEASE-NOTES entry goes there,
and is added to the trunk HISTORY file at the same time.

* When merging to an earlier release, the RELEASE-NOTES entry goes there.
On release, the new section is added as a whole to the trunk HISTORY.

* We don't keep HISTORY for release updates. Changes get an entry inside
the branch RELEASE-NOTES, but also inside the trunk one. It is shipped
as a fix on the next major, even if it was also fixed in some minor
in-between.

The last one is the laziest, and completely avoids the
which-version issue.
However, it can't cope with problems that are only fixed in the branch
(e.g. a minor fix for a feature that was completely rewritten in trunk).
It also misleads users by presenting as new some fixes that they
(should) already have.

I would prefer one of the other two, perhaps with a common entry in
HISTORY such as "Bug fixes in 1.15.x and 1.16.y".

Opinions?




Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Alex Brollo
2010/11/7 Andrew Dunbar hippytr...@gmail.com

 On 14 October 2010 09:37, Alex Brollo alex.bro...@gmail.com wrote:

 Hi Alex. I have been doing something similar in Perl for a few years
 for the English Wiktionary. I've never been sure of the best way to
 store all the index files I create, especially in code that I would
 like to share with other people. If you, or anyone else for that
 matter, would like to collaborate, that would be pretty cool.

 You'll find my stuff on the Toolserver:
 https://fisheye.toolserver.org/browse/enwikt

Thanks Andrew. I just got a Toolserver account, but don't search for any
contributions by me... I'm very worried about the whole thing and the skills
needed. :-(

Alex


Re: [Wikitech-l] ResourceLoader, now in trunk!

2010-11-07 Thread Dmitriy Sintsov
Hi!
I would like to thank Roan Kattouw for helping me to adapt my
Extension:WikiSync to use ResourceLoader. Before asking, I spent some
hours trying to get it to work, unsuccessfully. One of the key errors was
not so easy to guess: it turned out that browsers sometimes do not honor
self-closing XML tags (a tag with no nested text node, closed with a
trailing slash), and that was the case with iframe. I probably even heard
about this a few years ago, but had already forgotten it. With Monobook
and addScripts() head inclusion this was not critical (the extension
worked), but with bottom inclusion it is much more critical. (I have
sometimes observed non-closed tags in some old extensions.) The real
nastiness is that my extension uses arrays to represent HTML/XML and the
entries are auto-closed, so the tree should be valid. I wasn't even
considering the possibility of non-closed tags (an invalid tree) in my
case.
Dmitriy



Re: [Wikitech-l] HISTORY file and backports

2010-11-07 Thread Tim Starling
On 08/11/10 04:28, Platonides wrote:
 What should be the procedure when backporting?
 * When merging to an earlier release, the RELEASE-NOTES entry goes there,
 and is added to the trunk HISTORY file at the same time.

It's hard enough to get people to update one file.

 * When merging to an earlier release, the RELEASE-NOTES entry goes there.
 On release, the new section is added as a whole to the trunk HISTORY.

This is more or less what I do at the moment. When I create a new
major version, I update the HISTORY file in the new branch, copying in
changes from the RELEASE-NOTES in the old branch.

If we have a 1.16.x release sequence and a 1.15.x release sequence,
then changes that are backported to 1.15.x are usually also backported
to 1.16.x, so the RELEASE-NOTES entry will be in both of them. So the
HISTORY file in the 1.16.x branch contains changes made to the 1.15.x
branch before the branch point, and the RELEASE-NOTES file contains
changes made after it.

-- Tim Starling




Re: [Wikitech-l] ResourceLoader, now in trunk!

2010-11-07 Thread Aryeh Gregor
On Sun, Nov 7, 2010 at 2:34 PM, Dmitriy Sintsov ques...@rambler.ru wrote:
 I would like to thank Roan Kattouw for helping me to adapt my
 Extension:WikiSync to use ResourceLoader. Before asking, I spent some
 hours trying to get it to work, unsuccessfully. One of the key errors was
 not so easy to guess: it turned out that browsers sometimes do not honor
 self-closing XML tags (a tag with no nested text node, closed with a
 trailing slash), and that was the case with iframe.

This is why the / syntax in text/html is stupid and harmful -- it's
just ignored by all browsers, but people assume it works.  <foo> and
<foo /> parse exactly the same way in text/html and always have; the /
is ignored entirely.  Elements that can never contain anything (like
<img>) don't need a closing tag or closing slash or anything, and
other elements (including <iframe>) must be explicitly closed with a
separate closing tag.

Of course, since HTML5 parsers are still much harder to dig up than
XML parsers, we still should probably continue outputting well-formed
XML, which means including these trailing slashes that are pointless
for browsers.  Just a random little rant here.  :)
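
One way to satisfy both kinds of parser is to emit a trailing slash only
for elements that can never have content and an explicit closing tag for
everything else. The sketch below illustrates that rule in Python; the
(tag, attrs, children) tuple layout and the render() name are hypothetical
and are not how WikiSync or MediaWiki represent markup.

```python
from html import escape

# Elements that can never have content; text/html needs no closing tag here.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


def render(node):
    """Serialize a (tag, attrs, children) tuple; plain strings are text."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(
        f' {name}="{escape(value)}"' for name, value in attrs.items()
    )
    if tag in VOID_ELEMENTS:
        # The slash keeps the output well-formed XML; browsers ignore it.
        return f"<{tag}{attr_text} />"
    inner = "".join(render(child) for child in children)
    # Never self-close other elements, even when they have no children.
    return f"<{tag}{attr_text}>{inner}</{tag}>"


print(render(("iframe", {"src": "about:blank"}, [])))
print(render(("img", {"src": "a.png", "alt": ""}, [])))
```

An empty iframe therefore comes out as an open tag followed by a separate
closing tag, which XML parsers and text/html parsers read the same way.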



Re: [Wikitech-l] HISTORY file and backports

2010-11-07 Thread Platonides
Tim Starling wrote:
 On 08/11/10 04:28, Platonides wrote:
 * When merging to an earlier release, the RELEASE-NOTES entry goes there.
 On release, the new section is added as a whole to the trunk HISTORY.
 
 This is more or less what I do at the moment. When I create a new
 major version, I update the HISTORY file in the new branch, copying in
 changes from the RELEASE-NOTES in the old branch.
 
 If we have a 1.16.x release sequence and a 1.15.x release sequence,
 then changes that are backported to 1.15.x are usually also backported
 to 1.16.x, so the RELEASE-NOTES entry will be in both of them. So the
 HISTORY file in the 1.16.x branch contains changes made to the 1.15.x
 branch before the branch point, and the RELEASE-NOTES file contains
 changes made after it.
 
 -- Tim Starling

The question was more along the lines of what to do with the *trunk*
HISTORY when two older releases include the fix. Hence the idea of a
"Bug fixes in 1.15.6 and 1.16.1" headline.




Re: [Wikitech-l] HISTORY file and backports

2010-11-07 Thread Tim Starling
On 08/11/10 09:58, Platonides wrote:
 The question was more along the lines of what to do with the *trunk*
 HISTORY when two older releases include the fix. Hence the idea of a
 "Bug fixes in 1.15.6 and 1.16.1" headline.

The RELEASE-NOTES for 1.16.x will be copied into trunk immediately
before the branch point for 1.17.x. So after the branch, such a change
will have a HISTORY entry for 1.16.1 and not 1.15.6.

I don't think it's worthwhile updating the HISTORY in trunk in between
major releases, since the HISTORY file is intended for users of
tarball releases.

Sometimes people do try to add things to the HISTORY file in between
branch points, resulting in it containing a random selection of
backported changes. Rather than pick through it and try to work out
what's missing, I found it easier to delete the most recent sections
and re-add them by copying from the branch RELEASE-NOTES.

-- Tim Starling

