Re: [Wikitech-l] Acceptable use of API

2010-09-24 Thread Robert Ullmann
Hi, You don't need the full dumps. Look at (for example) the tr.wp dump that is running at the moment: http://download.wikimedia.org/trwiki/20100924/; you'll see the text dumps and also dumps of various SQL tables. Look at the one that is labelled "Wiki interlanguage link records". You ought to
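The table dump Robert points at is a plain SQL file of multi-row INSERT statements. As a rough sketch (the exact column layout of the `langlinks` table is assumed here: `ll_from`, `ll_lang`, `ll_title`), the rows can be pulled out with a regex without loading a database at all:

```python
import re

# Each langlinks row is assumed to be (ll_from, ll_lang, ll_title),
# serialized as (123,'fr','Poisson') inside a big INSERT statement.
ROW = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")

def parse_langlinks(sql_text):
    """Yield (page_id, lang, title) tuples from an SQL dump's INSERT lines."""
    for line in sql_text.splitlines():
        if line.startswith("INSERT INTO"):
            for page_id, lang, title in ROW.findall(line):
                yield int(page_id), lang, title

sample = "INSERT INTO `langlinks` VALUES (1,'fr','Poisson'),(2,'de','Fisch');"
rows = list(parse_langlinks(sample))
```

For real dumps you would decompress the `.sql.gz` stream first and handle SQL escape sequences in titles; the regex above only illustrates the shape of the data.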

Re: [Wikitech-l] Showing bytes added/removed in each edit in View history and User contributions

2010-08-03 Thread Robert Ullmann
Ahem. The revision size (and the page size, meaning that of the last revision), in bytes, is available in the API. If you change the definition, there is no telling what you will break. Essentially you can't. A character count would have to be another field. best, Robert On Tue, Aug 3, 2010 at 9:53 AM,
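The byte size Robert mentions is exposed via `prop=revisions` with `rvprop=size`. A minimal sketch of building such a query and reading the size out of a response (the response shape shown in the comment is an approximation of the API's JSON output, and the page title is just an example):

```python
from urllib.parse import urlencode

# Build a query for the byte size of a page's latest revision.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "size",
    "titles": "Main Page",   # example title
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)

# An (abridged, assumed) response looks roughly like:
# {"query": {"pages": {"15580374": {"revisions": [{"size": 3591}]}}}}
def latest_size(response):
    """Pull the size field of the first page's latest revision."""
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["size"]
```

The key point of the thread stands: `size` is defined as bytes of the stored wikitext, so a character count would indeed need a separate field.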

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
I've looked at this a bit more. There are more serious problems. Apparently, no-one converted the 5.0 titles in the wiki to 5.1 when normalization was turned on; there are pages that can't be accessed. (!) for example, try this (Malayalam for fish):

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
:26 PM, Platonides platoni...@gmail.com wrote: Robert Ullmann wrote: I've looked at this a bit more. There are more serious problems. Apparently, no-one converted the 5.0 titles in the wiki to 5.1 when normalization was turned on; there are pages that can't be accessed. (!) for example, try

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
PM, Robert Ullmann rlullm...@gmail.com wrote: In December, Praveen Prakesh wrote We are currently using Unicode 5.1 Redirect to unicode 5.0 titled articles in some cases. After converting both these titles become same. Is that a problem? Yes, it is ... cleanupTitles will resolve collisions

Re: [Wikitech-l] Unicode equivalence

2010-05-22 Thread Robert Ullmann
Hi, If you don't still have this thread, the background is that the Malayalam projects want to, and are, using Unicode 5.1 for five characters that have composed code points in 5.1, and decomposed in 5.0. The equivalences are: CHILLU NN: 0D23, 0D4D, 200D → 0D7A; CHILLU N: 0D28, 0D4D,
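The crux of the thread is that standard Unicode normalization does not unify these pairs: the 5.1 chillu letters have no canonical decomposition, so NFC leaves the old 5.0 sequence and the new atomic character as distinct titles. A small demonstration for CHILLU NN (code points taken from the equivalence list above):

```python
import unicodedata

# Composed (Unicode 5.1) vs decomposed (Unicode 5.0) forms of CHILLU NN.
composed = "\u0d7a"                # MALAYALAM LETTER CHILLU NN
decomposed = "\u0d23\u0d4d\u200d"  # NNA + VIRAMA + ZERO WIDTH JOINER

# NFC does NOT map the old sequence to the new character: the chillu
# letters carry no canonical decomposition, so both forms survive
# normalization as distinct strings.
nfc_composed = unicodedata.normalize("NFC", composed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
```

This is why a one-off, project-specific title conversion was needed when the wikis switched, rather than relying on MediaWiki's normal NFC normalization.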

Re: [Wikitech-l] Wiki links beginning with slash character

2010-03-29 Thread Robert Ullmann
Yes, just leave out the extraneous : you have included for some reason. {{/nav}} will work nicely, if the namespace allows subpages. If you'd like to see an amusing example of / in a namespace without subpages, see: http://en.wiktionary.org/wiki// Robert On Mon, Mar 29, 2010 at 3:05 AM, Jim

Re: [Wikitech-l] importing enwiki into local database

2010-02-18 Thread Robert Ullmann
Hi, There is a note here: http://www.mediawiki.org/wiki/Extension:ParserFunctions saying you should use a different version of ParserFunctions for 1.15.1. I'm not at all sure what that actually means; but the problem definitely seems to be within ParserFunctions ... Robert

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Robert Ullmann
Hi Hippietrail! What do you mean by intractably slow? Just how fast must it be? If I do http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Infobox_Language&eilimit=100&einamespace=0 it says (on one given try) that it was served in 0.047 seconds. How long can it take to
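The query above can be driven programmatically with continuation to walk every embedding page, not just the first 100. A sketch using the parameters from the thread and the old-style `query-continue` continuation that was current in 2009 (the `fetch` callable and the exact continuation key layout are assumptions here):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def embedded_pages(fetch, template="Template:Infobox_Language"):
    """Yield titles of all mainspace pages embedding `template`,
    following query-continue until the list is exhausted."""
    base = {"action": "query", "list": "embeddedin", "eititle": template,
            "eilimit": 100, "einamespace": 0, "format": "json"}
    cont = {}
    while True:
        data = fetch(API + "?" + urlencode({**base, **cont}))
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        if "query-continue" not in data:
            return
        cont = data["query-continue"]["embeddedin"]
```

Injecting `fetch` keeps the sketch testable offline; in practice it would be a thin wrapper around an HTTP GET that parses the JSON body.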

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Robert Ullmann
I've been spending hours on the parsing now and don't find it simple at all due to the fact that templates can be nested. Just extracting the Infobox as one big lump is hard due to the need to match nested {{ and }} Andrew Dunbar (hippietrail) Hi, Come now, you are over-thinking it. Find
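Matching nested `{{` and `}}` is simpler than it sounds: a single left-to-right scan with a depth counter does it, no recursion needed. A minimal sketch (function name and interface are illustrative, not from the thread):

```python
def extract_template(text, name):
    """Return the first {{name ...}} template as one lump,
    balancing nested {{ }} with a depth counter."""
    start = text.find("{{" + name)
    if start < 0:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return text[start:i]   # matching close found
        else:
            i += 1
    return None  # unbalanced braces

wikitext = "intro {{Infobox Language|name={{Nested|x}}|speakers=2}} rest"
lump = extract_template(wikitext, "Infobox")
```

This deliberately ignores `{{{parameter}}}` triples and templates split across `<nowiki>` sections; real wikitext needs a little more care, but the depth-counter idea carries over.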

Re: [Wikitech-l] Software updates Wednesday morning

2009-09-28 Thread Robert Ullmann
Hi, On Wed, Sep 23, 2009 at 8:28 PM, Platonides platoni...@gmail.com wrote: Aryeh Gregor wrote: I've been meaning to investigate this, but haven't found the time yet.  Have you come up with a minimal test case, or filed a bug with Mozilla?  I'd be willing to look at this if I get the time,

Re: [Wikitech-l] Software updates Wednesday morning

2009-09-17 Thread Robert Ullmann
Hi, Something bad happened, having to do with the legend junk added to RC and similar pages. Firefox will go compute-bound (or very nearly) as long as the page is open, even if open for hours. It isn't Java/JavaScript (first suspect ;-), turning them off has no effect. It doesn't quite happen with 50

Re: [Wikitech-l] Speed of parsing messages (was: how to chang {{SITENAME}})

2009-09-16 Thread Robert Ullmann
Like maybe: $1 thingy<!-- {{PLURAL}} not needed --> ? does that do it? Robert On Wed, Sep 16, 2009 at 4:32 PM, Roan Kattouw roan.katt...@gmail.com wrote: 2009/9/16 Tisza Gergő gti...@gmail.com: Some of them may be rephrased, and some localizations do not really need them at all. For

Re: [Wikitech-l] Wiktionary API acceptable use policy

2009-09-03 Thread Robert Ullmann
Hi, In general a small number of requests is fine, but large numbers (using the wikts as a live back-end database) is not so good. (Note that live mirrors, re-presenting WM data as part of another site are explicitly prohibited. ;-) What you should probably do is use the XML dumps from
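Using the XML dumps instead of hammering the live API usually means streaming a `pages-articles.xml.bz2` file. A sketch of doing that without ever holding the whole dump in memory (the export namespace version varies by dump vintage; `export-0.4` is an assumption here):

```python
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.4/}"  # varies by dump version

def iter_pages(path):
    """Stream (title, wikitext) pairs out of a pages-articles dump,
    clearing each <page> element after use to keep memory flat."""
    with bz2.open(path, "rb") as f:
        for event, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(NS + "revision/" + NS + "text") or ""
                yield title, text
                elem.clear()  # discard the parsed subtree as we go
```

This pattern handles even the full English Wiktionary dump on modest hardware, which is exactly the "use the dumps, not the live site" trade the thread recommends.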

Re: [Wikitech-l] HTML not Rendered correctly after Import of Wikipedia

2009-03-15 Thread Robert Ullmann
Hi, On Fri, Mar 13, 2009 at 8:37 PM, O. O. olson...@yahoo.com wrote: Hi, I attempted to import the English Wikipedia into MediaWiki by first ... The problem that I am now facing is that the HTML rendered is wrong in places. Mostly this happens at the beginning of the text on the page.

[Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
I have a bot called Interwicket which keeps itself busy adding and updating the language links for the wiktionaries (namespace 0); it is much more efficient for that than the standard pedia bot. It uses the API, and the maxlag parameter, and is a good citizen, backing off sharply as maxlag exceeds
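The good-citizen behaviour described here (pass `maxlag`, back off sharply when the servers are lagged) can be sketched as follows. The API answers a lagged `maxlag` request with an error whose code is `maxlag`; the retry cadence, the injected `fetch` callable, and the function name are illustrative assumptions, not Interwicket's actual code:

```python
import time
from urllib.parse import urlencode

def polite_request(fetch, params, max_retries=5, base_delay=5.0):
    """Send an API request with maxlag=5; on a 'maxlag' error reply,
    sleep and retry with a doubling delay, like a good bot citizen."""
    query = {**params, "maxlag": 5, "format": "json"}
    delay = base_delay
    for _ in range(max_retries):
        data = fetch("https://en.wiktionary.org/w/api.php?" + urlencode(query))
        if data.get("error", {}).get("code") != "maxlag":
            return data          # not lagged (or a different error)
        time.sleep(delay)        # back off while replication catches up
        delay *= 2
    raise RuntimeError("replication lag persisted; giving up")
```

Backing off multiplicatively rather than retrying at a fixed interval is what keeps a fleet of bots from piling onto already-lagged slaves, which is the point of the `maxlag` mechanism.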

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
Ah, that makes sense. Also explains why the lag dropped back from 42000 to 5 a few minutes ago ;-) Thanks, Robert A server was taken out of rotation and its slave process stopped to produce a dump for the toolserver. It was put back into rotation before it caught up, and so it was suddenly the

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
now we have server lag at 870 seconds, slowly, but inexorably, increasing (?) Robert On Tue, Mar 3, 2009 at 11:30 AM, Robert Ullmann rlullm...@gmail.com wrote: Ah, that makes sense. Also explains why the lag dropped back from 42000 to 5 a few minutes ago ;-) Thanks, Robert A server was taken

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
now going down slowly, as you would expect. Perhaps normal ;-) On Wed, Mar 4, 2009 at 2:07 AM, Robert Ullmann rlullm...@gmail.com wrote: now we have server lag at 870 seconds, slowly, but inexorably, increasing (?) Robert On Tue, Mar 3, 2009 at 11:30 AM, Robert Ullmann rlullm...@gmail.com

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-23 Thread Robert Ullmann
On Tue, Feb 24, 2009 at 6:49 AM, Andrew Garrett and...@werdn.us wrote: On Tue, Feb 24, 2009 at 1:07 PM, Robert Ullmann rlullm...@gmail.com wrote: Really? I mean is this for real? The sequence ought to be something like: breaker trips, monitor shows within a minute or two that 4 servers

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-23 Thread Robert Ullmann
Let me ask a separate question (Ariel may be interested in this): What if we took the regular permanent media backups, and WMF filtered them in house just to remove the classified stuff (;-), and then put them somewhere where others could convert them to the desired format(s)? (Build all-history

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-22 Thread Robert Ullmann
What is with this? Why are the XML dumps (the primary product of the projects: re-usable content) the absolute effing lowest possible effing priority? Why? I just finished (I thought) putting together some new software to update iwikis on the wiktionaries. It is set up to read the langlinks and

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-22 Thread Robert Ullmann
Hi, Maybe I should offer a constructive suggestion? Clearly, trying to do these dumps (particularly history dumps) as it is being done from the servers is proving hard to manage. I also realize that you can't just put the set of daily permanent-media backups on line, as they contain lots of user

[Wikitech-l] API breakage a few hours ago

2009-02-18 Thread Robert Ullmann
The API's XML return for queries with no result elements was broken a few hours ago. I believe the culprit is r46845 http://www.mediawiki.org/wiki/Special:Code/MediaWiki/46845 which does say it is a breaking change, but the break intended is that applications may see query-continue earlier than