Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread emmanuel
 On Tue 16/02/10 14:13, Jamie Morken jmor...@shaw.ca wrote:
 What is the benefit of the database dumps being archived/distributed in xml
 format instead of sql format?  Converting the xml to sql takes a long time
 for big wikis and people seem to have problems with this step, so why
 isn't the sql format available for download instead of the xml format?

* It is DB-neutral, so you do not need one version for MySQL, another for 
PostgreSQL, and so on.
* You can apply filters easily.
* The XML is still useful after a DB schema upgrade.
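As a rough illustration of the filtering point: an XML dump is a flat sequence of `<page>` elements, so a streaming parser can pick out a subset of pages without loading anything into a database. This is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`; the `keep` predicate (here, dropping titles that contain a colon as a crude, hypothetical stand-in for "not an article") is an assumption, and tag namespaces are stripped so the code does not depend on a particular export schema version.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Callable, Iterator, Optional, Tuple


def local_name(tag: str) -> str:
    """Drop an '{namespace-uri}' prefix so any export schema version matches."""
    return tag.rsplit("}", 1)[-1]


def filter_pages(
    source: IO[bytes], keep: Callable[[str], bool]
) -> Iterator[Tuple[str, str]]:
    """Stream a dump and yield (title, text) for each <page> whose title passes keep()."""
    title: Optional[str] = None
    text: Optional[str] = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            # If a page carries several revisions, the last <text> seen wins.
            text = elem.text or ""
        elif name == "page":
            if title is not None and keep(title):
                yield title, text or ""
            title, text = None, None
            # Drop the finished page's subtree (title, revisions, text) to keep memory low.
            elem.clear()


if __name__ == "__main__":
    # Tiny illustrative dump; the namespace URI is only an example.
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Roma</title><revision><text>[[Italia]]</text></revision></page>
  <page><title>Talk:Roma</title><revision><text>discussion</text></revision></page>
</mediawiki>"""
    # Hypothetical filter: treat titles without a colon as articles.
    for page_title, wikitext in filter_pages(io.BytesIO(sample), keep=lambda t: ":" not in t):
        print(page_title, wikitext)  # prints: Roma [[Italia]]
```

The same loop works unchanged after a database schema upgrade, because it only depends on the XML, not on any table layout.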

Emmanuel


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Domas Mituzas
Hi!

 Converting the xml to sql takes a long time for big wikis and people seem to 
 have problems with this step, so why isn't the sql format available for 
 download instead of the xml format?

Our dumps are not 'sql dumps'. We assemble them from all the different parts 
(memcached, multiple database instances) - so it doesn't really make sense to 
output in some specific format, as one needs a decent loading routine to load 
that data into various kinds of stores. 

Currently the only widely available and supported marshalling format is XML, 
hence we use it.

Do note that if you use SAX-based conversion, converting to SQL is much faster 
than the database can load that data, especially if you try to maintain all 
the cross-indexes on the fly as well.
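To make the SAX point concrete: a SAX parser reports start tags, character data and end tags as it streams through the file and never builds the whole document in memory, so turning XML into SQL text is cheap. The following is a minimal sketch using Python's standard `xml.sax`. It assumes a hypothetical single table `pages(title, body)` rather than MediaWiki's real multi-table schema (page, revision, text) that mwdumper targets, and it uses only standard-SQL quote doubling, not whatever escaping a particular database may additionally require.

```python
import io
import xml.sax
from typing import Callable, List, Optional


def sql_quote(value: str) -> str:
    """Standard-SQL string literal: wrap in quotes and double embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


class PageToSql(xml.sax.ContentHandler):
    """SAX handler that emits one INSERT statement per <page>, never building a tree."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        super().__init__()
        self.emit = emit
        self.field: Optional[str] = None  # element whose character data we are collecting
        self.buf: List[str] = []
        self.title = ""
        self.text = ""

    def startElement(self, name, attrs):
        if name in ("title", "text"):
            self.field, self.buf = name, []

    def characters(self, content):
        if self.field is not None:
            # SAX may split one element's text across several characters() calls.
            self.buf.append(content)

    def endElement(self, name):
        if name == self.field:
            value = "".join(self.buf)
            if name == "title":
                self.title = value
            else:
                self.text = value  # with several revisions, the last one wins
            self.field = None
        elif name == "page":
            # Hypothetical target table; a real import would fill several tables.
            self.emit(
                "INSERT INTO pages (title, body) VALUES (%s, %s);"
                % (sql_quote(self.title), sql_quote(self.text))
            )
            self.title, self.text = "", ""


def convert(source, emit: Callable[[str], None]) -> None:
    """Parse an XML dump (file name or binary stream) and pass each statement to emit."""
    xml.sax.parse(source, PageToSql(emit))


if __name__ == "__main__":
    dump = b"<mediawiki><page><title>O'Brien</title><revision><text>a &amp; b</text></revision></page></mediawiki>"
    convert(io.BytesIO(dump), print)
    # prints: INSERT INTO pages (title, body) VALUES ('O''Brien', 'a & b');
```

Each statement is written out as soon as its page's closing tag is seen, so memory use stays flat regardless of dump size; the expensive part remains the database applying the inserts and maintaining its indexes.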

Domas


Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Brion Vibber
On 2/16/10 7:03 AM, Jamie Morken wrote:
 Ok, the simple question: how many people prefer XML or sql dumps?

I think we have a FAQ on this...

http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F


You *do* realize that such SQL dumps would have to be invented from 
whole cloth and couldn't just be dumped from the actual databases, right?

The raw databases include dozens of alternate clusters and have data 
from different revisions compressed together, including deleted items 
and private data, and can't simply be released by WMF even if someone 
actually wanted to figure out how to replicate Wikimedia's exact storage 
cluster layout to do a data import.

Most likely if they were created they'd simply be created by running the 
xml through a tool like mwdumper...

-- brion




Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Brion Vibber
On 2/11/10 7:41 AM, emman...@engelhart.org wrote:
 Almost a month ago I reported a bug in mwdumper which seems to me to be 
 critical.
 I simply can't use mwdumper with the itwiki XML dumps:
 https://bugzilla.wikimedia.org/show_bug.cgi?id=22137

 I have extracted the problematic part of the XML, but so far I have not 
 had any response to this bug report, and I guess that Brion, the bugzilla 
 maintainer for mwdumper, does not have time for that anymore.

I'm behind on mailing lists and bugzilla monitoring, but don't be afraid 
to ping me direct to take a quick peek at things, especially if they 
seem critical.

I'm not always available but I'll see what I can do.

 mwdumper is an almost mandatory tool for spreading our content, and for this 
 reason I wanted to bring it up on the ML.

 Maybe someone with Java skills is interested in helping me resolve this 
 mwdumper bug?

Looks like y'all already worked out that this is a GCJ library bug; the 
sample code works fine with other JVMs tested so far. Use OpenJDK or 
another JVM until they get that fixed upstream.

-- brion




Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Jamie Morken


Date: Tue, 16 Feb 2010 09:34:41 -0800
From: Brion Vibber br...@pobox.com
Subject: Re: [Wikitech-l] [mwdumper] new maintainer?
To: wikitech-l@lists.wikimedia.org

On 2/16/10 7:03 AM, Jamie Morken wrote:
 Ok, the simple question: how many people prefer XML or sql dumps?

I think we have a FAQ on this...

http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F


You *do* realize that such SQL dumps would have to be invented from 
whole cloth and couldn't just be dumped from the actual databases, right?

The raw databases include dozens of alternate clusters and have data 
from different revisions compressed together, including deleted items 
and private data, and can't simply be released by WMF even if someone 
actually wanted to figure out how to replicate Wikimedia's exact storage 
cluster layout to do a data import.

Most likely if they were created they'd simply be created by running the 
xml through a tool like mwdumper...

-- brion



Hi Brion,

I have not tried mwdumper yet. I have been looking at the various XML-to-SQL 
conversion tools and reading about people's use of them, and I will have to 
give it a try to see for myself, but in my opinion recreating an SQL database 
seems like an overly complex task.  Also, when Wikimedia dumps used to be in 
SQL format, I think there were fewer dump problems than there are now, 
although maybe the main issue is the growth of the file sizes.  I bet it is 
simpler to make an SQL dump than an XML dump; also, the older MediaWiki dumps 
were in SQL format.  To produce the Wikimedia dumps in SQL directly, I think 
the process would be to merge the SQL databases and then make sure the 
private data is erased?  This might be simpler than creating XML and then 
using mwdumper to get back to SQL.  Also, there is a bottleneck somewhere in 
the dump system (dumps fail, etc.); maybe it is the XML part?  I will get back 
to you after I try mwdumper and/or:

php importDump.php 17gigabytefail :)

cheers,
Jamie




Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-12 Thread emmanuel
 On Fri 12/02/10 01:52, Tim Starling tstarl...@wikimedia.org wrote:
 emmanue...@engelhart.org wrote:
 mwdumper is an almost mandatory tool for spreading our
 content, and for this reason I wanted to 
 bring it up on the ML.
 
 You might have to be more specific than that. It doesn't seem like a
 mandatory tool to me.

XML dumps + mwdumper are the only solution I know of for making quality static 
dumps. Do you know of another one?

Emmanuel 



Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-12 Thread Christensen, Courtney

emman...@engelhart.org said:
 On Fri 12/02/10 01:52, Tim Starling tstarl...@wikimedia.org wrote:
 emman...@engelhart.org wrote:
 mwdumper is an almost mandatory tool for spreading our
 content, and for this reason I wanted to 
 bring it up on the ML.
 
 You might have to be more specific than that. It doesn't seem like a
 mandatory tool to me.

XML dumps + mwdumper are the only solution I know of for making quality static 
dumps. Do you know of another one?

Emmanuel 


Hi Emmanuel, 

We use the DumpHTML extension 
(http://www.mediawiki.org/wiki/Extension:DumpHTML) to make static copies of our 
wikis.  It used to be a maintenance script.

Maybe that would work for you?
-Courtney



Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-12 Thread emmanuel
 On Fri 12/02/10 14:24, Christensen, Courtney christens...@battelle.org wrote:
 We use the DumpHTML extension 
 (http://www.mediawiki.org/wiki/Extension:DumpHTML) to
 make static copies of our wikis.  It used to be a maintenance script.
 Maybe that would work for you?

The DumpHTML extension is something else... it is a tool to get a static HTML 
version of MediaWiki articles.

If you mean http://static.wikipedia.org/, that is also another topic, 
because those pages are not our content but only a non-customizable view of 
our content (I can't do anything with it).

Our content is the wiki code and the files (images, etc.), and that is what 
does not seem to be fully reusable currently.

Emmanuel

PS: DumpHTML also does not seem to be maintained currently; have a look at the 
bug reports.



Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-12 Thread Bilal Abdul Kader
mwDumper is also essential for anyone willing to replicate a wiki locally for
any purpose. There are alternatives such as xml2SQL or importDump.php, but
mwDumper is the most efficient in terms of correctness and completeness, and
sometimes speed.

bilal
==
Verily, with hardship comes ease.


On Fri, Feb 12, 2010 at 8:46 AM, emman...@engelhart.org wrote:

  On Fri 12/02/10 14:24, Christensen, Courtney christens...@battelle.org wrote:
  We use the DumpHTML extension (
 http://www.mediawiki.org/wiki/Extension:DumpHTML) to
  make static copies of our wikis.  It used to be a maintenance script.
  Maybe that would work for you?

 The DumpHTML extension is something else... it is a tool to get a static
 HTML version of MediaWiki articles.

 If you mean http://static.wikipedia.org/, that is also another
 topic, because those pages are not our content but only a non-customizable
 view of our content (I can't do anything with it).

 Our content is the wiki code and the files (images, etc.), and that is
 what does not seem to be fully reusable currently.

 Emmanuel

 PS: DumpHTML also does not seem to be maintained currently; have a look at the
 bug reports.




[Wikitech-l] [mwdumper] new maintainer?

2010-02-11 Thread emmanuel
Hi

Almost a month ago I reported a bug in mwdumper which seems to me to be 
critical.
I simply can't use mwdumper with the itwiki XML dumps:
https://bugzilla.wikimedia.org/show_bug.cgi?id=22137

I have extracted the problematic part of the XML, but so far I have not had 
any response to this bug report, and I guess that Brion, the bugzilla 
maintainer for mwdumper, does not have time for that anymore.

mwdumper is an almost mandatory tool for spreading our content, and for this 
reason I wanted to bring it up on the ML.

Maybe someone with Java skills is interested in helping me resolve this 
mwdumper bug?

Regards
Emmanuel






Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-11 Thread Tim Starling
emman...@engelhart.org wrote:
 mwdumper is an almost mandatory tool for spreading our content, and for this 
 reason I wanted to 
 bring it up on the ML.

You might have to be more specific than that. It doesn't seem like a
mandatory tool to me.

-- Tim Starling



