Re: [Wikitech-l] [mwdumper] new maintainer?
On Tue 16/02/10 14:13, Jamie Morken jmor...@shaw.ca wrote:

> What is the benefit of the database dumps being archived/distributed in xml format instead of sql format? Converting the xml to sql takes a long time for big wikis and people seem to have problems with this step, so why isn't the sql format available for download instead of the xml format?

* You are DB neutral, so you do not need a separate version for MySQL, for Postgres...
* You can apply filters easily
* The XML is still useful after a DB schema upgrade

Emmanuel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
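To illustrate the point about filtering, here is a minimal sketch of a streaming filter over an XML dump, using only Python's standard library. The function name `iter_filtered_titles`, the sample document and its placeholder namespace URI are made up for this example; it only assumes that the dump contains `<page>` elements with a `<title>` child, and it is not how mwdumper itself filters.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading "{namespace}" so checks work with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def iter_filtered_titles(xml_stream, keep):
    """Yield the title of every <page> for which keep(title) is true.

    The input is parsed incrementally, so the whole dump is never held
    in memory at once.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ""
        for child in elem:
            if local_name(child.tag) == "title":
                title = child.text or ""
                break
        if keep(title):
            yield title
        elem.clear()  # release the children of the page just processed


if __name__ == "__main__":
    # Hypothetical two-page document standing in for a real dump file.
    sample = (
        b'<mediawiki xmlns="http://example.invalid/export">'
        b"<page><title>Roma</title></page>"
        b"<page><title>Talk:Roma</title></page>"
        b"</mediawiki>"
    )
    for title in iter_filtered_titles(
        io.BytesIO(sample), lambda t: not t.startswith("Talk:")
    ):
        print(title)
```

The same loop would work on any database backend's import path, since the filter runs before any SQL is generated.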
Re: [Wikitech-l] [mwdumper] new maintainer?
Hi!

> Converting the xml to sql takes a long time for big wikis and people seem to have problems with this step, so why isn't the sql format available for download instead of the xml format?

Our dumps are not 'SQL dumps'. We assemble them from all the different parts (memcached, multiple database instances), so it doesn't really make sense to output in some specific format, as one needs a decent loading routine to load that data into various kinds of stores. Currently the only widely available and supported marshalling format is XML, hence we use it.

Do note that if you use SAX-based conversion, the conversion to SQL is far faster than the database's ability to load that data, especially if you attempt to maintain all the cross-indexes on the fly too.

Domas
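A minimal sketch of what SAX-based conversion can look like, using Python's standard `xml.sax` module. The table `demo_page`, its columns, the `PageToSql` handler and the `sql_quote` helper are all hypothetical names for this illustration; they are not MediaWiki's real schema and not mwdumper's implementation, and the quoting shown is too naive for production use.

```python
import io
import xml.sax


def sql_quote(value):
    """Illustration-only quoting: wrap in single quotes and double any inside."""
    return "'" + value.replace("'", "''") + "'"


class PageToSql(xml.sax.ContentHandler):
    """Emit one INSERT statement per <page>, holding only one page in memory."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.field = None  # element whose text is currently being captured
        self.buf = []
        self.page = {}

    def startElement(self, name, attrs):
        if name == "page":
            self.page = {}
        elif name in ("title", "text"):
            self.field = name
            self.buf = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self.field is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == self.field:
            self.page[name] = "".join(self.buf)
            self.field = None
        elif name == "page":
            self.out.write(
                "INSERT INTO demo_page (title, body) VALUES ({}, {});\n".format(
                    sql_quote(self.page.get("title", "")),
                    sql_quote(self.page.get("text", "")),
                )
            )


def convert(xml_stream, out):
    """Stream an XML dump from xml_stream and write SQL statements to out."""
    xml.sax.parse(xml_stream, PageToSql(out))


if __name__ == "__main__":
    # Hypothetical one-page document standing in for a real dump file.
    sample = (
        b"<mediawiki><page><title>Roma</title>"
        b"<revision><text>Capital of Italy</text></revision>"
        b"</page></mediawiki>"
    )
    buffer = io.StringIO()
    convert(io.BytesIO(sample), buffer)
    print(buffer.getvalue(), end="")
```

Because the handler writes each statement as soon as a page closes, the converter's memory use stays flat regardless of dump size, which is why the database load tends to be the slower step.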
Re: [Wikitech-l] [mwdumper] new maintainer?
On 2/16/10 7:03 AM, Jamie Morken wrote:

> Ok, the simple question: how many people prefer XML or sql dumps?

I think we have a FAQ on this...
http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F

You *do* realize that such SQL dumps would have to be invented from whole cloth and couldn't just be dumped from the actual databases, right?

The raw databases include dozens of alternate clusters and have data from different revisions compressed together, including deleted items and private data, and can't simply be released by WMF even if someone actually wanted to figure out how to replicate Wikimedia's exact storage cluster layout to do a data import.

Most likely if they were created they'd simply be created by running the xml through a tool like mwdumper...

-- brion
Re: [Wikitech-l] [mwdumper] new maintainer?
On 2/11/10 7:41 AM, emman...@engelhart.org wrote:

> Almost one month ago I reported a bug in mwdumper which seems to me to be critical. I simply can't use mwdumper with the itwiki XML dumps: https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
>
> I have extracted the problematic part of the XML, but until now I have not had any response to this bug report, and I guess that Brion, the Bugzilla maintainer for mwdumper, does not have time for that anymore.

I'm behind on mailing lists and bugzilla monitoring, but don't be afraid to ping me directly to take a quick peek at things, especially if they seem critical. I'm not always available but I'll see what I can do.

> mwdumper is an almost mandatory tool to spread our content and for this reason I wanted to speak about that on the ML. Maybe someone with Java skills is interested in helping me resolve this mwdumper bug?

Looks like y'all already worked out that this is a GCJ library bug; the sample code works fine with other JVMs tested so far. Use OpenJDK or another JVM until they get that fixed upstream.

-- brion
Re: [Wikitech-l] [mwdumper] new maintainer?
> Date: Tue, 16 Feb 2010 09:34:41 -0800
> From: Brion Vibber br...@pobox.com
> Subject: Re: [Wikitech-l] [mwdumper] new maintainer?
>
> On 2/16/10 7:03 AM, Jamie Morken wrote:
>> Ok, the simple question: how many people prefer XML or sql dumps?
>
> I think we have a FAQ on this...
> http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F
>
> You *do* realize that such SQL dumps would have to be invented from whole cloth and couldn't just be dumped from the actual databases, right?
>
> The raw databases include dozens of alternate clusters and have data from different revisions compressed together, including deleted items and private data, and can't simply be released by WMF even if someone actually wanted to figure out how to replicate Wikimedia's exact storage cluster layout to do a data import.
>
> Most likely if they were created they'd simply be created by running the xml through a tool like mwdumper...
>
> -- brion

Hi Brion,

I have not tried mwdumper yet. I have been looking at the various xml-to-sql conversion tools and reading about people's use of them, and I will have to give it a try to see for myself, but recreating an sql database this way seems like an overly complex task in my opinion. Also, when Wikimedia dumps used to be in sql format I think there were fewer dump problems than there are now, although maybe the main issue is the growth of the file sizes. I bet it is simpler to make an sql dump than an XML dump; the older MediaWiki dumps were in sql format as well.

To make the Wikimedia dumps directly in sql, I think the process would be to do sql database merges and then make sure the private data is erased? This might be simpler than creating XML and then using mwdumper to get back to sql. Also, there is a bottleneck somewhere in the dump system (dump fails etc.); maybe it is the XML part?

I will get back to you after I try mwdumper and/or: php importDump.php 17gigabytefail :)

cheers,
Jamie
Re: [Wikitech-l] [mwdumper] new maintainer?
On Fri 12/02/10 01:52, Tim Starling tstarl...@wikimedia.org wrote:

> emman...@engelhart.org wrote:
>> mwdumper is an almost mandatory tool to spread our content and for this reason I wanted to speak about that on the ML.
>
> You might have to be more specific than that. It doesn't seem like a mandatory tool to me.

XML dumps + mwdumper are the only solution I know of for making quality static dumps. Do you know another one?

Emmanuel
Re: [Wikitech-l] [mwdumper] new maintainer?
emman...@engelhart.org said:

> On Fri 12/02/10 01:52, Tim Starling tstarl...@wikimedia.org wrote:
>> emman...@engelhart.org wrote:
>>> mwdumper is an almost mandatory tool to spread our content and for this reason I wanted to speak about that on the ML.
>>
>> You might have to be more specific than that. It doesn't seem like a mandatory tool to me.
>
> XML dumps + mwdumper are the only solution I know of for making quality static dumps. Do you know another one?
>
> Emmanuel

Hi Emmanuel,

We use the DumpHTML extension (http://www.mediawiki.org/wiki/Extension:DumpHTML) to make static copies of our wikis. It used to be a maintenance script. Maybe that would work for you?

-Courtney
Re: [Wikitech-l] [mwdumper] new maintainer?
On Fri 12/02/10 14:24, Christensen, Courtney christens...@battelle.org wrote:

> We use the DumpHTML extension (http://www.mediawiki.org/wiki/Extension:DumpHTML) to make static copies of our wikis. It used to be a maintenance script. Maybe that would work for you?

The DumpHTML extension is something else... it is a tool to get a static HTML version of MediaWiki articles. If you mean http://static.wikipedia.org/... that is also another topic, because these pages are not our content, only a non-customizable view of our content (I can't do anything with it).

Our content is the wiki code and the files (images, etc.)... and this is what currently does not seem to be fully reusable.

Emmanuel

PS: DumpHTML also seems not to be maintained currently... have a look at the bug reports.
Re: [Wikitech-l] [mwdumper] new maintainer?
mwDumper is also essential for anyone willing to replicate a wiki locally for any purpose. There are alternatives such as xml2SQL or importDump.php, but mwDumper is the most efficient in terms of correctness and completeness, and sometimes speed.

bilal
==
Verily, with hardship comes ease.

On Fri, Feb 12, 2010 at 8:46 AM, emman...@engelhart.org wrote:

> On Fri 12/02/10 14:24, Christensen, Courtney christens...@battelle.org wrote:
>> We use the DumpHTML extension (http://www.mediawiki.org/wiki/Extension:DumpHTML) to make static copies of our wikis. It used to be a maintenance script. Maybe that would work for you?
>
> The DumpHTML extension is something else... it is a tool to get a static HTML version of MediaWiki articles. If you mean http://static.wikipedia.org/... that is also another topic, because these pages are not our content, only a non-customizable view of our content (I can't do anything with it).
>
> Our content is the wiki code and the files (images, etc.)... and this is what currently does not seem to be fully reusable.
>
> Emmanuel
>
> PS: DumpHTML also seems not to be maintained currently... have a look at the bug reports.
[Wikitech-l] [mwdumper] new maintainer?
Hi,

Almost one month ago I reported a bug in mwdumper which seems to me to be critical. I simply can't use mwdumper with the itwiki XML dumps: https://bugzilla.wikimedia.org/show_bug.cgi?id=22137

I have extracted the problematic part of the XML, but until now I have not had any response to this bug report, and I guess that Brion, the Bugzilla maintainer for mwdumper, does not have time for that anymore.

mwdumper is an almost mandatory tool to spread our content and for this reason I wanted to speak about that on the ML. Maybe someone with Java skills is interested in helping me resolve this mwdumper bug?

Regards
Emmanuel
Re: [Wikitech-l] [mwdumper] new maintainer?
emman...@engelhart.org wrote:

> mwdumper is an almost mandatory tool to spread our content and for this reason I wanted to speak about that on the ML.

You might have to be more specific than that. It doesn't seem like a mandatory tool to me.

-- Tim Starling