Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-12 Thread O. O.
Andrew Krizhanovsky wrote:
 Hi!
 
 I have tried xml2sql and importDump.php.
 The same error.
 
 Best regards,
 Andrew.
 

Thanks Bilal and Andrew.
O.O.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-10 Thread Andrew Krizhanovsky
Hi!

I have tried xml2sql and importDump.php.
The same error.

Best regards,
Andrew.

On Fri, Oct 9, 2009 at 10:18 PM, O. O. olson...@yahoo.com wrote:
 Andrew Krizhanovsky wrote:
 Hi!

 I have got the same redirect problem while importing the dump of
 Russian Wiktionary. :(

 Best regards,
 Andrew Krizhanovsky.


 So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am
 curious to know what others are using  for their imports. (This is for
 my personal knowledge.)

 It seems that the “<redirect />” tags are mostly blank while grepping
 through the English Wikipedia Dump. I hope someone can fix this soon.

 Thanks to you guys,
 O. O.






Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-09 Thread Andrew Krizhanovsky
Hi!

I have got the same redirect problem while importing the dump of
Russian Wiktionary. :(

Best regards,
Andrew Krizhanovsky.

On Fri, Oct 9, 2009 at 3:46 AM, O. O. olson...@yahoo.com wrote:
 Tomasz Finc wrote:
 O. O. wrote:

 If it's failing due to an old xsd then ..

 The updated xsd and copy of Import.php just got checked into our
 repositories so you can either pull this

 http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472

 and increase the version number ala

 http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612

 Or you can wait till the next tagged release which will likely include this.

 Thanks Tomasz. I don’t mind waiting for your next release if it is going
 to be in the next month or so.


 (Thanks for the pointer to the source of MW Dumper. The Source is not
 mentioned in the Readme. However, I found it too complicated - or not
 well documented for me at this point.)

 I'll have a peek at this and see if it can be improved.

 I hope someone could  updated MW Dumper to the new XSD – it would help a
 lot as far as importing Wikipedia Dumps are concerned, because
 importDump.php is not practical.






Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-09 Thread O. O.
Andrew Krizhanovsky wrote:
 Hi!
 
 I have got the same redirect problem while importing the dump of
 Russian Wiktionary. :(
 
 Best regards,
 Andrew Krizhanovsky.
 

So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am
curious to know what others are using for their imports. (This is for
my personal knowledge.)

Grepping through the English Wikipedia dump, it seems that the
“<redirect />” tags are mostly blank. I hope someone can fix this soon.
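
The grep check described above can be sketched in Python. This is only an illustration, not something used in the thread: it assumes that "blank" means a self-closing <redirect /> element with no attributes, and the function name count_redirect_tags is hypothetical.

```python
import re
from collections import Counter

# Self-closing <redirect ... /> element; group 1 captures any attributes.
REDIRECT_TAG = re.compile(r"<redirect\b([^>]*?)\s*/>")

def count_redirect_tags(lines):
    """Count blank versus attribute-bearing <redirect /> tags in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for match in REDIRECT_TAG.finditer(line):
            attrs = match.group(1).strip()
            counts["with_attributes" if attrs else "blank"] += 1
    return counts
```

Because it reads one line at a time, it can be fed an open file object for a multi-gigabyte dump, for example count_redirect_tags(open(path, encoding="utf-8")).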

Thanks to you guys,
O. O.




Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-09 Thread Bilal Abdul Kader
I have used xml2sql, mwdumper, import.php and the Python script to import.
The two fastest are xml2sql and the Python script (xray). The best results
are from importDump.php.
mwdumper is slow, but it gives good results.

I have not done any import with the new redirect tag.

bilal


On Fri, Oct 9, 2009 at 2:18 PM, O. O. olson...@yahoo.com wrote:

 Andrew Krizhanovsky wrote:
  Hi!
 
  I have got the same redirect problem while importing the dump of
  Russian Wiktionary. :(
 
  Best regards,
  Andrew Krizhanovsky.
 

 So Andrew, do you import using importDump.php, MWDumper or xml2sql? I am
 curious to know what others are using  for their imports. (This is for
 my personal knowledge.)

 It seems that the “<redirect />” tags are mostly blank while grepping
 through the English Wikipedia Dump. I hope someone can fix this soon.

 Thanks to you guys,
 O. O.






-- 
Verily, with hardship comes ease.


Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-08 Thread O. O.
Platonides wrote:
 Seems it fails on the new redirect tag.
 
 P.S. http://download.wikimedia.org/tools/ does not have the source of 
 MWDumper. I thought this was open source?
 
 MWDumper source is available at
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/
 
 It should be noted at the readme.

Thanks Platonides. With the new redirect tag, is there any way to
import the new XML files?

Could I simply strip out the <redirect /> tags from the file if I
wanted MWDumper to work? Or, if I upgrade to MediaWiki 1.16, would
import.php work without any problems?
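
A minimal sketch of the stripping idea, assuming the tag always appears as a self-closing <redirect /> element that fits on one line. The thread does not confirm that removing the tags is enough for MWDumper or xml2sql to succeed (the MWDumper failure reported earlier was "Invalid contributor"), so treat this as an experiment. The function name and the command-line usage are hypothetical.

```python
import re
import sys

# Self-closing <redirect /> element, with or without attributes.
REDIRECT_TAG = re.compile(r"<redirect\b[^>]*/>")

def strip_redirect_tags(lines):
    """Yield each line with <redirect /> removed, dropping lines that held only the tag."""
    for line in lines:
        cleaned = REDIRECT_TAG.sub("", line)
        # Keep lines that never contained the tag (including blank ones)
        # and lines that still carry other content after the removal.
        if cleaned.strip() or not REDIRECT_TAG.search(line):
            yield cleaned

# Hypothetical usage: python strip_redirect.py in.xml out.xml
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8", newline="") as fin, \
         open(sys.argv[2], "w", encoding="utf-8", newline="") as fout:
        fout.writelines(strip_redirect_tags(fin))
```

The filter streams line by line, so memory use stays flat regardless of dump size.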

(Thanks for the pointer to the source of MW Dumper. The Source is not 
mentioned in the Readme. However, I found it too complicated - or not 
well documented for me at this point.)

Thanks again,
O.O.




Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-08 Thread Tomasz Finc
O. O. wrote:
 Platonides wrote:
 Seems it fails on the new redirect tag.

 P.S. http://download.wikimedia.org/tools/ does not have the source of 
 MWDumper. I thought this was open source?
 MWDumper source is available at
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/

 It should be noted at the readme.
 
 Thanks Platonides.  With the new redirect tag is there anyway to 
 import the new XML Files?
 
 Could I simply strip out the <redirect /> tags from the file, if I 
 wanted MWDumper to work. Or if I upgrade to MediaWiki 1.16, would 
 import.php work without any problems?

If it's failing due to an old xsd then ..

The updated xsd and copy of Import.php just got checked into our 
repositories so you can either pull this

http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472

and increase the version number ala

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612

Or you can wait till the next tagged release which will likely include this.

 
 (Thanks for the pointer to the source of MW Dumper. The Source is not 
 mentioned in the Readme. However, I found it too complicated - or not 
 well documented for me at this point.)

I'll have a peek at this and see if it can be improved.

--tomasz



Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-08 Thread O. O.
Tomasz Finc wrote:
 O. O. wrote:
 
 If it's failing due to an old xsd then ..
 
 The updated xsd and copy of Import.php just got checked into our 
 repositories so you can either pull this
 
 http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=54472
 
 and increase the version number ala
 
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Export.php?r1=56298&r2=56612
 
 Or you can wait till the next tagged release which will likely include this.

Thanks Tomasz. I don’t mind waiting for your next release if it is going 
to be in the next month or so.

 
 (Thanks for the pointer to the source of MW Dumper. The Source is not 
 mentioned in the Readme. However, I found it too complicated - or not 
 well documented for me at this point.)
 
 I'll have a peek at this and see if it can be improved.

I hope someone can update MWDumper to the new XSD; it would help a
lot with importing Wikipedia dumps, because importDump.php is not
practical.




[Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-07 Thread O. O.
Hi,

 I have been importing the English Wikipedia XML dumps every few
months (the last time was in June). I used xml2sql, and it always
worked for me. Now I have attempted the import on the latest dump,
enwiki-20090920-pages-articles.xml, and on the earlier
enwiki-20090810-pages-articles.xml; both fail with this error:

 $ xml2sql enwiki-20090920-pages-articles.xml
unexpected element redirect
xml2sql: parsing aborted at line 33 pos 16.
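
To see what the parser hit, it can help to print the lines of the dump around the reported position (line 33 here). The following is a minimal sketch; the helper name lines_around and its context parameter are hypothetical.

```python
from itertools import islice

def lines_around(lines, lineno, context=2):
    """Return (line number, text) pairs surrounding the 1-based line number lineno."""
    first = max(1, lineno - context)
    last = lineno + context
    # islice reads only up to the last requested line, so a huge dump is not loaded whole.
    window = islice(lines, first - 1, last)
    return [(n, text.rstrip("\n")) for n, text in enumerate(window, start=first)]
```

Called as lines_around(open("enwiki-20090920-pages-articles.xml", encoding="utf-8"), 33), it returns lines 31 to 35 with their numbers.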

So then I tried mwdumper, and after 1.4 million pages it craps out:
……
1,423,000 pages (957.283/sec), 1,423,000 revs (957.283/sec)
1,424,000 pages (957.465/sec), 1,424,000 revs (957.465/sec)
Exception in thread "main" java.lang.IllegalArgumentException: Invalid contributor
    at org.mediawiki.importer.XmlDumpReader.closeContributor(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)


I also tried importDump.php (MediaWiki 1.14.0) and got errors like these:
…
Warning: xml_parse(): Unable to call handler in_() in /var/www/includes/Import.php on line 437
Warning: xml_parse(): Unable to call handler in_() in /var/www/includes/Import.php on line 437
Warning: xml_parse(): Unable to call handler out_() in /var/www/includes/Import.php on line 437
….
(Sorry, I don’t know where this error starts. It processes a few
thousand pages before failing, by which point I am sick of looking at it.)

Has the format of the XML files changed? I could swear that as of June,
or maybe May, I had xml2sql working. I know that I might need to upgrade
MediaWiki to 1.15; however, importDump.php usually does not work for the
English Wikipedia anyway.

I would be grateful for any ideas.
Thanks guys,
O. O.

P.S. http://download.wikimedia.org/tools/ does not have the source of 
MWDumper. I thought this was open source?




Re: [Wikitech-l] Importing English Wikipedia XML Dumps into MediaWiki

2009-10-07 Thread Platonides
O. O. writes:
 (Sorry I don’t know where this error starts, but it processes a few 
 thousand pages, up till I get sick of looking at it before failing.)
 
 Any ideas if the format of the XML files have changed because I can 
 swear that as of June or may be May, I had xml2sql working. I know that 
 I might need to upgrade MediaWiki to 1.15, however importDump.php 
 usually does not work for the English Wikipedia anyways.
 
 I would be grateful if someone has any ideas?
 Thanks guys,
 O. O.

Seems it fails on the new redirect tag.

 P.S. http://download.wikimedia.org/tools/ does not have the source of 
 MWDumper. I thought this was open source?

MWDumper source is available at
http://svn.wikimedia.org/viewvc/mediawiki/trunk/mwdumper/

It should be noted in the readme.

