Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread Andreas Nüßlein
Hey Quim, hey Maria,

thank you for your replies!
I already knew where to find the XML dumps, but the pointer to the new
XML import tools is really helpful.


So eventually, I was able to acquire an 8-core Xeon machine with 32 GB RAM
and 6 TB of SAS storage to start my experiments on :)
Let's see what this baby can do: http://i.imgur.com/J47GJ.gif

Thanks again
Andreas



On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.mit...@gmail.com wrote:

 Hi,

 You might also try the following mailing list:
 XML Data Dumps mailing list:
 https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

 Here is some info on importing XML dumps (I'm not sure which tools work well,
 but the mailing list can probably help with that):
 http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

 Also, Ariel Glenn recently announced two new tools for importing dumps on
 the XML list:

 http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html

 Mariya



 On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:

  On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
 
  Hi list,
 
  so I need to set up a local instance of the dewiki- and enwiki-DB with all
  revisions.. :-D
 
 
  Just in case:
  http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
 
  Also, you might want to ask / discuss at
 
  https://lists.wikimedia.org/mailman/listinfo/offline-l
 
  Good luck with this interesting project!
 
  --
  Quim Gil
  Technical Contributor Coordinator @ Wikimedia Foundation
  http://www.mediawiki.org/wiki/User:Qgil
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 

[Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-05 Thread Andreas Nüßlein
Hi list,

so I need to set up a local instance of the dewiki- and enwiki-DB with all
revisions.. :-D

I know it's rather a mammoth project so I was wondering if somebody could
give me some pointers?

First of all, I would need to know what kind of hardware I should get. Is
it possible/smart to have it all in two ginormous MySQL instances (one for
each of the languages), or will I need to do sharding?

I don't need it to run smoothly. I only need to be able to query the
database (and I know some of these queries can run for days).

I will probably have access to some rather powerful machines here at the
university, and I also have quite a few workstation machines on which I
could theoretically do the sharding.


Thanks in advance
Andreas



PS: If it helps: I live in Berlin and would gladly meet face to face with
anybody willing to share wisdom :)