[Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-05 Thread Andreas Nüßlein
Hi list,

I need to set up local instances of the dewiki and enwiki databases with all
revisions. :-D

I know it's rather a mammoth project so I was wondering if somebody could
give me some pointers?

First of all, I need to know what kind of hardware I should get. Is it
possible/smart to have it all in two ginormous MySQL instances (one for
each language), or will I need to do sharding?

I don't need it to run smoothly. I only need to be able to query the
database (and I know some of these queries can run for days).

I will probably have access to some rather powerful machines here at the
university, and I also have quite a few workstations across which I could
theoretically shard the data.


Thanks in advance
Andreas



PS: If it helps: I live in Berlin and would gladly meet face to face with
anybody willing to share wisdom :)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread Andreas Nüßlein
Hey Quim, hey Maria,

thank you for your replies!
I already knew where to find the XML dumps, but the pointer to the new
XML import tools is really helpful.
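
For anyone following along, here is a minimal sketch of how a full-history
XML dump could be streamed without loading it into memory, using only
Python's standard library. It is not one of the import tools mentioned in
this thread. The function names, the sample data, and the export-0.8
namespace in the sample are assumptions made for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, text_length) while streaming a dump."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            # <title> closes before any <revision> of the same page.
            title = elem.text
        elif name == "revision":
            rev_id = None
            text_len = 0
            # Only direct children: <contributor><id> is deliberately skipped.
            for child in elem:
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text_len = len(child.text or "")
            yield title, rev_id, text_len
            elem.clear()  # drop the revision text once it has been handled
        elif name == "page":
            elem.clear()  # drop the emptied revision elements of this page


# Tiny hand-written sample (an assumption, not real dump content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Berlin</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <contributor><username>A</username><id>7</id></contributor>
      <text>Hello</text>
    </revision>
    <revision>
      <id>11</id>
      <text>Hello world</text>
    </revision>
  </page>
</mediawiki>"""

for row in iter_revisions(io.BytesIO(SAMPLE)):
    print(row)
```

For a real dump, the same generator should accept a file object opened with
bz2.open(path, "rb"), so the compressed file never has to be unpacked on
disk first. Loading the rows into MySQL is left out here on purpose.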


So eventually, I was able to acquire an 8-core Xeon with 32 GB RAM and 6 TB
of SAS storage to start my experiments on :)
Let's see what this baby can do: http://i.imgur.com/J47GJ.gif

Thanks again
Andreas



On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva wrote:

> Hi,
>
> You might also try the following mailing list:
> XML Data Dumps mailing list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
> Here is some info on importing XML dumps (I'm not sure which tools work
> well, but the mailing list can probably help with that):
> http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
>
> Also, Ariel Glenn recently announced two new tools for importing dumps on
> the XML list:
>
> http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html
>
> Mariya
>
>
>
> On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil wrote:
>
> > On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
> >
> >> Hi list,
> >>
> >> so I need to set up a local instance of the dewiki- and enwiki-DB with all
> >> revisions.. :-D
> >>
> >
> > Just in case:
> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> >
> > Also, you might want to ask / discuss at
> >
> > https://lists.wikimedia.org/mailman/listinfo/offline-l
> >
> > Good luck with this interesting project!
> >
> > --
> > Quim Gil
> > Technical Contributor Coordinator @ Wikimedia Foundation
> > http://www.mediawiki.org/wiki/User:Qgil