Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread Andreas Nüßlein
Hey Quim, hey Maria,

thank you for your replies!
I actually knew where to find the XML-dumps but that pointer about the new
XML-import tools is really helpful.


So eventually, I was able to acquire a machine with an 8-core Xeon, 32 GB RAM,
and 6 TB of SAS storage to start my experiments on :)
Let's see what this baby can do: http://i.imgur.com/J47GJ.gif

Thanks again
Andreas



On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.mit...@gmail.com wrote:

 Hi,

 You might also try the following mailing list:
 XML Data Dumps mailing list:
 https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

 Here is some info on importing XML dumps (I am not sure which tools work well,
 but the mailing list can probably help with that):
 http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

 Also, Ariel Glenn recently announced two new tools for importing dumps on
 the XML list:

 http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html

 Mariya



 On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:

  On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
 
  Hi list,
 
  so I need to set up a local instance of the dewiki- and enwiki-DB with all
  revisions.. :-D
 
 
  Just in case:
  http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
 
  Also, you might want to ask / discuss at
 
  https://lists.wikimedia.org/mailman/listinfo/offline-l
 
  Good luck with this interesting project!
 
  --
  Quim Gil
  Technical Contributor Coordinator @ Wikimedia Foundation
  http://www.mediawiki.org/wiki/User:Qgil
 
 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread MZMcBride
Andreas Nüßlein wrote:
 so I need to set up a local instance of the dewiki- and enwiki-DB with all
 revisions.. :-D

 I know it's rather a mammoth project so I was wondering if somebody could
 give me some pointers?

 First of all, I would need to know what kind of hardware I should get. Is
 it possible/smart to have it all in two ginormous MySQL instances (one for
 each of the languages) or will I need to do sharding?

 I don't need it to run smoothly. I only need to be able to query the
 database (and I know some of these queries can run for days).

 I will probably have access to some rather powerful machines here at the
 university and I also have quite a few workstation machines on which I
 could theoretically do the sharding.

Ryan L. or Marc P.: I routed Andreas to this list (from
#wikimedia-toolserver), as I figured these questions related to the work
that you all have been doing for Wikimedia Labs. Or at least I figured you
all probably had some kind of formula for hardware provisioning that might
be reusable here. Any pointers would be great. :-)

MZMcBride




[Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-05 Thread Andreas Nüßlein
Hi list,

so I need to set up a local instance of the dewiki- and enwiki-DB with all
revisions.. :-D

I know it's rather a mammoth project so I was wondering if somebody could
give me some pointers?

First of all, I would need to know what kind of hardware I should get. Is
it possible/smart to have it all in two ginormous MySQL instances (one for
each of the languages) or will I need to do sharding?

I don't need it to run smoothly. I only need to be able to query the
database (and I know some of these queries can run for days)

I will probably have access to some rather powerful machines here at the
university and I also have quite a few workstation machines on which I
could theoretically do the sharding.


Thanks in advance
Andreas



PS: If it helps: I live in Berlin and would gladly meet face to face with
anybody willing to share wisdom :)

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-05 Thread Quim Gil

On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:

 Hi list,

 so I need to set up a local instance of the dewiki- and enwiki-DB with all
 revisions.. :-D


Just in case:
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

Also, you might want to ask / discuss at

https://lists.wikimedia.org/mailman/listinfo/offline-l

Good luck with this interesting project!

--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil


Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-05 Thread Maria Miteva
Hi,

You might also try the following mailing list:
XML Data Dumps mailing list:
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

Here is some info on importing XML dumps (I am not sure which tools work well,
but the mailing list can probably help with that):
http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

Also, Ariel Glenn recently announced two new tools for importing dumps on
the XML list:
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html

Mariya
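
For anyone who wants a feel for the dump format before picking an import
tool, here is a minimal sketch (not one of the tools linked above) that
streams a MediaWiki XML export and counts revisions per page using only the
Python standard library. The sample document, its namespace URI, and the
names count_revisions and local_name are assumptions made for illustration;
a real full-history dump is far larger and would normally be read from a
compressed file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(stream):
    """Yield (title, revision_count) for each <page> in the stream.

    The stream is parsed incrementally, so the whole dump is never
    held in memory as text.
    """
    title, revisions = None, 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # release the revision's text and children
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            elem.clear()  # release the page's remaining children


# Tiny hand-written sample in the general shape of an export file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Berlin</title>
    <revision><id>1</id><text>a</text></revision>
    <revision><id>2</id><text>b</text></revision>
  </page>
  <page><title>Xeon</title>
    <revision><id>3</id><text>c</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, n in count_revisions(io.BytesIO(SAMPLE)):
        print(page_title, n)
```

For a real dump, the BytesIO object could be replaced by a file object from
bz2.open(path, "rb"), where path is wherever the downloaded file lives.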



On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:

 On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:

 Hi list,

 so I need to set up a local instance of the dewiki- and enwiki-DB with all
 revisions.. :-D


 Just in case:
 http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

 Also, you might want to ask / discuss at

 https://lists.wikimedia.org/mailman/listinfo/offline-l

 Good luck with this interesting project!

 --
 Quim Gil
 Technical Contributor Coordinator @ Wikimedia Foundation
 http://www.mediawiki.org/wiki/User:Qgil

