Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
Hey Quim, hey Maria,

thank you for your replies! I actually knew where to find the XML-dumps but that pointer about the new XML-import tools is really helpful.

So eventually, I was able to acquire a Xeon 8 core, 32GB RAM, 6TB SAS to start my experiments on :) Let's see what this baby can do:
http://i.imgur.com/J47GJ.gif

Thanks again
Andreas

On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.mit...@gmail.com wrote:
> Hi,
>
> You might also try the following mailing list:
> XML Data Dumps mailing list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
> Here is some info on importing XML dumps (not sure what tools work well but probably the mailing list can help with that):
> http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
>
> Also, Ariel Glenn recently announced two new tools for importing dumps on the XML list:
> http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html
>
> Mariya
>
> On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:
>> On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
>>> Hi list,
>>> so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
>>
>> Just in case:
>> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
>>
>> Also, you might want to ask / discuss at
>> https://lists.wikimedia.org/mailman/listinfo/offline-l
>>
>> Good luck with this interesting project!
>>
>> --
>> Quim Gil
>> Technical Contributor Coordinator @ Wikimedia Foundation
>> http://www.mediawiki.org/wiki/User:Qgil

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
Andreas Nüßlein wrote:
> so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
>
> I know it's rather a mammoth project so I was wondering if somebody could give me some pointers?
>
> First of all, I would need to know what kind of hardware I should get. Is it possible/smart to have it all in two ginormous MySQL instances (one for each of the languages) or will I need to do sharding?
>
> I don't need it to run smoothly. I only need to be able to query the database (and I know some of these queries can run for days).
>
> I will probably have access to some rather powerful machines here at the university and I also have quite a few workstation machines on which I could theoretically do the sharding.

Ryan L. or Marc P.: I routed Andreas to this list (from #wikimedia-toolserver), as I figured these questions related to the work that you all have been doing for Wikimedia Labs. Or at least I figured you all probably had some kind of formula for hardware provisioning that might be reusable here.

Any pointers would be great. :-)

MZMcBride
[Wikitech-l] Replicating enwiki and dewiki for research purposes
Hi list,

so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D

I know it's rather a mammoth project so I was wondering if somebody could give me some pointers?

First of all, I would need to know what kind of hardware I should get. Is it possible/smart to have it all in two ginormous MySQL instances (one for each of the languages) or will I need to do sharding?

I don't need it to run smoothly. I only need to be able to query the database (and I know some of these queries can run for days).

I will probably have access to some rather powerful machines here at the university and I also have quite a few workstation machines on which I could theoretically do the sharding.

Thanks in advance
Andreas

PS: If it helps: I'm living in Berlin and I will gladly also just have a face-to-face meeting with anybody willing to share wisdom :)
Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
> Hi list,
> so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D

Just in case:
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

Also, you might want to ask / discuss at
https://lists.wikimedia.org/mailman/listinfo/offline-l

Good luck with this interesting project!

--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
Hi,

You might also try the following mailing list:
XML Data Dumps mailing list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

Here is some info on importing XML dumps (not sure what tools work well but probably the mailing list can help with that):
http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

Also, Ariel Glenn recently announced two new tools for importing dumps on the XML list:
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html

Mariya

On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:
> On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
>> Hi list,
>> so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D
>
> Just in case:
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
>
> Also, you might want to ask / discuss at
> https://lists.wikimedia.org/mailman/listinfo/offline-l
>
> Good luck with this interesting project!
>
> --
> Quim Gil
> Technical Contributor Coordinator @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
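
As a rough illustration of what working with the XML dumps mentioned above can look like before any database import: the following is only a minimal sketch, not one of the tools linked in this thread. It assumes a MediaWiki XML export with `<page>`, `<title>`, and `<revision>` elements (each revision carrying `<id>` and `<timestamp>`), and streams through it with Python's standard `xml.etree.ElementTree.iterparse` so that a full-history file never has to fit in RAM. The names `local_name`, `iter_revisions`, and `SAMPLE`, and the sample data itself, are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the shape of a MediaWiki XML export.
# The data is invented purely for illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Berlin</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2013-03-05T10:54:00Z</timestamp>
      <contributor><username>Example</username><id>7</id></contributor>
      <text>first</text>
    </revision>
    <revision>
      <id>11</id>
      <parentid>10</parentid>
      <timestamp>2013-03-05T15:33:00Z</timestamp>
      <text>second</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, timestamp) for every revision in the stream."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    title = None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = None
            timestamp = None
            # Only direct children: skips the <id> nested inside <contributor>.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "timestamp":
                    timestamp = child.text
            yield title, rev_id, timestamp
            elem.clear()  # drop the (potentially large) revision text
        elif name == "page":
            root.clear()  # drop finished pages so memory use stays flat


if __name__ == "__main__":
    for row in iter_revisions(io.BytesIO(SAMPLE)):
        print(row)
```

For a real compressed dump, the same generator could be fed a file object from `bz2.open(path, "rb")`, where `path` is wherever the downloaded file lives. Whether to load the rows into MySQL or query them directly is a separate decision that this sketch does not address.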