Dear Skip,

You can always use the various dump files to host a local version of
Wikipedia. These dump files are available at download.wikimedia.org;
however, the site is currently unavailable because of hardware issues.
Given your task, I think the
[language-code][wikiproject]-pages-meta-current.xml.bz2 files are the
most interesting ones.
You can also find a complete dump of August 2009 as part of Amazon's AWS
public datasets at http://aws.amazon.com/publicdatasets/.
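
The following is a minimal Python sketch of how a downloaded *-pages-meta-current.xml.bz2 file could be streamed page by page without first decompressing it to disk, for example to feed a local search index whose hits link back to Wikipedia. It rests on a few assumptions. The dump follows the MediaWiki XML export format with <page>, <title> and <text> elements; the schema version in the XML namespace varies between dumps, so the sketch strips it. Links back use the https://<lang>.wikipedia.org/wiki/<Title_with_underscores> scheme. The function names are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from urllib.parse import quote


def local_name(tag):
    # Drop the "{namespace}" prefix; the export schema version in the
    # namespace URI differs between dumps, so match on the bare tag name.
    return tag.rsplit("}", 1)[-1]


def article_url(title, lang="en"):
    # Assumed link-back scheme: spaces become underscores, everything else
    # is percent-encoded (":" and "/" are left readable).
    return f"https://{lang}.wikipedia.org/wiki/" + quote(title.replace(" ", "_"), safe="/:")


def iter_pages(path, lang="en"):
    """Yield (title, url, wikitext) for each <page> in a .xml.bz2 dump.

    The file is decompressed and parsed as a stream, and finished pages are
    discarded, so memory use stays bounded regardless of dump size.
    """
    with bz2.open(path, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # start event of the <mediawiki> root element
        title = text = None
        for event, elem in context:
            if event != "end":
                continue
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, article_url(title, lang), text
                title = text = None
                root.clear()  # drop processed pages so memory stays flat
```

Usage would be something like `for title, url, text in iter_pages("enwiki-pages-meta-current.xml.bz2"): index(title, url, text)`, where the file name and `index` stand in for whatever the local search system provides. Because the parse is streaming, a multi-gigabyte dump can be processed on a machine with modest RAM.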

I have posted a step-by-step tutorial on the Wiki research mailing list
explaining how to get access to those files.

Best,

Diederik

On Wed, Dec 1, 2010 at 11:35 AM, Skip Garner <gar...@vbi.vt.edu> wrote:
> Dear Wikiteam,
>  Guy Chapman requested that I post to the mailing list to ask how we can
> proceed with getting a copy of Wikipedia so that we can offer it as a database
> in our free search service, in response to the request in the following
> paragraph.  He made me aware of its size, but that is not an issue.  I would
> like to obtain a copy and then establish a routine for automated, synced
> downloads, as we do for the other databases we have in our system.
>  I have had several requests to add Wikipedia to our eTBLAST text similarity 
> search engine.  This is to improve reference finding as well as novelty 
> assessment.  Our search tool is widely used, widely published, and free.
> Please see etblast.org or http://en.wikipedia.org/wiki/ETBLAST.  I would like 
> to create a searchable copy of Wikipedia locally with links back to Wikipedia 
> for hits, and of course acknowledge Wikimedia.  We do this for several open 
> text datasets and are prepared to keep a local, synced copy of Wikipedia, if 
> you are interested.  I am certain that our mutual users would like and 
> benefit from our working together.
>
> Cheers, and thank you,
> Skip
>
>
> ----- Original Message -----
> From: "Wikipedia information team" <info...@wikimedia.org>
> To: "Skip Garner" <gar...@vbi.vt.edu>
> Cc: "Dominik L. Borkowski" <d...@vbi.vt.edu>, "Johnny Sun" <szhao...@vbi.vt.edu>
> Sent: Wednesday, December 1, 2010 9:43:25 AM
> Subject: Re: [Ticket#2010112810016598] I would like to provide a different search engine for Wikimedia
>
> Dear Skip Garner,
>
> Thank you for your email.  Our response follows your message.
>
> 11/29/2010 16:23 - Skip Garner wrote:
>
>> Guy,
>>   Thank you for the information.  I would like to move forward on this, as I
>> think it will be of mutual value.  The size of the database is not an issue,
>> and we are always expanding our storage and serving capabilities.  We
>> regularly work with data in the hundreds of terabytes.  One issue would be
>> getting the first copy, but we could probably handle that by FedEx.
>>   Can you tell me how we can proceed?
>>
>> Cheers,
>> Skip
>
>
>
> The best bet is probably to email the wikitech mailing list, which is
> where the devs hang out.
>
> <https://lists.wikimedia.org/mailman/listinfo/wikitech-l>
>
> They will have the best idea of the practicalities.
>
>
> Yours sincerely,
> Guy Chapman
>
> --
> Wikipedia - http://en.wikipedia.org
> ---
> Disclaimer: all mail to this address is answered by volunteers, and responses
> are not to be considered an official statement of the Wikimedia Foundation.
> For official correspondence, please contact the Wikimedia Foundation by
> certified mail at the address listed on http://www.wikimediafoundation.org
>
>
> --
> Harold "Skip" Garner
> Executive Director
> Virginia Bioinformatics Institute
> Virginia Tech
> Washington Street (0477)
> Blacksburg, VA 24061
> http://www.vbi.vt.edu
>
> Phone: 540.231.2582
> Fax: 540.231.1388
>
> Assistant: Renee Nester
> re...@vbi.vt.edu
> 540.231.2582
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
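
On the automated, synced downloads mentioned in the quoted message, here is a minimal Python sketch (not an official Wikimedia tool) of a refresh step that a nightly cron job could run. It assumes that the dump server answers HEAD requests with a Last-Modified header. The function names and the ".part" temporary-file convention are hypothetical.

```python
import os
import shutil
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def remote_last_modified(url, timeout=60):
    # HEAD request: fetch only the headers, not the (very large) dump itself.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        header = resp.headers.get("Last-Modified")
    if not header:
        return None
    dt = parsedate_to_datetime(header)
    # HTTP dates are in GMT; make sure the result is timezone-aware.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def needs_refresh(local_path, remote_mtime):
    if not os.path.exists(local_path):
        return True  # no local copy yet
    if remote_mtime is None:
        return False  # no header: keep the copy rather than re-fetch blindly
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return remote_mtime > local_mtime


def refresh(url, local_path):
    """Download url to local_path only if the server copy is newer."""
    remote_mtime = remote_last_modified(url)
    if not needs_refresh(local_path, remote_mtime):
        return False
    tmp_path = local_path + ".part"
    with urllib.request.urlopen(url) as resp, open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out, 1 << 20)  # stream in 1 MiB chunks
    # os.replace is atomic on the same POSIX filesystem, so readers of
    # local_path never see a half-written file.
    os.replace(tmp_path, local_path)
    if remote_mtime is not None:
        # Stamp the file with the server's time so the next check is exact.
        ts = remote_mtime.timestamp()
        os.utime(local_path, (ts, ts))
    return True
```

A call might look like refresh("http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2", "enwiki-pages-meta-current.xml.bz2"), where the "latest" path layout is an assumption. Stamping the local file with the server's Last-Modified time avoids relying on the local clock when comparing versions.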



-- 
Check out my about.me profile: http://about.me/diederik
