In particular, we are using this dataset:

Ankur <>

On Sun, Mar 30, 2014 at 12:45 AM, Ankur Dave <> wrote:

> The GraphX team has been using Wikipedia dumps from
> Unfortunately, these are in a less
> convenient format than the Freebase dumps. In particular, an article may
> span multiple lines, so more involved input parsing is required.
> Dan Crankshaw (cc'd) wrote a driver that uses a Hadoop InputFormat XML
> parser from Mahout: see WikiPipelineBenchmark.scala<> and
> WikiArticle.scala<>.
> However, we plan to upload a parsed version of this dataset to S3 for
> easier access from Spark and GraphX.
> Ankur <>
> On 27 Mar, 2014, at 9:45 pm, Niko Stahl <> wrote:
>> I would like to run the WikipediaPageRank<> example, but the Wikipedia
>> dump XML files are no longer available on Freebase. Does anyone know an
>> alternative source for the data?
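
To illustrate the parsing issue raised in the quoted message (an article may span multiple lines, so reading the dump line by line is not enough), here is a minimal Python sketch. It is not the driver mentioned above and does not use the Mahout parser. The function name iter_pages and the sample data are hypothetical, and it assumes each article is wrapped in <page>...</page> with at most one article starting on any line.

```python
from typing import Iterable, Iterator


def iter_pages(lines: Iterable[str],
               start_tag: str = "<page>",
               end_tag: str = "</page>") -> Iterator[str]:
    """Yield each start_tag..end_tag block as one string, even if it spans lines."""
    buffer = []      # lines of the record currently being collected
    inside = False   # True while between start_tag and end_tag
    for line in lines:
        if not inside and start_tag in line:
            inside = True
            # Drop anything that precedes the opening tag on this line.
            line = line[line.index(start_tag):]
        if inside:
            if end_tag in line:
                # Keep text up to and including the closing tag, then emit.
                cut = line.index(end_tag) + len(end_tag)
                buffer.append(line[:cut])
                yield "".join(buffer)
                buffer = []
                inside = False
            else:
                buffer.append(line)


# Hypothetical sample: the first article's text contains a line break.
sample = [
    "<mediawiki>\n",
    "  <page>\n",
    "    <title>A</title>\n",
    "    <text>first line\n",
    "second line</text>\n",
    "  </page>\n",
    "  <page><title>B</title></page>\n",
    "</mediawiki>\n",
]
pages = list(iter_pages(sample))
```

On a distributed file the same grouping has to happen inside the input format rather than after a line-based read, because a single article can straddle a split boundary. That is presumably why an XML-aware InputFormat is used in the driver described above.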
