Re: Parsing one big multiple line .xml loaded in RDD using Python

2014-10-07 Thread jan.zikes
tion it should be possible, but it seems that it does not work for me.   ______ From: Davies Liu To: Date: 07.10.2014 17:38 Subject: Re: Parsing one big multiple line .xml loaded in RDD using Python CC: "u...@spark.incubator.apache.org" Maybe

Re: Parsing one big multiple line .xml loaded in RDD using Python

2014-10-07 Thread Davies Liu
Maybe sc.wholeTextFiles() is what you want; you can get the whole text and parse it yourself. On Tue, Oct 7, 2014 at 1:06 AM, wrote: > Hi, > > I have already unsuccessfully asked quite a similar question on Stack Overflow, > specifically here: > http://stackoverflow.com/questions/26202978/spark-an
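
A minimal sketch of the suggestion in this reply, assuming the PySpark method is the one spelled `SparkContext.wholeTextFiles`, which returns (filename, content) pairs with one pair per file. The `<title>` tag, the helper names `extract_titles` and `titles_from_files`, and the absence of an XML namespace are assumptions made for illustration; they do not come from the thread.

```python
import xml.etree.ElementTree as ET


def extract_titles(path_and_content):
    """Parse one whole XML document and return the text of each <title>.

    The <title> tag is a hypothetical example; adapt it to the real schema
    (a namespaced document would need the namespace in the tag name).
    """
    _path, content = path_and_content
    root = ET.fromstring(content)
    return [element.text for element in root.iter("title")]


def titles_from_files(sc, path):
    """Return an RDD of titles for the XML files under `path`.

    `sc` is an existing SparkContext and `path` is a hypothetical location.
    wholeTextFiles keeps each file in one record, so a document that spans
    many lines reaches the parser intact instead of being split per line.
    """
    return sc.wholeTextFiles(path).flatMap(extract_titles)
```

Each file is read into memory as a single string on one worker, so this approach probably only suits files that fit comfortably in an executor's memory.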

Parsing one big multiple line .xml loaded in RDD using Python

2014-10-07 Thread jan.zikes
Hi, I have already unsuccessfully asked quite a similar question on Stack Overflow, specifically here: http://stackoverflow.com/questions/26202978/spark-and-python-trying-to-parse-wikipedia-using-gensim. I've also tried some workarounds, without success; the workaround problem can be f