Thanks, that sounds perfect
On Tue, May 20, 2014 at 1:38 PM, Xiangrui Meng wrote:
You can search for XMLInputFormat on Google. There are some
implementations that allow you to specify the tag to split on, e.g.:
https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java
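For readers following the thread, here is a minimal Python sketch of the general idea behind splitting on a tag: scan a stream in chunks and emit everything from a start tag through the matching end tag, so the whole file never has to sit in memory at once. This is not the Cloud9 code. The function name, the `<node ` / `</node>` tags, and the sample data are hypothetical, and the sketch assumes records are not nested and are not self-closing.

```python
import io


def iter_xml_records(stream, start_tag, end_tag, chunk_size=64 * 1024):
    """Yield each byte span from start_tag through end_tag, reading in chunks.

    Assumes records do not nest and always have an explicit end tag.
    """
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of input; any incomplete trailing record is dropped
        buf += chunk
        while True:
            start = buf.find(start_tag)
            if start < 0:
                # No start tag yet: keep only a tail long enough to hold
                # a start tag that straddles the chunk boundary.
                keep = len(start_tag) - 1
                buf = buf[-keep:] if keep > 0 else b""
                break
            end = buf.find(end_tag, start + len(start_tag))
            if end < 0:
                # Record is incomplete: discard the text before it, read more.
                buf = buf[start:]
                break
            end += len(end_tag)
            yield buf[start:end]
            buf = buf[end:]


# Usage on an in-memory stand-in for a large file (hypothetical sample data).
sample = io.BytesIO(
    b'<graph><node id="n0"><data key="d0">a</data></node>'
    b'<node id="n1"></node></graph>'
)
records = list(iter_xml_records(sample, b"<node ", b"</node>", chunk_size=16))
```

Memory use here is bounded by the largest single record plus one chunk. A Hadoop input format built on the same idea would additionally have to handle records that cross split boundaries, which this sketch does not attempt.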
On Tue, May 20, 2014 at 10:31 AM, Nathan Kronenfeld
wrote:
Unfortunately, I don't have a bunch of moderately big XML files; I have
one really big file, big enough that reading it into memory as a single
string is not feasible.
On Tue, May 20, 2014 at 1:24 PM, Xiangrui Meng wrote:
Try sc.wholeTextFiles(). It reads the entire file into a string
record. -Xiangrui
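As a rough illustration of this suggestion, a minimal PySpark-style sketch follows. It assumes `sc` is an already-created SparkContext; the helper name and the path argument are hypothetical.

```python
def load_whole_files(sc, path):
    """Return an RDD of (file_path, file_contents) pairs, one record per file.

    `sc` is assumed to be an existing SparkContext; `path` is a
    hypothetical directory or glob of XML files.
    """
    # Each file becomes ONE string record, so every file must fit in the
    # memory of a single executor.
    return sc.wholeTextFiles(path)
```

Because each file is materialized as a single string, this approach suits many moderately sized files rather than one very large one.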
On Tue, May 20, 2014 at 8:25 AM, Nathan Kronenfeld
wrote:
We are trying to read some large GraphML files to use in Spark.
Is there an easy way to read XML-based files like this that accounts for
partition boundaries and the like?
Thanks,
Nathan
--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley St