Re: reading large XML files

2014-05-20 Thread Nathan Kronenfeld
Thanks, that sounds perfect.

On Tue, May 20, 2014 at 1:38 PM, Xiangrui Meng wrote:
> You can search for XMLInputFormat on Google. There are some
> implementations that allow you to specify the tag to split on, e.g.:
>
> https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collecti

Re: reading large XML files

2014-05-20 Thread Xiangrui Meng
You can search for XMLInputFormat on Google. There are some implementations that allow you to specify the tag to split on, e.g.:

https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java

On Tue, May 20, 2014 at 10:31 AM, Nathan Kronenfeld wrote:
> Unfortuna
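For readers wondering how a tag-splitting reader can cope with partition boundaries, here is a minimal sketch of the general idea in plain Python. It is not the Cloud9 code and not a Spark or Hadoop API: the function name `xml_records`, its parameters, and the chunk size are all hypothetical. The assumption is that each split emits only the records whose start tag begins inside its byte range, and reads past the end of the range if needed to reach the closing tag, so every record is seen exactly once and the file is never held in memory as a whole.

```python
import io


def xml_records(f, start_tag, end_tag, split_start, split_end, chunk_size=65536):
    """Yield each start_tag...end_tag record whose start tag begins at a byte
    offset in [split_start, split_end).

    `f` is a seekable binary file object; the tags are bytes. A record may run
    past split_end: the reader keeps going until it finds the end tag.
    Caveats of this sketch: no XML parsing is done, so b"<node" would also
    match b"<nodes", and nested elements with the same tag are not handled.
    """
    f.seek(split_start)
    buf = b""
    pos = split_start  # file offset of buf[0]
    while True:
        i = buf.find(start_tag)
        if i < 0:
            # Discard scanned bytes but keep a short tail, in case the
            # start tag straddles two chunks.
            drop = max(0, len(buf) - (len(start_tag) - 1))
            pos += drop
            buf = buf[drop:]
            if pos >= split_end:
                return  # any later start tag belongs to the next split
            chunk = f.read(chunk_size)
            if not chunk:
                return  # end of file
            buf += chunk
            continue
        if pos + i >= split_end:
            return  # this record starts in the next split
        # A record starts inside this split: read until its end tag,
        # even if that means reading beyond split_end.
        pos += i
        buf = buf[i:]
        search_from = len(start_tag)
        while True:
            j = buf.find(end_tag, search_from)
            if j >= 0:
                break
            # Only the unscanned tail can contain a new match.
            search_from = max(len(start_tag), len(buf) - (len(end_tag) - 1))
            chunk = f.read(chunk_size)
            if not chunk:
                return  # truncated record without a closing tag
            buf += chunk
        end = j + len(end_tag)
        yield buf[:end]
        pos += end
        buf = buf[end:]
```

For a GraphML file one might call this once per byte range, for example with `start_tag=b"<node"` and `end_tag=b"</node>"`, and parse each yielded record separately. Self-closing elements such as `<node id="n0"/>` would need a different end marker.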

Re: reading large XML files

2014-05-20 Thread Nathan Kronenfeld
Unfortunately, I don't have a bunch of moderately big XML files; I have one really big file, big enough that reading it into memory as a single string is not feasible.

On Tue, May 20, 2014 at 1:24 PM, Xiangrui Meng wrote:
> Try sc.wholeTextFiles(). It reads the entire file into a string
> rec

Re: reading large XML files

2014-05-20 Thread Xiangrui Meng
Try sc.wholeTextFiles(). It reads the entire file into a string record.

-Xiangrui

On Tue, May 20, 2014 at 8:25 AM, Nathan Kronenfeld wrote:
> We are trying to read some large GraphML files to use in Spark.
>
> Is there an easy way to read XML-based files like this that accounts for
> partition bo

reading large XML files

2014-05-20 Thread Nathan Kronenfeld
We are trying to read some large GraphML files to use in Spark.

Is there an easy way to read XML-based files like this that accounts for partition boundaries and the like?

Thanks,
Nathan

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley St