There was a previous discussion about this here: http://apache-spark-user-list.1001560.n3.nabble.com/Having-Spark-read-a-JSON-file-td1963.html
How big are the XML or JSON files you're looking to deal with? It may not be practical to deserialize the entire document at once.

In that case an obvious work-around would be a pre-processing step that separates XML nodes/JSON objects with newlines, so that you *can* analyze the data with Spark in a "line-oriented format". Your preprocessor wouldn't have to parse/deserialize the massive document; it would only have to track opening and closing tags or braces to know when to insert a newline. Then you'd just open the line-delimited result and deserialize the individual objects/nodes with map().

Nick

On Mon, Mar 17, 2014 at 11:18 AM, Diana Carroll <dcarr...@cloudera.com> wrote:

> Has anyone got a working example of a Spark application that analyzes data
> in a non-line-oriented format, such as XML or JSON? I'd like to do this
> without re-inventing the wheel...anyone care to share? Thanks!
>
> Diana
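The JSON half of the pre-processing step suggested above could look roughly like the following minimal Python sketch. It assumes the input is a stream of JSON objects, either concatenated or wrapped in a top-level array; the function names `split_json_objects` and `to_json_lines` are hypothetical. It tracks only brace depth plus string/escape state, so a `}` inside a string value is not miscounted, and it never deserializes the document. Because valid JSON cannot contain raw newlines inside strings, any newline in an object's text is whitespace between tokens and can safely be replaced with a space.

```python
import json
from typing import Iterable, Iterator


def split_json_objects(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the raw text of each top-level JSON object in a stream of chunks.

    Hypothetical helper: only brace depth and string/escape state are tracked;
    the objects are never parsed. State carries across chunk boundaries.
    """
    depth = 0          # current nesting depth of { }
    in_string = False  # currently inside a "..." string literal?
    escaped = False    # previous char was a backslash inside a string?
    buf = []           # characters of the object currently being collected
    for chunk in chunks:
        for ch in chunk:
            if depth > 0:
                buf.append(ch)  # inside an object: keep every character
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                if depth == 0:
                    buf = [ch]  # start of a new top-level object
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    yield "".join(buf)
                    buf = []


def to_json_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: emit one top-level object per line."""
    for obj in split_json_objects(chunks):
        # Raw newlines in valid JSON can only be whitespace between tokens,
        # so replacing them keeps each record intact on a single line.
        yield obj.replace("\r", " ").replace("\n", " ")


if __name__ == "__main__":
    doc = '[\n  {"id": 1, "tags": ["a", "b"]},\n  {"id": 2, "note": "brace } in a string"}\n]'
    lines = list(to_json_lines([doc]))
    for line in lines:
        print(line)
    # Each line can now be deserialized on its own, as map() would do in Spark.
    print([json.loads(line)["id"] for line in lines])
```

Written out one record per line, the result could then be loaded in PySpark with something like `sc.textFile(path).map(json.loads)`, where `sc` is a SparkContext. XML would need the same idea applied to opening/closing tags, which this sketch does not attempt.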