Hello,

I actually ran into the same problem when I was implementing a decision tree algorithm with Spark and parsing its output into a readable JSON format.
As you said, that file is not standard JSON. The correct JSON format would be an array of records:

[{ "name": "Yin", "address": { "city": "Columbus", "state": "Ohio" } }, { "name": "Michael", "address": { "city": null, "state": "California" } }]

However, I then had to treat it as a list and index into it, e.g. data[0], to get:

{ "name": "Yin", "address": { "city": "Columbus", "state": "Ohio" } }

and then use that for my visualizations.

Spark is still a bit tricky when it comes to input/output formats, so I guess the solution for now is to write your own parser.

Cheers,
*Hechem El Jed*
Software Engineer & Business Analyst

On Thu, Mar 31, 2016 at 4:23 PM, charles li <charles.up...@gmail.com> wrote:
> As this post says, in Spark we can load a JSON file as shown below:
>
> *post*: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>
> ----------------------------------------
> sqlContext.jsonFile(file_path)
> or
> sqlContext.read.json(file_path)
> ----------------------------------------
>
> and the *JSON file format* looks like this, say *people.json*:
>
> ----------------------------------------
> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
> {"name":"Michael", "address":{"city":null, "state":"California"}}
> ----------------------------------------
>
> And here come my *problems*:
>
> Is that the *standard JSON format*? According to http://www.json.org/, I
> don't think so. It's just a *collection of records* [ a dict ], not a
> valid JSON document.
> According to the official JSON docs, the standard JSON format of
> people.json should be:
>
> ----------------------------------------
> {"name": ["Yin", "Michael"],
>  "address": [ {"city":"Columbus","state":"Ohio"},
>               {"city":null, "state":"California"} ]
> }
> ----------------------------------------
>
> So why do we define the JSON format as a collection of records in Spark? I
> mean, it leads to some inconvenience: if we had a large standard JSON file,
> we would first need to reformat it to make it readable in Spark, which is
> inefficient, time-consuming, incompatible and space-consuming.
>
> Great thanks,
>
> --
> a spark lover, a quant, a developer and a good man.
>
> http://github.com/litaotao
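For reference, here is a minimal sketch of the "write your own parser" workaround, in plain Python using only the standard `json` module (not Spark itself). It assumes the input is a top-level JSON array of records like the one in the reply, and converts it into one self-contained record per line (often called JSON Lines), which is the layout `sqlContext.read.json` expects. The helpers `json_array_to_json_lines` and `json_lines_to_records` are hypothetical names for illustration, not part of any Spark API.

```python
import json


def json_array_to_json_lines(array_text: str) -> str:
    """Convert a standard JSON array of records into one-record-per-line text.

    Assumption: the input is a top-level JSON array whose items are records.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    # json.dumps escapes newlines inside strings, so every record
    # is serialized on exactly one line; separators keep it compact.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def json_lines_to_records(lines_text: str) -> list:
    """Parse one-record-per-line text back into a list, skipping blank lines."""
    return [json.loads(line) for line in lines_text.splitlines() if line.strip()]


if __name__ == "__main__":
    people = """[{ "name": "Yin", "address": { "city": "Columbus", "state": "Ohio" } },
                 { "name": "Michael", "address": { "city": null, "state": "California" } }]"""
    jsonl = json_array_to_json_lines(people)
    # Each line is now a standalone JSON object, like people.json in the question.
    print(jsonl, end="")
    # Indexing the parsed list mirrors the data[0] step from the reply.
    print(json_lines_to_records(jsonl)[0])
```

The converted text can be written to a file and passed to `sqlContext.read.json(file_path)`. Note that `json.loads` reads the whole array into memory, so for the very large files mentioned in the question this sketch would need a streaming parser instead. Spark 2.2 and later can also read a multi-line JSON file directly through the JSON reader's `multiLine` option.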