You are correct that it does not take the standard JSON file format. From the 
Spark Docs:
"Note that the file that is offered as a json file is not a typical JSON file. 
Each line must contain a separate, self-contained valid JSON object. As a 
consequence, a regular multi-line JSON file will most often fail."

http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
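
To illustrate what the docs mean by "each line must contain a separate, 
self-contained valid JSON object", here is a minimal sketch in plain Python 
(standard library only, not Spark itself). The helper name parse_json_lines is 
hypothetical and only mimics the line-by-line idea; it is not how Spark 
implements its reader.

```python
import json

def parse_json_lines(text):
    """Parse text in which every non-empty line is one complete JSON object."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        records.append(json.loads(line))  # each line is parsed on its own
    return records

# The people.json content from this thread, one object per line.
people = (
    '{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}\n'
    '{"name":"Michael", "address":{"city":null, "state":"California"}}\n'
)

records = parse_json_lines(people)
print([r["name"] for r in records])
```

Because every line can be parsed independently, a file in this layout can be 
split at line boundaries, which a single multi-line JSON document does not 
allow.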

On Mar 31, 2016, at 5:30 AM, charles li 
<charles.up...@gmail.com> wrote:

hi, UMESH, have you tried to load that json file on your machine? I did try it 
before, and here is the screenshot:

<Screen Shot 2016-03-31 5.27.30 PM.png>
<Screen Shot 2016-03-31 5.27.39 PM.png>

On Thu, Mar 31, 2016 at 5:19 PM, UMESH CHAUDHARY 
<umesh9...@gmail.com> wrote:
Hi Charles,
The definition of an object from www.json.org:

An object is an unordered set of name/value pairs. An object begins with { 
(left brace) and ends with } (right brace). Each name is followed by : (colon) 
and the name/value pairs are separated by , (comma).

It's pretty much the OOP paradigm, isn't it?

Regards,
Umesh

On Thu, Mar 31, 2016 at 2:34 PM, charles li 
<charles.up...@gmail.com> wrote:
hi, UMESH, I think you've misunderstood the json definition.

a json file holds only one top-level value (usually an object or an array).

for the file people.json, as below:

--------------------------------------------------------------------------------------------

{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}

-----------------------------------------------------------------------------------------------

it has two valid formats:

1.

--------------------------------------------------------------------------------------------

[ {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
{"name":"Michael", "address":{"city":null, "state":"California"}}
]

-----------------------------------------------------------------------------------------------

2.

--------------------------------------------------------------------------------------------

{"name": ["Yin", "Michael"],
"address":[ {"city":"Columbus","state":"Ohio"},
{"city":null, "state":"California"} ]
}
-----------------------------------------------------------------------------------------------
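
As a quick check of the point above, a small sketch with Python's standard json 
module (an assumption for illustration; this is not Spark code, and the helper 
name is_single_json_document is hypothetical). It shows that a strict parser 
rejects the two-objects-one-per-line text as a single document, while the array 
form (format 1) is accepted.

```python
import json

# The two records as they appear in people.json, one object per line.
line_delimited = (
    '{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}\n'
    '{"name":"Michael", "address":{"city":null, "state":"California"}}\n'
)

# Format 1 from above: the same records wrapped in one top-level array.
as_array = "[" + ",".join(line_delimited.splitlines()) + "]"

def is_single_json_document(text):
    """Return True if the whole text parses as exactly one JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        # Raised, for example, when a second object follows the first one.
        return False

print(is_single_json_document(line_delimited))
print(is_single_json_document(as_array))
```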



On Thu, Mar 31, 2016 at 4:53 PM, UMESH CHAUDHARY 
<umesh9...@gmail.com> wrote:
Hi,
Look at the image below, which is from json.org:

<image.png>

The above image describes how the objects in the JSON below are formed:

Object 1=> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
Object 2=> {"name":"Michael", "address":{"city":null, "state":"California"}}


Note that "address" is also an object.



On Thu, Mar 31, 2016 at 1:53 PM, charles li 
<charles.up...@gmail.com> wrote:
as this post says, in spark we can load a json file in the way below:

post : 
https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html


-----------------------------------------------------------------------------------------------
sqlContext.jsonFile(file_path)
or
sqlContext.read.json(file_path)
-----------------------------------------------------------------------------------------------


and the json file format looks like below, say people.json

--------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
-----------------------------------------------------------------------------------------------


and here come my problems:

Is that the standard json format? according to http://www.json.org/ , I don't 
think so. it's just a collection of records [ a dict ], not a valid json 
format. according to the official json doc, the standard json format of 
people.json should be :

--------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
"address":[ {"city":"Columbus","state":"Ohio"},
{"city":null, "state":"California"} ]
}
-----------------------------------------------------------------------------------------------

So, why do we define the json format as a collection of records in spark? I 
mean, it leads to some inconvenience: if we had a large standard json file, we 
would first need to reformat it to make it readable in spark, which is 
inefficient, time-consuming, incompatible and space-consuming.
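
For reference, the reformatting step described above could look roughly like 
this in plain Python (a sketch only, assuming the standard file is a top-level 
array of records; the function name array_to_json_lines is hypothetical). Note 
that json.loads reads the whole document into memory, so a really large file 
would need a streaming parser instead.

```python
import json

def array_to_json_lines(array_text):
    """Convert a JSON array of records into one-object-per-line text."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps emits no newlines by default, so each record stays on one line.
    return "\n".join(json.dumps(record) for record in records) + "\n"

standard = """
[ {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
  {"name":"Michael", "address":{"city":null, "state":"California"}}
]
"""

converted = array_to_json_lines(standard)
print(converted, end="")
```

The converted text has the one-object-per-line layout that 
sqlContext.read.json expects according to the docs quoted in this thread.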


great thanks,






--
--------------------------------------
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao



