Do you want id1, id2 and id3 to be processed similarly?

The Java code I use is:
                df = df.withColumn(K.NAME, df.col("fields.premise_name"));

The original structure is something like {"fields":{"premise_name":"ccc"}}.
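A slightly more complete sketch of that (assuming the SparkSession entry point; the file name and the literal column name are placeholders, and K.NAME above is presumably just a column-name constant):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class NestedFieldExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("nested-field-example")
                    .master("local[*]")            // local master, just for a quick test
                    .getOrCreate();

            // Each line of the input file is a JSON object such as
            // {"fields":{"premise_name":"ccc"}}
            Dataset<Row> df = spark.read().json("premises.json");   // hypothetical path

            // Promote the nested fields.premise_name value to a top-level column.
            df = df.withColumn("premise_name", df.col("fields.premise_name"));

            df.show();
            spark.stop();
        }
    }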

Hope it helps.

> On Jul 7, 2016, at 1:48 AM, Lan Jiang <ljia...@gmail.com> wrote:
> 
> Hi, there
> 
> Spark has provided JSON document processing for a long time. In most examples
> I see, each line of the sample file is a JSON object. That is the easiest
> case. But how can we process a JSON document that does not conform to this
> standard format (one JSON object per line)? Here is the document I am
> working on.
> 
> First of all, one single big JSON object spans multiple lines, and the real
> file can be as large as 20+ GB. Within that single JSON object there are many
> name/value pairs. The names are id values of some kind, and each value is the
> actual JSON object that I would like to become part of a DataFrame. Is there
> any way to do that? I appreciate any input.
> 
> 
> {
>     "id1": {
>         "Title": "title1",
>         "Author": "Tom",
>         "Source": {
>             "Date": "20160506",
>             "Type": "URL"
>         },
>         "Data": " blah blah"
>     },
> 
>     "id2": {
>         "Title": "title2",
>         "Author": "John",
>         "Source": {
>             "Date": "20150923",
>             "Type": "URL"
>         },
>         "Data": "  blah blah "
>     },
> 
>     "id3": {
>         "Title": "title3",
>         "Author": "John",
>         "Source": {
>             "Date": "20150902",
>             "Type": "URL"
>         },
>         "Data": " blah blah "
>     }
> }
> 
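For the id1/id2/id3 structure quoted above, a rough sketch along the same lines (assuming a Spark version new enough to have the multiLine reader option, ids that are known up front, and a made-up file name; older versions would have to read the whole file as one string, e.g. via wholeTextFiles, and pass that to the JSON reader):

    import static org.apache.spark.sql.functions.lit;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class IdKeyedJsonExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("id-keyed-json")
                    .master("local[*]")
                    .getOrCreate();

            // Read the whole multi-line document as a single row whose columns
            // are id1, id2, id3 (needs a Spark release that supports multiLine).
            Dataset<Row> raw = spark.read()
                    .option("multiLine", true)
                    .json("big-document.json");        // hypothetical path

            // Turn each known id's struct into its own row and union the results.
            Dataset<Row> rows = null;
            for (String id : new String[] {"id1", "id2", "id3"}) {
                Dataset<Row> one = raw.select(
                        lit(id).alias("id"),
                        raw.col(id + ".Title").alias("Title"),
                        raw.col(id + ".Author").alias("Author"),
                        raw.col(id + ".Source.Date").alias("Date"),
                        raw.col(id + ".Source.Type").alias("Type"),
                        raw.col(id + ".Data").alias("Data"));
                rows = (rows == null) ? one : rows.union(one);
            }

            rows.show();
            spark.stop();
        }
    }

If the ids are not known in advance, this column-by-column selection obviously does not help, and the top-level object would have to be split with an ordinary JSON parser before handing the per-id records to Spark.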

