Do you want id1, id2, and id3 to be processed similarly? The Java code I use is: df = df.withColumn(K.NAME, df.col("fields.premise_name"));
the original structure is something like {"fields":{"premise_name":"ccc"}}. Hope it helps.

> On Jul 7, 2016, at 1:48 AM, Lan Jiang <ljia...@gmail.com> wrote:
>
> Hi there,
>
> Spark has provided JSON document processing for a long time. In most
> examples I see, each line of the sample file is a JSON object. That is the
> easiest case. But how can we process a JSON document that does not conform
> to this standard format (one JSON object per line)? Here is the document I am
> working on.
>
> First of all, it is a single big JSON object spanning multiple lines. The real
> file can be 20+ GB. Within that single JSON object, there are many
> name/value pairs. Each name is an id value, and each value is the actual
> JSON object that I would like to become part of a dataframe. Is there any
> way to do that? I appreciate any input.
>
> {
>   "id1": {
>     "Title": "title1",
>     "Author": "Tom",
>     "Source": {
>       "Date": "20160506",
>       "Type": "URL"
>     },
>     "Data": "blah blah"
>   },
>
>   "id2": {
>     "Title": "title2",
>     "Author": "John",
>     "Source": {
>       "Date": "20150923",
>       "Type": "URL"
>     },
>     "Data": "blah blah"
>   },
>
>   "id3": {
>     "Title": "title3",
>     "Author": "John",
>     "Source": {
>       "Date": "20150902",
>       "Type": "URL"
>     },
>     "Data": "blah blah"
>   }
> }
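One common workaround for this shape of document is to preprocess the single big {id: record} object into JSON Lines (one complete JSON object per line), which Spark's json reader handles natively. Below is a minimal sketch in Python using only the standard-library json module; the function name `to_json_lines` and the `"Id"` column it adds are my own assumptions, not from the thread. Note that `json.loads` parses the whole document in memory, so for a 20+ GB file you would need a streaming JSON parser instead of this direct approach.

```python
import json

def to_json_lines(doc: str) -> str:
    """Flatten a top-level {id: record} JSON object into JSON Lines,
    copying each id into its record under an assumed 'Id' key."""
    records = json.loads(doc)
    lines = []
    for record_id, record in records.items():
        row = dict(record)       # shallow copy so the input dict is untouched
        row["Id"] = record_id    # 'Id' is an assumed column name
        lines.append(json.dumps(row))
    return "\n".join(lines)

# Small inline sample mirroring the structure from the question.
doc = """
{
  "id1": {"Title": "title1", "Author": "Tom",
          "Source": {"Date": "20160506", "Type": "URL"},
          "Data": "blah blah"},
  "id2": {"Title": "title2", "Author": "John",
          "Source": {"Date": "20150923", "Type": "URL"},
          "Data": "blah blah"}
}
"""
print(to_json_lines(doc))
```

Once the file is in this one-object-per-line form, `spark.read.json(path)` can load it directly, and each id becomes an ordinary column alongside Title, Author, Source, and Data.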