Raja! I found the answer to your question! Look at http://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes; this is what you (and I) were looking for. The general idea: you read the file as text, so ProjectDetails is just a string field. Then you build a JSON string representation of the whole line, which gives you a nested JSON schema that Spark SQL can read.
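Below is a minimal sketch of that idea in Scala. It is not code from the linked answer. It assumes Spark 2.2 or later, where `spark.read.json` accepts a `Dataset[String]`; on the Spark 1.x of this thread the rough equivalent is `sqlContext.read.json` on an `RDD[String]`. It also assumes that each record sits on a single line, that the embedded JSON is valid (straight quotes, commas between the project objects), and that employeeID and Name contain no commas. The names `FlattenProjects`, `toJsonLine` and `flatten` are made up for illustration. Spark infers the nested schema from the rebuilt lines, and `explode` then turns each element of `ProjectDetails` into its own row. A project without a `Role` comes out as null.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object FlattenProjects {

  // Rewrites one raw record such as
  //   (123456, Employee1, {"ProjectDetails":[{...},{...}]})
  // into one JSON document:
  //   {"employeeID":123456,"Name":"Employee1","ProjectDetails":[...]}
  // Assumes one record per line, valid embedded JSON, and no commas in
  // employeeID or Name.
  def toJsonLine(raw: String): String = {
    val body = raw.trim.stripPrefix("(").stripSuffix(")")
    // A limit of 3 keeps every comma of the embedded JSON in the last part.
    val Array(id, name, json) = body.split(",", 3).map(_.trim)
    val escapedName = name.replace("\\", "\\\\").replace("\"", "\\\"")
    // Drop the embedded object's opening brace and put the two plain fields in front.
    val rest = json.stripPrefix("{")
    s"""{"employeeID":$id,"Name":"$escapedName",$rest"""
  }

  // Spark infers the nested schema; explode then emits one row per project.
  def flatten(spark: SparkSession, rawLines: Dataset[String]): DataFrame = {
    import spark.implicits._
    val nested = spark.read.json(rawLines.map((line: String) => toJsonLine(line)))
    nested
      .select(col("employeeID"), col("Name"), explode(col("ProjectDetails")).as("p"))
      .select(col("employeeID"), col("Name"), col("p.ProjectName"),
        col("p.Description"), col("p.Duration"), col("p.Role"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flatten-projects").getOrCreate()
    import spark.implicits._
    val sample = Seq(
      """(123456, Employee1, {"ProjectDetails":[{"ProjectName":"Web Development","Description":"Online Sales website","Duration":"6 Months","Role":"Developer"},{"ProjectName":"Scala Training","Description":"Training","Duration":"1 Month"}]})"""
    ).toDS()
    flatten(spark, sample).show(false)
    spark.stop()
  }
}
```

For a real file, `spark.read.textFile(path)` returns the `Dataset[String]` that `flatten` expects.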
Eran

On Thu, Dec 24, 2015 at 10:26 AM Eran Witkon <eranwit...@gmail.com> wrote:

> I don't have the exact answer for you, but I would look for something using
> the explode method on DataFrame.
>
> On Thu, Dec 24, 2015 at 7:34 AM Bharathi Raja <raja...@yahoo.com> wrote:
>
>> Thanks Gokul, but the file I have has the same format as I
>> mentioned. The first two columns are not in JSON format.
>>
>> Thanks,
>> Raja
>> ------------------------------
>> From: Gokula Krishnan D <email2...@gmail.com>
>> Sent: 12/24/2015 2:44 AM
>> To: Eran Witkon <eranwit...@gmail.com>
>> Cc: raja kbv <raja...@yahoo.com>; user@spark.apache.org
>> Subject: Re: How to Parse & flatten JSON object in a text file using
>> Spark & Scala into Dataframe
>>
>> You can try this, but I slightly modified the input structure since the
>> first two columns were not in JSON format.
>>
>> [image: Inline image 1]
>>
>> Thanks & Regards,
>> Gokula Krishnan (Gokul)
>>
>> On Wed, Dec 23, 2015 at 9:46 AM, Eran Witkon <eranwit...@gmail.com>
>> wrote:
>>
>>> Did you get a solution for this?
>>>
>>> On Tue, 22 Dec 2015 at 20:24 raja kbv <raja...@yahoo.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new to Spark.
>>>>
>>>> I have a text file with the structure below.
>>>>
>>>> (employeeID: Int, Name: String, ProjectDetails:
>>>> JsonObject{[{ProjectName, Description, Duration, Role}]})
>>>> E.g.:
>>>> (123456, Employee1, {“ProjectDetails”:[
>>>> {
>>>> “ProjectName”: “Web Development”, “Description” : “Online Sales website”,
>>>> “Duration” : “6 Months” , “Role” : “Developer”}
>>>> {
>>>> “ProjectName”: “Spark Development”, “Description” : “Online Sales
>>>> Analysis”, “Duration” : “6 Months” , “Role” : “Data Engineer”}
>>>> {
>>>> “ProjectName”: “Scala Training”, “Description” : “Training”, “Duration” :
>>>> “1 Month” }
>>>> ]
>>>> }
>>>>
>>>> Could someone help me parse & flatten the record into the DataFrame below
>>>> using Scala?
>>>>
>>>> employeeID, Name, ProjectName, Description, Duration, Role
>>>> 123456, Employee1, Web Development, Online Sales website, 6 Months, Developer
>>>> 123456, Employee1, Spark Development, Online Sales Analysis, 6 Months, Data Engineer
>>>> 123456, Employee1, Scala Training, Training, 1 Month, null
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>> Raja
>>>>
>>>
>>
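Eran's first suggestion, `explode` on a DataFrame, can also be paired with `from_json` so that only the third field is parsed, against an explicit schema, instead of rebuilding the whole line as JSON. This is a swapped-in alternative, not something posted in the thread: `from_json` arrived in Spark 2.1, after these messages were written. The sketch assumes the same one-record-per-line, valid-JSON input as above, and the names `FlattenWithFromJson` and `flatten` are hypothetical. Because the schema is declared up front, the third project's missing `Role` simply becomes null and Spark does not need an extra pass over the data to infer a schema.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object FlattenWithFromJson {

  // Explicit schema for the embedded JSON; a missing key such as Role becomes null.
  val projectSchema: StructType = StructType(Seq(
    StructField("ProjectName", StringType),
    StructField("Description", StringType),
    StructField("Duration", StringType),
    StructField("Role", StringType)))
  val detailsSchema: StructType =
    StructType(Seq(StructField("ProjectDetails", ArrayType(projectSchema))))

  def flatten(spark: SparkSession, rawLines: Dataset[String]): DataFrame = {
    import spark.implicits._
    // Split each "(id, name, json)" line into three columns; a limit of 3 keeps the JSON intact.
    val threeCols = rawLines.map { (line: String) =>
      val Array(id, name, json) =
        line.trim.stripPrefix("(").stripSuffix(")").split(",", 3).map(_.trim)
      (id.toLong, name, json)
    }.toDF("employeeID", "Name", "json")

    // Parse only the JSON column, then emit one row per project.
    threeCols
      .withColumn("details", from_json(col("json"), detailsSchema))
      .select(col("employeeID"), col("Name"), explode(col("details.ProjectDetails")).as("p"))
      .select(col("employeeID"), col("Name"), col("p.ProjectName"),
        col("p.Description"), col("p.Duration"), col("p.Role"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flatten-from-json").getOrCreate()
    import spark.implicits._
    // For a real file, use spark.read.textFile(path) instead of this in-memory sample.
    val lines = Seq(
      """(123456, Employee1, {"ProjectDetails":[{"ProjectName":"Web Development","Description":"Online Sales website","Duration":"6 Months","Role":"Developer"},{"ProjectName":"Scala Training","Description":"Training","Duration":"1 Month"}]})"""
    ).toDS()
    flatten(spark, lines).show(false)
    spark.stop()
  }
}
```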