Here is an overview of how to work with complex JSON in Spark: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html (works in streaming and batch)
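As a plain-Python illustration of the "own the schema, keep an 'extras' field" approach discussed in this thread (useful on pre-2.1 versions where `from_json` is unavailable) — note the key set and the `parse_event` name are illustrative assumptions, not part of any Spark API; in practice the same function could back a UDF:

```python
import json

# Keys we commit to owning as real columns; everything else is
# preserved in an "extras" column for later analysis.
KNOWN_KEYS = {"expdCnt", "mfgAcctNum", "oUUID", "pgmUUID"}

def parse_event(raw):
    """Parse one JSON object: known keys become columns, and any
    leftover keys are stashed as a JSON string in 'extras'."""
    data = json.loads(raw)
    row = {k: data.pop(k, None) for k in KNOWN_KEYS}
    row["extras"] = json.dumps(data) if data else None
    return row

row = parse_event('{"mfgAcctNum": "531093", "expdCnt": "15", "surprise": true}')
```

The point of the extras column is that nothing is silently dropped when the incoming JSON drifts from the schema you own.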
On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:

> What's against:
>
> df.rdd.map(...)
>
> or
>
> dataset.foreach()
>
> https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit
>
> Best,
>
> On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>
>> I've been wondering about this for a while.
>>
>> We wanted to do something similar for generically saving thousands of
>> individual homogeneous events into well-formed Parquet.
>>
>> Ultimately I couldn't find something I wanted to own and pushed back on
>> the requirements.
>>
>> It seems the canonical answer is that you need to 'own' the schema of the
>> JSON and parse it out manually into your dataframe. There's nothing
>> challenging about it, just verbose code. If your 'info' column has a
>> consistent schema then you'll be fine. For us it was 12 wildly diverging
>> schemas, and I didn't want to own the transforms.
>>
>> I also recommend persisting anything that isn't part of your schema in an
>> 'extras' field, so when you parse out your JSON, anything left over can be
>> dropped in there for later analysis.
>>
>> I can provide some sample code, but I think it's pretty straightforward /
>> you can google it.
>>
>> What you can't seem to do efficiently is dynamically generate a dataframe
>> from arbitrary JSON.
>>
>> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>
>>> Tried the implicits - didn't work!
>>>
>>> from_json isn't supported in Spark 2.0.1; any alternate solution would
>>> be welcome, please.
>>>
>>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>>
>>>> You need to have the Spark implicits in scope.
>>>>
>>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue., 18 July 2017 at 08:45:
>>>>
>>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>>> <https://github.com/bazaarvoice/jolt>, a JSON-to-JSON transformation
>>>>> library written in Java) to flatten it to a JSON string and then go to
>>>>> a dataframe or dataset.
>>>>>
>>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>>
>>>>> Explode is not working in this scenario; it fails with an error that a
>>>>> string column cannot be used in explode, which expects an array or map.
>>>>>
>>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> have you tried to use explode?
>>>>>
>>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue., 18 July 2017 at 2:06 PM:
>>>>>
>>>>> Hello Spark Devs,
>>>>>
>>>>> Can you please guide me on how to flatten JSON to multiple columns in
>>>>> Spark?
>>>>>
>>>>> *Example:*
>>>>>
>>>>> Sr No | Title           | ISBN       | Info
>>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>>
>>>>> [{"cert":[{
>>>>>     "authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>>>     "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>>>     "effDt":"2016-05-06T15:04:56.279Z",
>>>>>     "fileFmt":"rjrCsv",
>>>>>     "status":"live"}],
>>>>>   "expdCnt":"15",
>>>>>   "mfgAcctNum":"531093",
>>>>>   "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>>>   "pgmRole":["RETAILER"],
>>>>>   "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>>>   "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>>>   "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>>
>>>>> I want to get a single row with 11 columns.
>>>>>
>>>>> Thanks.
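For reference, the failure mode above is that the Info cell arrives as a JSON *string*, and `explode` only accepts array or map columns. A minimal plain-Python sketch of the manual flattening (field names are taken from the sample above; `flatten_info` is an illustrative name, not a Spark API — the same logic could be wrapped in a UDF):

```python
import json

# The "Info" cell from the sample: a JSON string holding an array with
# one object, whose "cert" key holds a further array of objects.
info = '''[{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
    "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
    "effDt":"2016-05-06T15:04:56.279Z",
    "fileFmt":"rjrCsv","status":"live"}],
  "expdCnt":"15","mfgAcctNum":"531093",
  "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
  "pgmRole":["RETAILER"],
  "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
  "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
  "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]'''

def flatten_info(raw):
    """Parse the JSON string; emit one flat dict per (entry, cert) pair,
    merging the nested cert fields into the top-level fields."""
    rows = []
    for entry in json.loads(raw):           # outer array
        certs = entry.pop("cert", [{}])     # inner array of cert objects
        for cert in certs:                  # one output row per cert
            rows.append({**entry, **cert})
    return rows

rows = flatten_info(info)
```

With this sample it yields a single row whose keys are the five cert fields plus the seven remaining top-level fields.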