What's wrong with df.rdd.map(...)
or dataset.foreach()?
https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit

Best,

On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:

> I've been wondering about this for a while.
>
> We wanted to do something similar for generically saving thousands of
> individual homogeneous events into well-formed Parquet.
>
> Ultimately I couldn't find something I wanted to own, and pushed back on
> the requirements.
>
> It seems the canonical answer is that you need to 'own' the schema of the
> JSON and parse it out manually into your DataFrame. There's nothing
> challenging about it, just verbose code. If your 'info' is a consistent
> schema then you'll be fine. For us it was 12 wildly diverging schemas, and
> I didn't want to own the transforms.
>
> I also recommend persisting anything that isn't part of your schema in an
> 'extras' field, so that when you parse out your JSON, anything left over
> can be dropped in there for later analysis.
>
> I can provide some sample code, but I think it's pretty straightforward /
> you can google it.
>
> What you can't seem to do efficiently is dynamically generate a DataFrame
> from arbitrary JSON.
>
> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> I tried the implicits - it didn't work!
>>
>> from_json isn't supported in Spark 2.0.1; any alternate solution would be
>> welcome, please.
>>
>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>
>>> You need to have the Spark implicits in scope.
>>>
>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue, 18 July 2017 at 08:45:
>>>
>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>> <https://github.com/bazaarvoice/jolt>) to flatten it to a JSON string
>>>> and then go to a DataFrame or Dataset.
>>>>
>>>> bazaarvoice/jolt
>>>>
>>>> jolt - JSON to JSON transformation library written in Java.
>>>> <https://github.com/bazaarvoice/jolt>
>>>>
>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>
>>>> Explode is not working in this scenario; Spark errors because a string
>>>> cannot be used in explode, only an array or a map.
>>>>
>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> have you tried using explode?
>>>>
>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue, Jul 18, 2017 at 2:06 PM:
>>>>
>>>> Hello Spark Devs,
>>>>
>>>> Can you please guide me on how to flatten JSON to multiple columns in
>>>> Spark?
>>>>
>>>> *Example:*
>>>>
>>>> Sr No | Title           | ISBN       | Info
>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>
>>>> [{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>> "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>> "effDt":"2016-05-06T15:04:56.279Z",
>>>> "fileFmt":"rjrCsv","status":"live"}],
>>>> "expdCnt":"15",
>>>> "mfgAcctNum":"531093",
>>>> "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>> "pgmRole":["RETAILER"],
>>>> "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>> "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>> "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>
>>>> I want to get a single row with 11 columns.
>>>>
>>>> Thanks.
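[Editor's note] Since from_json is not available until Spark 2.1, the workable route on 2.0.1 is the one the thread describes: flatten the nested Info JSON yourself, keep the columns whose schema you 'own', and drop anything left over into an 'extras' field. A minimal, Spark-free sketch of that logic in Python (the `flatten` and `to_row` helpers are hypothetical illustrations, not Spark APIs, and the sample values are truncated stand-ins for the Info blob above):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict;
    path segments are joined with '_' and list items are indexed."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + key + "_"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, prefix + str(i) + "_"))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing '_'
    return flat

def to_row(raw, known_columns):
    """Parse one JSON string into a row dict: the columns you 'own',
    plus an 'extras' dict holding whatever the schema didn't cover."""
    flat = flatten(json.loads(raw))
    row = {col: flat.pop(col, None) for col in known_columns}
    row["extras"] = flat  # leftovers, persisted for later analysis
    return row

# Shortened stand-in for the Info column above (values truncated).
info = ('[{"cert":[{"authSbmtr":"009415da","status":"live"}],'
        '"expdCnt":"15","pgmRole":["RETAILER"]}]')
row = to_row(info, ["0_cert_0_status", "0_expdCnt"])
```

In Spark this logic could run as a map over an RDD of JSON strings before building the DataFrame; alternatively, as suggested above, JOLT can perform the flattening step if you would rather not own the recursion.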