What's wrong with df.rdd.map(...)
or dataset.foreach()?
https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit

Best,

On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:

> I've been wondering about this for a while.
>
> We wanted to do something similar for generically saving thousands of
> individual homogeneous events into well-formed Parquet.
>
> Ultimately I couldn't find something I wanted to own, and pushed back on
> the requirements.
>
> It seems the canonical answer is that you need to 'own' the schema of the
> JSON and parse it out manually into your DataFrame. There's nothing
> challenging about it, just verbose code. If your 'info' is a consistent
> schema then you'll be fine. For us it was 12 wildly diverging schemas, and
> I didn't want to own the transforms.
>
> I also recommend persisting anything that isn't part of your schema in an
> 'extras' field, so that when you parse out your JSON, anything left over
> can be dropped in there for later analysis.
>
> I can provide some sample code, but I think it's pretty straightforward /
> you can google it.
>
> What you can't seem to do efficiently is dynamically generate a DataFrame
> from arbitrary JSON.
>
> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> I tried the implicits - it didn't work!
>>
>> from_json isn't supported in Spark 2.0.1; any alternate solution would be
>> welcome, please.
>>
>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>
>>> You need to have the Spark implicits in scope.
>>>
>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue, 18 July 2017 at 08:45:
>>>
>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>> <https://github.com/bazaarvoice/jolt>) to flatten it to a JSON string
>>>> and then go to a DataFrame or Dataset.
>>>>
>>>> bazaarvoice/jolt
>>>>
>>>> jolt - JSON to JSON transformation library written in Java.
>>>> <https://github.com/bazaarvoice/jolt>
>>>>
>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>
>>>> Explode is not working in this scenario; Spark errors because a string
>>>> cannot be used in explode, only an array or a map.
>>>>
>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> have you tried using explode?
>>>>
>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue, Jul 18, 2017 at 2:06 PM:
>>>>
>>>> Hello Spark Devs,
>>>>
>>>> Can you please guide me on how to flatten JSON to multiple columns in
>>>> Spark?
>>>>
>>>> *Example:*
>>>>
>>>> Sr No | Title           | ISBN       | Info
>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>
>>>> [{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>> "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>> "effDt":"2016-05-06T15:04:56.279Z",
>>>> "fileFmt":"rjrCsv","status":"live"}],
>>>> "expdCnt":"15",
>>>> "mfgAcctNum":"531093",
>>>> "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>> "pgmRole":["RETAILER"],
>>>> "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>> "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>> "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>
>>>> I want to get a single row with 11 columns.
>>>>
>>>> Thanks.
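[Editor's note] Since from_json is not available until Spark 2.1, the workable route on 2.0.1 is the one the thread describes: flatten the nested Info JSON yourself, keep the columns whose schema you 'own', and drop anything left over into an 'extras' field. A minimal, Spark-free sketch of that logic in Python (the `flatten` and `to_row` helpers are hypothetical illustrations, not Spark APIs, and the sample values are truncated stand-ins for the Info blob above):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict;
    path segments are joined with '_' and list items are indexed."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + key + "_"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, prefix + str(i) + "_"))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing '_'
    return flat

def to_row(raw, known_columns):
    """Parse one JSON string into a row dict: the columns you 'own',
    plus an 'extras' dict holding whatever the schema didn't cover."""
    flat = flatten(json.loads(raw))
    row = {col: flat.pop(col, None) for col in known_columns}
    row["extras"] = flat  # leftovers, persisted for later analysis
    return row

# Shortened stand-in for the Info column above (values truncated).
info = ('[{"cert":[{"authSbmtr":"009415da","status":"live"}],'
        '"expdCnt":"15","pgmRole":["RETAILER"]}]')
row = to_row(info, ["0_cert_0_status", "0_expdCnt"])
```

In Spark this logic could run as a map over an RDD of JSON strings before building the DataFrame; alternatively, as suggested above, JOLT can perform the flattening step if you would rather not own the recursion.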