Re: JSON Arrays and Spark

Hyukjin Kwon Tue, 11 Oct 2016 23:15:22 -0700

No, I meant that each JSON document should be on a single line. An array
is also supported as the root wrapper, as long as it wraps JSON objects.
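
To make the rule concrete, here is a minimal sketch in plain Python (no Spark needed). It only mirrors the behaviour described in this thread: an object on a line is one record, a root array of objects is several records, and anything else is treated as corrupt. The function name classify_line is made up for illustration, and this is not how Spark implements it internally.

```python
import json


def classify_line(line):
    """Classify one input line according to the rule described in this thread.

    Returns ("ok", records) or ("corrupt", []). This is an illustration only,
    not Spark's actual parsing code.
    """
    try:
        value = json.loads(line)
    except ValueError:
        # Not valid JSON on its own, e.g. one line of a pretty-printed file.
        return "corrupt", []
    if isinstance(value, dict):
        # A self-contained JSON object is one record.
        return "ok", [value]
    if isinstance(value, list) and all(isinstance(v, dict) for v in value):
        # A root array wrapping JSON objects gives one record per object.
        return "ok", value
    # A root array of scalars (case 2 in the original question) lands here.
    return "corrupt", []


if __name__ == "__main__":
    for sample in ('{"vals":[100,500,600]}',
                   '[{"a":1},{"a":2}]',
                   '[100,500,600]',
                   '{"vals":['):
        print(sample, "->", classify_line(sample)[0])
```

Under that rule, wrapping the bare array in an object, as in case 1 of the original question, is what makes it parse.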


If you need to parse JSON that spans multiple lines, here is a reference:

http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files/
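
Another option, if the files are small enough to load into memory, is to rewrite each multi-line document as one object per line before handing it to Spark. A minimal sketch in plain Python follows. The helper name to_json_lines is hypothetical, and the sample values are invented; only the field names are taken from the schema quoted below.

```python
import json


def to_json_lines(text):
    """Re-serialize a pretty-printed JSON document as one object per line.

    Assumes the whole document fits in memory. A root array is split into
    one line per element; a root object becomes a single line.
    """
    doc = json.loads(text)
    records = doc if isinstance(doc, list) else [doc]
    # Compact separators keep every record on exactly one line.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


# Made-up sample values; field names follow the quoted schema.
pretty = """
{
  "maindate": {"mainidnId": "1"},
  "Entity": [
    {"Identifier": "e1"}
  ]
}
"""

if __name__ == "__main__":
    print(to_json_lines(pretty))
```

The output can be written to a file and read with the usual line-delimited JSON reader.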

2016-10-12 15:04 GMT+09:00 Kappaganthu, Sivaram (ES) <
sivaram.kappagan...@adp.com>:

> Hi,
>
>
>
> Does this mean that Spark is not a good fit for handling JSON with a schema
> like the one below? I have a requirement to parse the JSON below, which
> spans multiple lines. What is the best way to parse JSON of this kind?
> Please suggest.
>
>
>
> root
> |-- maindate: struct (nullable = true)
> |    |-- mainidnId: string (nullable = true)
> |-- Entity: array (nullable = true)
> |    |-- element: struct (containsNull = true)
> |    |    |-- Profile: struct (nullable = true)
> |    |    |    |-- Kind: string (nullable = true)
> |    |    |-- Identifier: string (nullable = true)
> |    |    |-- Group: array (nullable = true)
> |    |    |    |-- element: struct (containsNull = true)
> |    |    |    |    |-- Period: struct (nullable = true)
> |    |    |    |    |    |-- pid: string (nullable = true)
> |    |    |    |    |    |-- pDate: string (nullable = true)
> |    |    |    |    |    |-- quarter: long (nullable = true)
> |    |    |    |    |    |-- labour: array (nullable = true)
> |    |    |    |    |    |    |-- element: struct (containsNull = true)
> |    |    |    |    |    |    |    |-- category: string (nullable = true)
> |    |    |    |    |    |    |    |-- id: string (nullable = true)
> |    |    |    |    |    |    |    |-- person: struct (nullable = true)
> |    |    |    |    |    |    |    |    |-- address: array (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
> |    |    |    |    |    |    |    |    |    |    |-- city: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- line1: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- line2: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- postalCode: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- state: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
> |    |    |    |    |    |    |    |    |-- familyName: string (nullable = true)
> |    |    |    |    |    |    |    |-- tax: array (nullable = true)
> |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
> |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- qwage: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- qvalue: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- qSubjectvalue: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- qfinalvalue: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- ywage: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- yalue: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- ySubjectvalue: double (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- yfinalvalue: double (nullable = true)
> |    |    |    |    |    |    |    |-- tProfile: array (nullable = true)
> |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
> |    |    |    |    |    |    |    |    |    |-- isExempt: boolean (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- jurisdiction: struct (nullable = true)
> |    |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- maritalStatus: string (nullable = true)
> |    |    |    |    |    |    |    |    |    |-- numberOfDeductions: long (nullable = true)
> |    |    |    |    |    |    |    |-- wDate: struct (nullable = true)
> |    |    |    |    |    |    |    |    |-- originalHireDate: string (nullable = true)
> |    |    |    |    |    |-- year: long (nullable = true)
>
>
>
>
>
> *From:* Luciano Resende [mailto:luckbr1...@gmail.com]
> *Sent:* Monday, October 10, 2016 11:39 PM
> *To:* Jean Georges Perrin
> *Cc:* user @spark
> *Subject:* Re: JSON Arrays and Spark
>
>
>
> Please take a look at
> http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
>
> Particularly the note on the required format:
>
> Note that the file that is offered as *a json file* is not a typical JSON
> file. Each line must contain a separate, self-contained valid JSON object.
> As a consequence, a regular multi-line JSON file will most often fail.
>
>
>
> On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin <j...@jgp.net> wrote:
>
> Hi folks,
>
>
>
> I am trying to parse JSON arrays and it’s getting a little crazy (for me
> at least)…
>
>
>
> 1)
>
> If my JSON is:
>
> {"vals":[100,500,600,700,800,200,900,300]}
>
>
>
> I get:
>
> +--------------------+
>
> |                vals|
>
> +--------------------+
>
> |[100, 500, 600, 7...|
>
> +--------------------+
>
>
>
> root
>
>  |-- vals: array (nullable = true)
>
>  |    |-- element: long (containsNull = true)
>
>
>
> and I am :)
>
>
>
> 2)
>
> If my JSON is:
>
> [100,500,600,700,800,200,900,300]
>
>
>
> I get:
>
> +--------------------+
>
> |     _corrupt_record|
>
> +--------------------+
>
> |[100,500,600,700,...|
>
> +--------------------+
>
>
>
> root
>
>  |-- _corrupt_record: string (nullable = true)
>
>
>
> Both are legit JSON structures… Do you think that #2 is a bug?
>
>
>
> jg
>
> --
>
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
