I've seen this done using mapPartitions(), where each partition represents a
single multi-line JSON file. You can rip through each partition (JSON
file) and parse the JSON document as a whole.
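This assumes you use sc.textFile("<path>/*.json") or equivalent to load
multiple files at once, so that each JSON file lands in its own partition.
That is typical rather than guaranteed: Hadoop's input format never puts two
files in one split, yet it can split a large file across several partitions,
and a document broken up that way won't parse.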
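A minimal sketch of that idea, assuming json4s with the Jackson backend (the question uses json4s but doesn't say which backend) and assuming every partition holds exactly one complete file. The names MultiLineJson and namesFromPartition are hypothetical, and args(0) would be a glob such as "<path>/*.json".

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._ // assumption: json4s-jackson backend

object MultiLineJson {
  // Rebuilds one partition's lines into a single document and extracts the
  // top-level "name" field. Assumes the partition is exactly one whole file;
  // a file fragment will make parse() throw.
  def namesFromPartition(lines: Iterator[String]): Iterator[String] = {
    implicit val formats: Formats = DefaultFormats
    val doc = lines.mkString("\n")
    if (doc.trim.isEmpty) Iterator.empty // skip empty partitions
    else Iterator((parse(doc) \ "name").extract[String])
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-line-json"))
    // textFile gives one record per line; mapPartitions sees all lines of a partition
    val names = sc.textFile(args(0)).mapPartitions(namesFromPartition)
    names.collect().foreach(println)
    sc.stop()
  }
}
```

Keeping the parsing in a plain Iterator[String] => Iterator[String] function makes it easy to check outside Spark before wiring it into mapPartitions.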
Not sure if this satisfies your use case, but it might be a good starting
point.
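The mapPartitions approach relies on each file landing in exactly one partition. A swapped-in alternative that avoids that assumption is sc.wholeTextFiles, which returns one (path, content) record per file, so a document is never split across records. This is a sketch under the same json4s-jackson assumption; WholeFileJson and summarize are hypothetical names, and it reads the "name" field and counts the "obj" entries from the example in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._ // assumption: json4s-jackson backend

object WholeFileJson {
  // Parses one file's full contents as a single JSON document and returns
  // the "name" field plus the number of elements in the "obj" array
  // (0 if "obj" is missing).
  def summarize(content: String): (String, Int) = {
    implicit val formats: Formats = DefaultFormats
    val json = parse(content)
    val name = (json \ "name").extract[String]
    val objCount = (json \ "obj").children.size
    (name, objCount)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("whole-file-json"))
    // wholeTextFiles yields (path, fullContent) pairs, one per file
    val summaries = sc.wholeTextFiles(args(0)).map { case (_, content) => summarize(content) }
    summaries.collect().foreach(println)
    sc.stop()
  }
}
```

Each file is held in memory as one string, so this suits many small-to-medium files better than a few very large ones.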
-chris
On Mon, Jul 14, 2014 at 2:55 PM, SK <[email protected]> wrote:
> Hi,
>
> I have a json file where the definition of each object spans multiple
> lines.
> An example of one object definition appears below.
>
> {
> "name": "16287e9cdf",
> "width": 500,
> "height": 325,
> "width": 1024,
> "height": 665,
> "obj": [
> {
> "x": 395.08,
> "y": 82.09,
> "w": 185.48677,
> "h": 185.48677,
> "min": 50,
> "max": 59,
> "attr1": 2,
> "attr2": 68,
> "attr3": 8
> },
> {
> "x": 519.1,
> "y": 225.8,
> "w": 170,
> "h": 171,
> "min": 20,
> "max": 29,
> "attr1": 7,
> "attr2": 93,
> "attr3": 10
> }
> ]
> }
>
> I used the following Spark code to parse the file. However, the parsing is
> failing because I think it expects one Json object definition per line. I
> can try to preprocess the input file to remove the new lines, but I would
> like to know if it is possible to parse a Json object definition that spans
> multiple lines, directly in Spark.
>
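> import org.json4s._
> import org.json4s.jackson.JsonMethods._ // assuming the Jackson backend
>
> val inp = sc.textFile(args(0))
> // textFile yields one record per line, so parse() only sees one line at a time
> val res = inp.map(line => parse(line))
>   .map { json =>
>     implicit val formats: Formats = DefaultFormats
>     (json \ "name").extract[String] // return the name instead of Unit
>   }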
>
>
> Thanks for your help.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-Json-object-definition-spanning-multiple-lines-tp9659.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>