Hyukjin Kwon created SPARK-21355:
------------------------------------

             Summary: JSON datasource failed to parse a value having newline in its value
                 Key: SPARK-21355
                 URL: https://issues.apache.org/jira/browse/SPARK-21355
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Hyukjin Kwon
            Priority: Minor

I guess this is rather a corner case. I found it while testing SPARK-21289.

It looks like a bug in Jackson. The code below fails to parse a value that contains a newline:

{code}
scala> spark.read.json(Seq("{ \"f\": \"a\nb\"}", "{ \"f\": \"abc\"}").toDS).show()
+---------------+----+
|_corrupt_record|   f|
+---------------+----+
| { "f": "a b"}|null|
|           null| abc|
+---------------+----+
{code}

This means it also does not work with JSON files in which a value contains a newline, such as the one below:

{code}
{"f": " d", "f0": 3}
{code}

{code}
scala> spark.read.option("multiLine", true).json("tmp.json").show()
+--------------------+
|     _corrupt_record|
+--------------------+
|{"f": " d", "f0"...|
+--------------------+
{code}

Of course, the code below works:

{code}
scala> spark.read.json(Seq("{ \"f\": \"ab\"}", "{ \"f\": \"abc\"}").toDS).show()
+---+
|  f|
+---+
| ab|
|abc|
+---+
{code}
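
For comparison, here is a minimal sketch of one possible workaround, not a fix for the behaviour reported above. In the failing example, the Scala literal `"a\nb"` puts a raw line feed inside the JSON string value. RFC 8259 requires control characters (U+0000 through U+001F) inside strings to be escaped, so a strict parser may reject such input. The helper `escapeJsonString` below is hypothetical (it is not part of Spark or Jackson) and uses only the Scala standard library. It escapes the value before it is embedded in the JSON text.

```scala
// Hypothetical helper, not a Spark or Jackson API: escapes the characters
// that RFC 8259 does not allow to appear raw inside a JSON string.
def escapeJsonString(s: String): String = {
  val sb = new StringBuilder
  s.foreach {
    case '"'          => sb.append("\\\"")
    case '\\'         => sb.append("\\\\")
    case '\n'         => sb.append("\\n")
    case '\r'         => sb.append("\\r")
    case '\t'         => sb.append("\\t")
    // Any other control character becomes a \uXXXX escape.
    case c if c < ' ' => sb.append("\\u" + "%04x".format(c.toInt))
    case c            => sb.append(c)
  }
  sb.toString
}

// The value from the failing example: it contains a raw line feed.
val raw = "a\nb"

// The JSON text now carries the two characters '\' and 'n' instead of a line feed.
val json = "{ \"f\": \"" + escapeJsonString(raw) + "\"}"

// In spark-shell, this record could then be read the same way as in the report:
//   spark.read.json(Seq(json, "{ \"f\": \"abc\"}").toDS).show()
```

This only helps when you control how the JSON text is produced. For files that already contain raw control characters, newer Spark releases document a JSON read option named `allowUnquotedControlChars`. Whether it is available depends on the Spark version in use, so treat that as something to verify rather than a given for 2.2.0.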