Hyukjin Kwon created SPARK-21355:
------------------------------------

             Summary: JSON datasource failed to parse a value having newline in its value
                 Key: SPARK-21355
                 URL: https://issues.apache.org/jira/browse/SPARK-21355
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Hyukjin Kwon
            Priority: Minor


I guess this is rather a corner case. I found it while testing SPARK-21289.
It looks like a bug in Jackson.

The code below fails to parse a string value that contains a newline.

{code}
scala> spark.read.json(Seq("{ \"f\": \"a\nb\"}", "{ \"f\": \"abc\"}").toDS).show()
+---------------+----+
|_corrupt_record|   f|
+---------------+----+
|  { "f": "a
b"}|null|
|           null| abc|
+---------------+----+
{code}
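
For reference, the Scala literal "a\nb" in the failing input puts a raw line feed (U+000A) inside the JSON string, whereas JSON's own escape for a line feed is the two characters backslash and n. Below is a minimal sketch of that difference, assuming the raw control character is what trips the parser. The object and helper names are hypothetical and are not part of Spark or Jackson.

```scala
// Hypothetical illustration only; nothing here is Spark or Jackson API.
object JsonNewlineSketch {
  // Escape a value so it can be embedded between double quotes in JSON text.
  // Handles the quote, the backslash, and control characters below U+0020.
  def escapeString(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      // Remaining control characters: backslash, 'u', four hex digits.
      case c if c < ' ' => sb.append("\\").append("u").append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.toString
  }

  // Same text as the failing input: a raw line feed sits inside the value.
  val raw: String = "{ \"f\": \"a\nb\"}"

  // The value is escaped before being embedded, so no raw line feed remains.
  val escaped: String = "{ \"f\": \"" + escapeString("a\nb") + "\"}"

  def main(args: Array[String]): Unit = {
    println(raw.contains('\n'))     // true
    println(escaped.contains('\n')) // false
    println(escaped)
  }
}
```

If that assumption holds, passing `escaped` to spark.read.json in place of the failing string should produce a row whose f holds the line feed instead of a _corrupt_record. That is not verified here against 2.2.0.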

This means that reading JSON files such as the one below does not work either:

{code}
{"f": "
d",  "f0": 3}
{code}


{code}
scala> spark.read.option("multiLine", true).json("tmp.json").show()
+--------------------+
|     _corrupt_record|
+--------------------+
|{"f": "
d",  "f0"...|
+--------------------+
{code}

Of course, the code below works:

{code}
scala> spark.read.json(Seq("{ \"f\": \"ab\"}", "{ \"f\": \"abc\"}").toDS).show()
+---+
|  f|
+---+
| ab|
|abc|
+---+
{code}





