Ah, sorry, I should have read this more carefully. Would you mind sharing
your code so I can test it?

I would like to reproduce the issue.


I just tested this myself but couldn't reproduce it with the code below (is
this what you're doing?):

import java.sql.Date
import org.apache.spark.sql.Dataset
import spark.implicits._

case class ClassData(a: String, b: Date)

val ds: Dataset[ClassData] = Seq(
  ("a", Date.valueOf("1990-12-13")),
  ("a", Date.valueOf("1990-12-13")),
  ("a", Date.valueOf("1990-12-13"))
).toDF("a", "b").as[ClassData]

ds.write.csv("/tmp/data.csv")
spark.read.csv("/tmp/data.csv").show()

This prints:

+---+----+
|_c0| _c1|
+---+----+
|  a|7651|
|  a|7651|
|  a|7651|
+---+----+
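
By the way, it might help to check how the schema is being interpreted on
your side with printSchema(). Reusing the ds from the snippet above, it
should show two flat fields rather than a single struct, roughly:

ds.printSchema()
// root
//  |-- a: string (nullable = true)
//  |-- b: date (nullable = true)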

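For reference, the error quoted below should only appear when a column is
actually a struct. Here is a minimal sketch (the Inner/Outer case classes
are just made up for illustration) that should hit the same
UnsupportedOperationException when writing to CSV:

import spark.implicits._

case class Inner(x: String, y: String)
case class Outer(a: String, b: Inner)

// Column "b" becomes struct<x:string,y:string>, which the CSV data
// source can't represent, so this write should fail.
Seq(Outer("a", Inner("b", "c"))).toDS().write.csv("/tmp/nested.csv")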

2016-08-19 9:27 GMT+09:00 Efe Selcuk <efema...@gmail.com>:

> Thanks for the response. The problem with that thought is that I don't
> think I'm dealing with a complex nested type. It's just a dataset where
> every record is a case class with only simple types as fields: strings and
> dates. There's no nesting.
>
> That's what confuses me about how it's interpreting the schema. The schema
> seems to be one complex field rather than a bunch of simple fields.
>
> On Thu, Aug 18, 2016, 5:07 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Hi Efe,
>>
>> If my understanding is correct, writing and reading complex types is not
>> supported because the CSV format can't represent nested types.
>>
>> I guess that the external CSV library supporting them for writing is
>> rather a bug.
>>
>> It'd be great if we could write such data out as CSV and read it back,
>> but I guess we can't.
>>
>> Thanks!
>>
>> On 19 Aug 2016 6:33 a.m., "Efe Selcuk" <efema...@gmail.com> wrote:
>>
>>> We have an application working in Spark 1.6. It uses the databricks csv
>>> library for the output format when writing out.
>>>
>>> I'm attempting an upgrade to Spark 2. Whether I write with the native
>>> DataFrameWriter#csv() method or by explicitly specifying the
>>> "com.databricks.spark.csv" format (I suspect the underlying format is the
>>> same, but I don't know how to verify that), I get the following error:
>>>
>>> java.lang.UnsupportedOperationException: CSV data source does not
>>> support struct<[bunch of field names and types]> data type
>>>
>>> There are 20 fields, mostly plain strings with a couple of dates. The
>>> source object is a Dataset[T] where T is a case class with various fields.
>>> The line just looks like: someDataset.write.csv(outputPath)
>>>
>>> Googling returned this fairly recent pull request: https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3c65d35a72bd05483392857098a2635...@git.apache.org%3E
>>>
>>> If I'm reading that correctly, the schema shows that each record has one
>>> field of this complex struct type? And the validation thinks it's something
>>> that it can't serialize. I would expect the schema to have a bunch of
>>> fields in it matching the case class, so maybe there's something I'm
>>> misunderstanding.
>>>
>>> Efe
>>>
>>
