Hi!

I create a DataFrame as follows:
JavaRDD<Row> rows = ...
StructType structType = ...
Then I call sqlContext.createDataFrame(rows, structType).

I have a fairly complex schema:
root
 |-- Id: long (nullable = true)
 |-- attributes: struct (nullable = true)
 |    |-- FirstName: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- Identifiers: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Type: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)
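For anyone reproducing this, here is a sketch of a StructType equivalent to the tree above, built with the Spark 1.x Java `DataTypes` factory methods. The class name `SchemaSketch` is hypothetical, and this is an assumption about how the elided `structType` could look, not necessarily how it is built in my code.

```java
import java.util.Arrays;

import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper: rebuilds the printed schema programmatically.
public class SchemaSketch {

    static StructType buildSchema() {
        DataType stringArray = DataTypes.createArrayType(DataTypes.StringType, true);

        // Element type of attributes.Identifiers: struct<Type: array<string>>
        StructType identifier = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("Type", stringArray, true)));

        StructType attributes = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("FirstName", stringArray, true),
                DataTypes.createStructField("Identifiers",
                        DataTypes.createArrayType(identifier, true), true)));

        return DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("Id", DataTypes.LongType, true),
                DataTypes.createStructField("attributes", attributes, true)));
    }

    public static void main(String[] args) {
        // Prints the schema as a tree, in the same layout as printSchema().
        System.out.println(buildSchema().treeString());
    }
}
```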

When I explode the attributes.Identifiers column, one more field appears in the
schema:
|-- Identifiers: string (nullable = true)

My question is: why is the type of Identifiers string? Is it possible to make it
a non-string type?
In this example it’s clear that the type must be struct<Type: array<string>>,
i.e. the element type of attributes.Identifiers. Unfortunately, it’s not possible
to cast this column, because casting string to struct is not allowed.
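For comparison, here is a minimal sketch that explodes the same column with the column-based function `org.apache.spark.sql.functions.explode(Column)` inside `select`, assuming Spark 1.4 to 1.6 (the `DataFrame` API). On those versions the generated column should keep the array's element type, struct<Type: array<string>>, instead of string. The class name `ExplodeSketch` and the alias `Identifier` are hypothetical. An empty RDD is used because only the resulting schema is inspected.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

public class ExplodeSketch {

    // Explodes attributes.Identifiers with functions.explode(Column) and
    // returns the resulting schema (assumes Spark 1.4-1.6).
    static StructType explodedSchema(JavaSparkContext jsc) {
        SQLContext sqlContext = new SQLContext(jsc);

        DataType stringArray = DataTypes.createArrayType(DataTypes.StringType, true);
        StructType identifier = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("Type", stringArray, true)));
        StructType attributes = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("FirstName", stringArray, true),
                DataTypes.createStructField("Identifiers",
                        DataTypes.createArrayType(identifier, true), true)));
        StructType structType = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("Id", DataTypes.LongType, true),
                DataTypes.createStructField("attributes", attributes, true)));

        // Only the schema matters here, so an empty RDD is enough.
        JavaRDD<Row> rows = jsc.<Row>emptyRDD();
        DataFrame df = sqlContext.createDataFrame(rows, structType);

        // One output row per Identifiers element; "Identifier" is a hypothetical alias.
        DataFrame exploded = df.select(
                col("Id"),
                explode(col("attributes.Identifiers")).as("Identifier"));
        return exploded.schema();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("explode-sketch").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            // Identifier should be printed as a struct containing a Type array.
            System.out.println(explodedSchema(jsc).treeString());
        } finally {
            jsc.stop();
        }
    }
}
```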

Are there any workarounds to get the correct schema?
Thanks in advance.

Eugene Morozov