I have a table with a few columns, some of which are arrays. Since
upgrading from Spark 1.6 to Spark 2.0.1, the array columns are always
null when the table is read into a DataFrame.

When writing the Parquet files, the schema of the column is specified as

StructField("packageIds", ArrayType(StringType))
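
For reference, a minimal sketch of the write path as it ran on Spark
1.6; the row data and the path are hypothetical stand-ins for the
real job:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Only the array column is shown; the real schema has the other
// columns as well, in the same order and casing as the Metastore.
val schema = StructType(Seq(
  StructField("packageIds", ArrayType(StringType))
))

// Hypothetical one-row RDD standing in for the production data.
val rowRDD = sc.parallelize(Seq(Row(Seq("pkg-1", "pkg-2"))))

val df = sqlContext.createDataFrame(rowRDD, schema)
df.write.parquet("/path/to/table") // hypothetical path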

The schema of the column in the Hive Metastore is

packageIds array<string>

The schema used in the writer exactly matches the schema in the
Metastore in all respects (order, casing, types, etc.).

The query is a simple "select *":

spark.sql("select * from tablename limit 1").collect() // null columns in Row
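
To narrow it down, I can compare the schema Spark 2.0.1 derives for
the Hive table against the schema read straight from the Parquet
footers (the path below is a hypothetical stand-in):

// Schema as Spark 2.0.1 sees the Hive table.
spark.table("tablename").printSchema()

// Schema read directly from the files, bypassing the Metastore.
spark.read.parquet("/path/to/table").printSchema()

If those two disagree, that would point at how the table schema is
reconciled with the file schema rather than at the files themselves.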

How can I begin debugging this issue? Notable things I've already
investigated:

   - The files were written using Spark 1.6.
   - The DataFrame reads correctly in Spark 1.5 and 1.6.
   - I've inspected the Parquet files using parquet-tools and can see
   the data (the sketch after this list runs the same check from
   inside Spark).
   - I also have another table written in exactly the same way and it
   doesn't have the issue.
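
The same parquet-tools check can be repeated from inside Spark 2.0.1
by reading the files directly (hypothetical path again):

// Bypasses the Metastore entirely; if the array values show up here
// but are null through the Hive table, the files themselves are fine.
spark.read.parquet("/path/to/table")
  .select("packageIds")
  .show(1, false)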
