I have a table with a few columns, some of which are arrays. Since upgrading from Spark 1.6 to Spark 2.0.1, the array columns always come back null when the table is read into a DataFrame.
When writing the Parquet files, the schema of the array column is specified as:

    StructField("packageIds", ArrayType(StringType))

The schema of the same column in the Hive Metastore is:

    packageIds array<string>

The schema used by the writer exactly matches the schema in the Metastore in every way (order, casing, types, etc.). The query is a simple "select *":

    spark.sql("select * from tablename limit 1").collect() // array columns are null in the Row

How can I begin debugging this issue? Notable things I've already investigated:

- The files were written using Spark 1.6.
- The same DataFrame reads correctly in Spark 1.5 and 1.6.
- I've inspected the Parquet files using parquet-tools and can see the data.
- Another table written in exactly the same way does not have this issue.
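For completeness, the write path (done on Spark 1.6, hence the SQLContext API) boils down to roughly the following. This is heavily simplified: the id column, the sample values, and the output path are placeholders, but packageIds is declared exactly as in the real job:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Simplified sketch of the write path; the real job has more columns.
    // "/path/to/table" is a placeholder for the actual table location.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("packageIds", ArrayType(StringType))
    ))

    val rdd = sqlContext.sparkContext.parallelize(Seq(
      Row("row1", Seq("pkg-a", "pkg-b"))
    ))

    sqlContext.createDataFrame(rdd, schema)
      .write
      .parquet("/path/to/table")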
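The only concrete next step I've come up with is to compare a direct file read (which bypasses the Metastore entirely) against a Metastore-backed read in 2.0.1, along these lines (the path and table name are placeholders):

    // Read the Parquet files directly, ignoring the Hive Metastore schema.
    val direct = spark.read.parquet("/path/to/table")
    direct.printSchema()
    direct.select("packageIds").show(1, false)

    // Read the same data through the Metastore.
    val viaMetastore = spark.table("tablename")
    viaMetastore.printSchema()
    viaMetastore.select("packageIds").show(1, false)

If the direct read returns the arrays but the Metastore read returns nulls, that would at least narrow the problem down to how Spark 2.x reconciles the file schema with the Metastore schema. Is this a reasonable approach, and what else should I look at?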