> I dug a little deeper and it appears that the configuration property
> "columns.types", which is used in
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(),
> is not being set. When I manually set that property in Hive, your
> example works fine.
Good to know more about the NPE. ORC uses the exact same parameter:

    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:
    columnTypeProperty = conf.get(serdeConstants.LIST_COLUMN_TYPES);

But I think this could have a very simple explanation. Assuming you have a build of Tez, I would recommend adding a couple of LOG.warn lines in TezGroupedSplitsInputFormat:

    public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
        Reporter reporter) throws IOException {

In particular, check whether the "this.conf" or the "job" conf object has columns.types set. My guess is that the set; command is setting that up in the JobConf, while the default compiler places it in the this.conf object. If that is the case, we can fix Parquet to pick it up off the right one.

Cheers,
Gopal
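For anyone following along, the check being suggested can be sketched as below. This is a self-contained stand-in, not the real Tez code: java.util.Properties plays the role of the two configuration objects (this.conf and the JobConf passed into getRecordReader), and System.err replaces LOG.warn. The real edit would go inside TezGroupedSplitsInputFormat.getRecordReader().

```java
import java.util.Properties;

public class ColumnTypesCheck {

    // Stand-in for the LOG.warn lines: report whether a given conf
    // object carries the "columns.types" property.
    static String columnTypes(Properties conf) {
        return conf.getProperty("columns.types");
    }

    public static void main(String[] args) {
        // Two stand-in conf objects, mirroring this.conf and the JobConf
        // argument of getRecordReader().
        Properties thisConf = new Properties();
        Properties jobConf = new Properties();

        // Simulate the hypothesis: the compiler's settings land on
        // this.conf, while the JobConf handed to the record reader
        // never received "columns.types".
        thisConf.setProperty("columns.types", "int:string");

        System.err.println("this.conf columns.types = " + columnTypes(thisConf));
        System.err.println("job      columns.types = " + columnTypes(jobConf));
        // If the second line reports null, a reader that only consults
        // the job conf (as Parquet's DataWritableReadSupport.init() does)
        // would see no column types and fail.
    }
}
```

If the hypothesis holds, the two print lines would show the property present on one object and absent on the other, which is exactly the signal the LOG.warn lines are meant to capture.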