> I dug a little deeper and it appears that the configuration property
> "columns.types", which is read in
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(),
> is not being set. When I manually set that property in Hive, your
> example works fine.

Good to know more about the NPE. ORC uses the exact same parameter.

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:
columnTypeProperty = conf.get(serdeConstants.LIST_COLUMN_TYPES);
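For context, a rough sketch (not the exact ORC code) of why a missing value
surfaces as an NPE rather than a clean error -- if the property never landed
on the conf object being queried, conf.get() returns null and the type-string
parsing blows up:

  // sketch only; assumes org.apache.hadoop.hive.serde.serdeConstants and
  // org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils from Hive
  // serdeConstants.LIST_COLUMN_TYPES is the "columns.types" key
  String columnTypeProperty = conf.get(serdeConstants.LIST_COLUMN_TYPES); // null if unset
  List<TypeInfo> fieldTypes =
      TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);       // NPEs on null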

But I think this could have a very simple explanation.

Assuming you have a build of Tez, I would recommend adding a couple of
LOG.warn lines in TezGroupedSplitsInputFormat:

public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {


In particular, check whether the "this.conf" or the "job" conf object has
columns.types set.
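
Something along these lines (a rough sketch; LOG is assumed to be the class's
existing logger, and the exact field names in TezGroupedSplitsInputFormat may
differ):

  public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
        Reporter reporter) throws IOException {
    // temporary diagnostics: is columns.types visible on either conf object?
    LOG.warn("columns.types via this.conf = "
        + (this.conf == null ? "<conf is null>" : this.conf.get("columns.types")));
    LOG.warn("columns.types via job       = "
        + (job == null ? "<job is null>" : job.get("columns.types")));
    // ... rest of the method unchanged
  }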

My guess is that the "set ...;" command is putting it into the JobConf, while
the default compiler places it in the this.conf object.

If that is the case, we can fix Parquet to pick it up off the right one.
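
For example (a purely hypothetical sketch -- the helper name is made up, and
the real change would go wherever the Parquet read path builds its
Configuration, e.g. around DataWritableReadSupport.init()):

  // read columns.types from the JobConf first, falling back to the other
  // conf object if the property only landed on one of them
  private static String getColumnTypes(JobConf job, Configuration tableConf) {
    String types = job.get(serdeConstants.LIST_COLUMN_TYPES);
    if (types == null && tableConf != null) {
      types = tableConf.get(serdeConstants.LIST_COLUMN_TYPES);
    }
    return types;
  }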

Cheers,
Gopal










