> I dug a little deeper and it appears that the configuration property
> "columns.types", which is used in
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(),
> is not being set. When I manually set that property in Hive, your
> example works fine.
Good to know more about the NPE. ORC uses the exact same parameter:

    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:
    columnTypeProperty = conf.get(serdeConstants.LIST_COLUMN_TYPES);

But I think this could have a very simple explanation. Assuming you have a build of Tez, I would recommend adding a couple of LOG.warn lines in TezGroupedSplitsInputFormat:

    public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
        Reporter reporter) throws IOException {

In particular, check whether the "this.conf" or the "job" conf object has columns.types set. My guess is that the set; command is setting that up in the JobConf, while the default compiler places it in the this.conf object. If that is the case, we can fix Parquet to pick it up off the right one.

Cheers,
Gopal
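For anyone following along, the check being suggested can be sketched as below. This is a self-contained stand-in, not the real Tez code: java.util.Properties plays the role of the two configuration objects (this.conf and the JobConf passed into getRecordReader), and System.err replaces LOG.warn. The real edit would go inside TezGroupedSplitsInputFormat.getRecordReader().

```java
import java.util.Properties;

public class ColumnTypesCheck {

    // Stand-in for the LOG.warn lines: report whether a given conf
    // object carries the "columns.types" property.
    static String columnTypes(Properties conf) {
        return conf.getProperty("columns.types");
    }

    public static void main(String[] args) {
        // Two stand-in conf objects, mirroring this.conf and the JobConf
        // argument of getRecordReader().
        Properties thisConf = new Properties();
        Properties jobConf = new Properties();

        // Simulate the hypothesis: the compiler's settings land on
        // this.conf, while the JobConf handed to the record reader
        // never received "columns.types".
        thisConf.setProperty("columns.types", "int:string");

        System.err.println("this.conf columns.types = " + columnTypes(thisConf));
        System.err.println("job      columns.types = " + columnTypes(jobConf));
        // If the second line reports null, a reader that only consults
        // the job conf (as Parquet's DataWritableReadSupport.init() does)
        // would see no column types and fail.
    }
}
```

If the hypothesis holds, the two print lines would show the property present on one object and absent on the other, which is exactly the signal the LOG.warn lines are meant to capture.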