Hi Gopal,

With the release of 0.8.2, I thought I would give Tez another shot. Unfortunately, I got the same NPE. I dug a little deeper, and it appears that the configuration property "columns.types", which is used in org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(), is not being set. When I manually set that property in hive, your example works fine.
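For context, the failure mode can be sketched like this. This is a simplified stand-in, not the actual Hive code: the real TypeInfoParser.tokenize() walks the characters of the type string it is given, so when "columns.types" is unset and a null string reaches it, the first dereference throws the NPE seen in the stack trace below.

```java
public class ColumnsTypesNpeSketch {

    // Hypothetical simplification of TypeInfoParser.tokenize(): the real
    // method iterates over typeInfoString's characters; either way, a null
    // input blows up on the first dereference.
    static int tokenize(String typeInfoString) {
        return typeInfoString.length(); // NPE when "columns.types" was never set
    }

    public static void main(String[] args) {
        boolean npe = false;
        try {
            tokenize(null); // what init() effectively passes when the property is missing
        } catch (NullPointerException e) {
            npe = true;
        }
        System.out.println("caught NPE: " + npe);
    }
}
```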
hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> set columns.types=int;
hive> select count(*) from x where x.x > 1;
OK
1

I also saw that the configuration property parquet.column.index.access is checked in that same function. Setting that property to "true" also fixes my issue.

hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> set parquet.column.index.access=true;
hive> select count(*) from x where x.x > 1;
OK
1

Thanks for your help.

Best,
Adam

On Tue, Jan 5, 2016 at 9:10 AM, Adam Hunt <adamph...@gmail.com> wrote:
> Hi Gopal,
>
> Spark does offer dynamic allocation, but it doesn't always work as
> advertised. My experience with Tez has been more in line with my
> expectations. I'll bring up my issues with Spark on that list.
>
> I tried your example and got the same NPE. It might be a mapr-hive issue.
> Thanks for your help.
>
> Adam
>
> On Mon, Jan 4, 2016 at 12:58 PM, Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>
>> > select count(*) from alexa_parquet;
>>
>> > Caused by: java.lang.NullPointerException
>> >   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.tokenize(TypeInfoUtils.java:274)
>> >   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.<init>(TypeInfoUtils.java:293)
>> >   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:764)
>> >   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getColumnTypes(DataWritableReadSupport.java:76)
>> >   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:220)
>> >   at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
>>
>> This might be an NPE triggered off by a specific case of the type parser.
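The two workarounds above can be summarized with a sketch of the property lookup. This is a self-contained simplification, not the real DataWritableReadSupport code: the real init() reads a Hadoop Configuration, which a plain Map stands in for here, and the exact fallback behavior is an assumption based on the observed fixes.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the property lookup in DataWritableReadSupport.init().
// The Map-based "configuration" and the return values are illustrative only.
public class ColumnTypesLookupSketch {

    static String resolveColumnTypes(Map<String, String> conf) {
        boolean indexAccess = Boolean.parseBoolean(
                conf.getOrDefault("parquet.column.index.access", "false"));
        if (indexAccess) {
            // with index-based access the reader maps columns by position,
            // so it does not depend on the "columns.types" string
            return "resolved-by-index";
        }
        // otherwise a missing "columns.types" yields null, which the type
        // parser later dereferences -> NullPointerException
        return conf.get("columns.types");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("default: " + resolveColumnTypes(conf));

        conf.put("columns.types", "int");
        System.out.println("with columns.types: " + resolveColumnTypes(conf));

        conf.remove("columns.types");
        conf.put("parquet.column.index.access", "true");
        System.out.println("with index access: " + resolveColumnTypes(conf));
    }
}
```

Either setting avoids handing the parser a null, which matches why both hive shell workarounds make the query succeed.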
>> I tested it out on my current build with simple types and it looks like
>> the issue needs more detail on the column types for a repro.
>>
>> hive> create temporary table x (x int) stored as parquet;
>> hive> insert into x values(1),(2);
>> hive> select count(*) from x where x.x > 1;
>> Status: DAG finished successfully in 0.18 seconds
>> OK
>> 1
>> Time taken: 0.792 seconds, Fetched: 1 row(s)
>> hive>
>>
>> Do you have INT96 in the schema?
>>
>> > I'm currently evaluating Hive on Tez as an alternative to keeping the
>> > SparkSQL thrift server running all the time locking up resources.
>>
>> Tez has a tunable value in tez.am.session.min.held-containers (i.e.
>> something small like 10).
>>
>> And HiveServer2 can be made to work similarly because Spark's
>> HiveThriftServer2.scala is a wrapper around Hive's ThriftBinaryCLIService.
>>
>> Cheers,
>> Gopal
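As a footnote on Gopal's container tip: the tunable he names would normally go into tez-site.xml along these lines (the value 10 is just the "something small" he suggests, not a recommendation from the thread):

```xml
<!-- tez-site.xml: keep a small pool of containers held by the session AM
     between queries, so an idle Hive-on-Tez session answers quickly without
     holding the whole cluster -->
<property>
  <name>tez.am.session.min.held-containers</name>
  <value>10</value>
</property>
```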