Hello Hive users,

I am experiencing a problem with the MetaStore in Hive 3.0.

1. Start MetaStore with hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.

2. Generate TPC-DS data.

3. TPC-DS queries run okay and produce correct results. E.g., from query 1:

+-------------------+
|   c_customer_id   |
+-------------------+
| AAAAAAAAAAAACHAA  |
| AAAAAAAAAAAADCAA  |
| AAAAAAAAAAAADDAA  |
...
| AAAAAAAAAAAILIAA  |
+-------------------+
100 rows selected (69.901 seconds)

However, query compilation takes a long time (https://issues.apache.org/jira/browse/HIVE-16520).

4. Now, restart MetaStore with hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.

5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:

+----------------+
| c_customer_id  |
+----------------+
+----------------+
No rows selected (37.448 seconds)

What I noticed is that with hive.metastore.rawstore.impl=CachedStore, HiveServer2 produces log messages such as the following:

2018-06-12T23:50:04,223 WARN [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
2018-06-12T23:50:04,223 INFO [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
2018-06-12T23:50:04,225 WARN [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
2018-06-12T23:50:04,225 INFO [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
2018-06-12T23:50:04,226 WARN [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, c_customer_id
2018-06-12T23:50:04,226 INFO [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, c_customer_id
2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: Invalid column stats: No of nulls > cardinality
2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: Invalid column stats: No of nulls > cardinality
2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: Invalid column stats: No of nulls > cardinality

However, even after computing column stats, queries still return wrong results, even though the above log messages disappear.

I guess I am missing some configuration parameters (because I imported hive-site.xml from Hive 2). Any suggestions would be appreciated.

Thanks a lot,

--- Sungwoo Park