Hello Hive users,

I am experiencing a problem with the MetaStore in Hive 3.0.

1. Start MetaStore
with hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.

2. Generate TPC-DS data.

3. TPC-DS queries run okay and produce correct results. E.g., from query 1:
+-------------------+
|   c_customer_id   |
+-------------------+
| AAAAAAAAAAAACHAA  |
| AAAAAAAAAAAADCAA  |
| AAAAAAAAAAAADDAA  |
...
| AAAAAAAAAAAILIAA  |
+-------------------+
100 rows selected (69.901 seconds)

However, query compilation takes a long time (see
https://issues.apache.org/jira/browse/HIVE-16520).

4. Now, restart MetaStore with
hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.

5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
+----------------+
| c_customer_id  |
+----------------+
+----------------+
No rows selected (37.448 seconds)
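
For reference, the setting from step 4 would look like the sketch below when
written as a hive-site.xml entry. This assumes the property is set in the
MetaStore's hive-site.xml; the same value could also be passed another way
at startup. For step 1, the value is
org.apache.hadoop.hive.metastore.ObjectStore instead.

```xml
<!-- Sketch only: assumes the property lives in the MetaStore's hive-site.xml -->
<property>
  <name>hive.metastore.rawstore.impl</name>
  <!-- Step 1 uses org.apache.hadoop.hive.metastore.ObjectStore here -->
  <value>org.apache.hadoop.hive.metastore.cache.CachedStore</value>
</property>
```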

What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
HiveServer2 produces log messages like the following:

2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
c_customer_id
2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
c_customer_id

2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
Invalid column stats: No of nulls > cardinality
2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
Invalid column stats: No of nulls > cardinality
2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
Invalid column stats: No of nulls > cardinality

However, even after I compute column stats, queries still return wrong
results, although the above log messages disappear.

I guess I am missing some configuration parameters (because I imported
hive-site.xml from Hive 2). Any suggestions would be appreciated.

Thanks a lot,

--- Sungwoo Park
