shin chen created HIVE-21982:
--------------------------------

             Summary: hive does not use stats even after analyzing the table
                 Key: HIVE-21982
                 URL: https://issues.apache.org/jira/browse/HIVE-21982
             Project: Hive
          Issue Type: Bug
          Components: Hive
         Environment: HDP
Hive 1.2.1000.2.6.5.0-292
            Reporter: shin chen


Settings:
{code:sql}
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
{code}

Partition metadata before analysis:
{code:sql}
desc extended **.** partition(month=**,day=**,hour=**);
.....
parameters:{transient_lastDdlTime=1561958282, totalSize=16413917810, numFiles=3}
{code}

The partition has not been analyzed yet, so a simple count(*) query scans the table:
{code:sql}
SELECT count(*) FROM **.** WHERE month='**' AND day='**' AND hour='**';
....
1 row selected (52.756 seconds)
{code}

After analyzing the partition, the same query is no faster:
{code:sql}
-- Analyze first
analyze table **.** partition(month='**',day='**',hour='**') compute statistics;
-- Then rerun the same count(*) query
SELECT count(*) FROM **.** WHERE month='**' AND day='**' AND hour='**';
....
1 row selected (58.326 seconds)
{code}

Hive still scans the data instead of answering the query from the stored statistics. Describing the partition again shows that numRows is now recorded and BASIC_STATS is marked accurate:
{code:sql}
....
parameters:{totalSize=16413917811, numRows=37975264, rawDataSize=4670957472, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=3, transient_lastDdlTime=1562669873})
{code}

Any advice here?
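For anyone triaging this, a minimal sketch of a hypothetical helper (not part of Hive) that parses the parameters:{...} text printed by desc extended above and checks the two things visible in it: whether BASIC_STATS is flagged accurate and whether numRows is recorded. The class and method names are made up for illustration. Treating these two fields as the precondition for a metadata-only count(*) is an assumption; Hive's optimizer applies its own, more detailed checks.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hive.
class PartitionStatsCheck {

    // Parses the "parameters:{k=v, k=v, ...}" text printed by DESC EXTENDED into a map.
    // Commas nested inside braces (such as the JSON value of COLUMN_STATS_ACCURATE)
    // are not treated as entry separators.
    static Map<String, String> parseParameters(String text) {
        int start = text.indexOf('{');
        int end = text.lastIndexOf('}');
        String body = text.substring(start + 1, end);
        Map<String, String> params = new LinkedHashMap<>();
        int depth = 0;
        int entryStart = 0;
        // Iterate one past the end so the final entry is flushed by a virtual comma.
        for (int i = 0; i <= body.length(); i++) {
            char c = i < body.length() ? body.charAt(i) : ',';
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                String entry = body.substring(entryStart, i).trim();
                int eq = entry.indexOf('=');
                if (eq > 0) {
                    params.put(entry.substring(0, eq), entry.substring(eq + 1));
                }
                entryStart = i + 1;
            }
        }
        return params;
    }

    // Assumption: "basic stats flagged accurate + numRows present" is treated as the
    // minimum for a metadata-only count(*). Accepts both the JSON form seen in this
    // report and a plain "true" value.
    static boolean basicStatsLookUsable(Map<String, String> params) {
        String accurate = params.get("COLUMN_STATS_ACCURATE");
        if (accurate == null || !params.containsKey("numRows")) {
            return false;
        }
        String compact = accurate.replace(" ", "");
        return compact.equalsIgnoreCase("true") || compact.contains("\"BASIC_STATS\":\"true\"");
    }

    public static void main(String[] args) {
        String before = "parameters:{transient_lastDdlTime=1561958282, totalSize=16413917810, numFiles=3}";
        String after = "parameters:{totalSize=16413917811, numRows=37975264, rawDataSize=4670957472, "
                + "COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\"}, numFiles=3, "
                + "transient_lastDdlTime=1562669873}";
        System.out.println(basicStatsLookUsable(parseParameters(before))); // false
        System.out.println(basicStatsLookUsable(parseParameters(after)));  // true
        System.out.println(parseParameters(after).get("numRows"));         // 37975264
    }
}
{code}

Run against the two parameter strings from this report, the check gives false before the analyze and true after it, which is consistent with the metadata being in place while the query still scans the data.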