[ https://issues.apache.org/jira/browse/HIVE-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez reassigned HIVE-22239:
----------------------------------------------

> Scale data size using column value ranges
> -----------------------------------------
>
>                 Key: HIVE-22239
>                 URL: https://issues.apache.org/jira/browse/HIVE-22239
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> Currently, min/max values for columns are only used to determine whether a
> certain range filter falls out of range and thus filters all rows or none at
> all. If it does not, we just use a heuristic that the condition will filter
> 1/3 of the input rows. Instead of using that heuristic, we can use another
> one that assumes that data will be uniformly distributed across that range,
> and calculate the selectivity for the condition accordingly.
> This patch also includes the propagation of min/max column values from
> statistics to the optimizer for timestamp type.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
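
For readers following the issue, the uniform-distribution heuristic in the description can be illustrated with a small sketch. This is not the Hive patch or its API. The names `range_selectivity` and `estimate_rows` are hypothetical, and the sketch assumes a closed filter range and a continuous column domain, so it does not distinguish `<` from `<=`.

```python
def range_selectivity(col_min, col_max, lower=None, upper=None):
    """Estimate the fraction of rows with lower <= value <= upper.

    Assumes values are uniformly distributed over [col_min, col_max].
    A bound of None means the filter is open on that side.
    """
    if col_min > col_max:
        raise ValueError("col_min must not exceed col_max")

    # Clamp the filter range to the range recorded in the column statistics.
    lo = col_min if lower is None else max(lower, col_min)
    hi = col_max if upper is None else min(upper, col_max)

    if lo > hi:
        return 0.0  # filter range lies entirely outside the column range
    if col_max == col_min:
        return 1.0  # single-valued column, and the filter includes that value

    # Uniformity assumption: selectivity is the overlapping share of the range.
    return (hi - lo) / (col_max - col_min)


def estimate_rows(num_rows, col_min, col_max, lower=None, upper=None):
    """Scale an input row count by the estimated selectivity (hypothetical helper)."""
    return round(num_rows * range_selectivity(col_min, col_max, lower, upper))
```

Under these assumptions, a column with min 0 and max 100 and the filter `col <= 25` gets a selectivity of 0.25, so 1000 input rows scale to 250. Because the function only uses comparison, subtraction, and division, it also accepts `datetime` values for the min/max and bounds, which loosely mirrors the timestamp case mentioned in the description.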