[ https://issues.apache.org/jira/browse/IMPALA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Qifan Chen resolved IMPALA-2658.
--------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Extend the NDV function to accept a precision
> ---------------------------------------------
>
>                 Key: IMPALA-2658
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2658
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.2.4
>            Reporter: Peter Ebert
>            Assignee: Qifan Chen
>            Priority: Minor
>              Labels: ramp-up
>             Fix For: Impala 4.0
>
>         Attachments: Comparison of HLL Memory usage, Query Duration and Accuracy.jpg
>
>
> The HyperLogLog algorithm used by NDV defaults to a precision of 10. Being
> able to set this precision would have two benefits:
> # Lower precision can improve performance, since a precision of 9 uses half
> as many registers as 10 (the register count is 2^precision) and may be just
> as accurate, depending on the expected cardinality.
> # Higher precision can help with very large cardinalities (100 million to
> billion range) and will typically provide more accurate estimates. Those who
> are presenting estimates to end users will likely be willing to trade some
> performance for more accuracy, while still outperforming the naive approach
> by a large margin.
> Propose adding the overloaded function NDV(expression, int precision)
> with an accepted range of 4 to 18 inclusive.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
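
As a rough illustration of the trade-off described in the issue, here is a minimal, self-contained HyperLogLog sketch in Python with a configurable precision. It is not Impala's implementation: the function names (register_count, relative_standard_error, hll_estimate) are hypothetical, SHA-1 is only a stand-in for whatever hash the backend uses, and the 1.04 / sqrt(m) error figure is the textbook HLL estimate rather than a measured Impala result. The 4 to 18 bound simply mirrors the range proposed above.

```python
import hashlib
import math


def register_count(precision: int) -> int:
    """Number of HLL registers for a given precision: m = 2**precision."""
    return 1 << precision


def relative_standard_error(precision: int) -> float:
    """Textbook HLL relative standard error, roughly 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(register_count(precision))


def hll_estimate(values, precision: int = 10) -> float:
    """Estimate the number of distinct values (illustrative, not Impala's code)."""
    # Range taken from the proposal in the issue; an assumption for this sketch.
    if not 4 <= precision <= 18:
        raise ValueError("precision must be between 4 and 18 inclusive")

    m = register_count(precision)
    registers = [0] * m
    rest_bits = 64 - precision

    for v in values:
        # 64-bit hash derived from SHA-1 (a stand-in hash for the sketch).
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> rest_bits                      # top bits pick the register
        rest = h & ((1 << rest_bits) - 1)         # remaining bits
        # 1-based position of the leftmost 1-bit in the remaining bits.
        rank = rest_bits - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank

    # Bias-correction constant from the original HLL formulation.
    if m == 16:
        alpha = 0.673
    elif m == 32:
        alpha = 0.697
    elif m == 64:
        alpha = 0.709
    else:
        alpha = 0.7213 / (1 + 1.079 / m)

    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    data = range(200_000)
    for p in (9, 10, 14):
        est = hll_estimate(data, p)
        print(p, register_count(p), round(relative_standard_error(p), 4), round(est))
```

Each step up in precision doubles the register count (and so the memory and merge cost) while shrinking the expected error by a factor of about sqrt(2), which is the trade-off the proposed NDV(expression, int precision) overload would expose.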