[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622712#comment-16622712 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 9:08 PM: -- Per the

[jira] [Created] (IMPALA-7602) Definition of NDV differs between planner and stats mechanism

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7602: --- Summary: Definition of NDV differs between planner and stats mechanism Key: IMPALA-7602 URL: https://issues.apache.org/jira/browse/IMPALA-7602 Project: IMPALA

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622712#comment-16622712 ] Paul Rogers commented on IMPALA-7310: - Per the suggestion of [~jeszyb], created IMPALA-7601 to

[jira] [Commented] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622709#comment-16622709 ] Paul Rogers commented on IMPALA-7601: - Please see [~tarmstr...@cloudera.com]'s comment in

[jira] [Created] (IMPALA-7601) Define a-priori selectivity and NDV values

2018-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7601: --- Summary: Define a-priori selectivity and NDV values Key: IMPALA-7601 URL: https://issues.apache.org/jira/browse/IMPALA-7601 Project: IMPALA Issue Type:

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621434#comment-16621434 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 3:02 AM: -- Odd. Looked

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621434#comment-16621434 ] Paul Rogers commented on IMPALA-7310: - Odd. Looked at the tests in {{ExprNdvTest}}. We have tests

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621096#comment-16621096 ] Paul Rogers edited comment on IMPALA-7310 at 9/20/18 1:50 AM: -- Simplest

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621222#comment-16621222 ] Paul Rogers commented on IMPALA-7310: - [~jeszyb], agree completely. Here I'm digging down to do a

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621196#comment-16621196 ] Paul Rogers commented on IMPALA-7310: - Here, it is worth pointing out the risk of any change. The

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621171#comment-16621171 ] Paul Rogers commented on IMPALA-7310: - The original description pointed out the method that computes

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621126#comment-16621126 ] Paul Rogers commented on IMPALA-7310: - As noted above, the code uses -1 as an "undefined"

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621096#comment-16621096 ] Paul Rogers commented on IMPALA-7310: - Simplest case: a binary predicate. Current behavior in

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-19 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620946#comment-16620946 ] Paul Rogers commented on IMPALA-7560: - The table in DRILL-5254 suggests how to use the NDV value to

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619968#comment-16619968 ] Paul Rogers commented on IMPALA-7310: - Simple reproduction: {noformat} create table t1 (x int, y

[jira] [Assigned] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers reassigned IMPALA-7310: --- Assignee: Paul Rogers > Compute Stats not computing NULLs as a distinct value causing

[jira] [Work started] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7310 started by Paul Rogers. --- > Compute Stats not computing NULLs as a distinct value causing wrong estimates

[jira] [Comment Edited] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619827#comment-16619827 ] Paul Rogers edited comment on IMPALA-7560 at 9/19/18 1:34 AM: -- FWIW, it

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

2018-09-18 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619827#comment-16619827 ] Paul Rogers commented on IMPALA-7560: - Turns out that Apache Drill did a similar analysis to work
