Chun Chang created DRILL-3062:
---------------------------------

             Summary: regression: Mondrian query447.q - lots of rows missing in 
result set
                 Key: DRILL-3062
                 URL: https://issues.apache.org/jira/browse/DRILL-3062
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.0.0
            Reporter: Chun Chang
            Assignee: Daniel Barclay (Drill)
            Priority: Blocker


{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select * from sys.version;
+------------+----------------+-------------+-------------+------------+
| commit_id  | commit_message | commit_time | build_email | build_time |
+------------+----------------+-------------+-------------+------------+
| d1526f9462f6817a76631464ff332bb99b3bdf28 | DRILL-2750: Running 1 or more 
queries against Drillbits having insufficient DirectMem renders the Drillbits 
in an unusable state | 13.05.2015 @ 08:47:20 EDT | Unknown     | 13.05.2015 @ 
10:44:43 EDT |
+------------+----------------+-------------+-------------+------------+
{code}

Many (total of 42) mondrian queries regressed. All of them missing rows in the 
returned result set.

Here is an example, query447.q

{code}
SELECT time_by_day.the_year            AS c0, 
       product_class.product_family    AS c1, 
       customer.state_province         AS c2, 
       customer.city                   AS c3, 
       Sum(sales_fact_1997.unit_sales) AS m0 
FROM   time_by_day AS time_by_day, 
       sales_fact_1997 AS sales_fact_1997, 
       product_class AS product_class, 
       product AS product, 
       customer AS customer 
WHERE  sales_fact_1997.time_id = time_by_day.time_id 
       AND time_by_day.the_year = 1997 
       AND sales_fact_1997.product_id = product.product_id 
       AND product.product_class_id = product_class.product_class_id 
       AND product_class.product_family = 'Drink' 
       AND sales_fact_1997.customer_id = customer.customer_id 
       AND customer.state_province = 'WA' 
       AND customer.city IN ( 'Anacortes', 'Ballard', 'Bellingham', 
'Bremerton', 
                              'Burien', 'Edmonds', 'Everett', 'Issaquah', 
                              'Kirkland', 'Lynnwood', 'Marysville', 'Olympia', 
                              'Port Orchard', 'Puyallup', 'Redmond', 'Renton', 
                              'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 
                              'Walla Walla', 'Yakima' ) 
GROUP  BY time_by_day.the_year, 
          product_class.product_family, 
          customer.state_province, 
          customer.city; 
{code}

This query should return the following result:

{code}
[root@qa-node120 mondrian]# cat query447.e
1997    Drink   WA      Walla Walla     191.0000
1997    Drink   WA      Issaquah        203.0000
1997    Drink   WA      Everett 208.0000
1997    Drink   WA      Olympia 1066.0000
1997    Drink   WA      Edmonds 166.0000
1997    Drink   WA      Bremerton       1160.0000
1997    Drink   WA      Renton  225.0000
1997    Drink   WA      Bellingham      68.0000
1997    Drink   WA      Ballard 214.0000
1997    Drink   WA      Burien  251.0000
1997    Drink   WA      Seattle 168.0000
1997    Drink   WA      Redmond 137.0000
1997    Drink   WA      Lynnwood        201.0000
1997    Drink   WA      Puyallup        1040.0000
1997    Drink   WA      Tacoma  986.0000
1997    Drink   WA      Kirkland        247.0000
1997    Drink   WA      Sedro Woolley   58.0000
1997    Drink   WA      Yakima  1159.0000
1997    Drink   WA      Port Orchard    1128.0000
1997    Drink   WA      Spokane 2238.0000
1997    Drink   WA      Anacortes       82.0000
1997    Drink   WA      Marysville      193.0000
{code}

But drill now returns:

{code}
1997    Drink   WA      Sedro Woolley   58.0000
{code}

Here is the plan:

{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select 
time_by_day.the_year as c0, product_class.product_family as c1, 
customer.state_province as c2, customer.city as c3, 
sum(sales_fact_1997.unit_sales) as m0 from time_by_day as time_by_day, 
sales_fact_1997 as sales_fact_1997, product_class as product_class, product as 
product, customer as customer where sales_fact_1997.time_id = 
time_by_day.time_id and time_by_day.the_year = 1997 and 
sales_fact_1997.product_id = product.product_id and product.product_class_id = 
product_class.product_class_id and product_class.product_family = 'Drink' and 
sales_fact_1997.customer_id = customer.customer_id and customer.state_province 
= 'WA' and customer.city in ('Anacortes', 'Ballard', 'Bellingham', 'Bremerton', 
'Burien', 'Edmonds', 'Everett', 'Issaquah', 'Kirkland', 'Lynnwood', 
'Marysville', 'Olympia', 'Port Orchard', 'Puyallup', 'Redmond', 'Renton', 
'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 'Walla Walla', 'Yakima') group 
by time_by_day.the_year, product_class.product_family, customer.state_province, 
customer.city;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(c0=[$0], c1=[$1], c2=[$2], c3=[$3], m0=[$4])
00-02        HashAgg(group=[{0, 1, 2, 3}], m0=[SUM($4)])
00-03          Project(c0=[$0], c1=[$2], c2=[$3], c3=[$4], unit_sales=[$1])
00-04            HashJoin(condition=[=($5, $6)], joinType=[inner])
00-06              Project(the_year=[$0], unit_sales=[$5], product_family=[$6], 
state_province=[$10], city=[$11], f17=[$11])
00-08                Project(the_year=[$4], time_id=[$5], time_id0=[$0], 
product_id=[$1], customer_id=[$2], unit_sales=[$3], product_family=[$8], 
product_class_id=[$9], product_id0=[$6], product_class_id0=[$7], 
state_province=[$10], city=[$11], customer_id0=[$12])
00-09                  HashJoin(condition=[=($2, $12)], joinType=[inner])
00-11                    HashJoin(condition=[=($1, $6)], joinType=[inner])
00-14                      HashJoin(condition=[=($0, $5)], joinType=[inner])
00-18                        Project(time_id=[$2], product_id=[$0], 
customer_id=[$1], unit_sales=[$3])
00-23                          Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/mondrian/sales_fact_1997]], 
selectionRoot=/drill/testdata/mondrian/sales_fact_1997, numFiles=1, 
columns=[`time_id`, `product_id`, `customer_id`, `unit_sales`]]])
00-17                        Project(the_year=[$0], time_id0=[$1])
00-22                          SelectionVectorRemover
00-26                            Filter(condition=[=($0, 1997)])
00-28                              Project(the_year=[$1], time_id=[$0])
00-30                                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/mondrian/time_by_day]], 
selectionRoot=/drill/testdata/mondrian/time_by_day, numFiles=1, 
columns=[`the_year`, `time_id`]]])
00-13                      Project(product_id0=[$0], product_class_id=[$1], 
product_family=[$2], product_class_id0=[$3])
00-16                        HashJoin(condition=[=($1, $3)], joinType=[inner])
00-21                          Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/product]], 
selectionRoot=/drill/testdata/mondrian/product, numFiles=1, 
columns=[`product_id`, `product_class_id`]]])
00-20                          Project(product_family=[$0], 
product_class_id0=[$1])
00-25                            SelectionVectorRemover
00-27                              Filter(condition=[=($0, 'Drink')])
00-29                                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/mondrian/product_class]], 
selectionRoot=/drill/testdata/mondrian/product_class, numFiles=1, 
columns=[`product_family`, `product_class_id`]]])
00-10                    Project(state_province=[$0], city=[$1], 
customer_id0=[$2])
00-12                      SelectionVectorRemover
00-15                        Filter(condition=[=($0, 'WA')])
00-19                          Project(state_province=[$1], city=[$2], 
customer_id=[$0])
00-24                            Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/customer]], 
selectionRoot=/drill/testdata/mondrian/customer, numFiles=1, 
columns=[`state_province`, `city`, `customer_id`]]])
00-05              HashAgg(group=[{0}])
00-07                Values
 | {
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to