ey-chih chow created SPARK-22639:
------------------------------------

             Summary: no rowcount estimation returned if groupby clause involves substring
                 Key: SPARK-22639
                 URL: https://issues.apache.org/jira/browse/SPARK-22639
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.2.0
            Reporter: ey-chih chow


The CBO cannot estimate the row count if the GROUP BY clause of a query contains
a substring expression. For example, it cannot estimate the row count of the
following query, which is extracted from the TPC-DS queries and runs against the
TPC-DS schema:

SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
       sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
FROM store_sales
INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1, 30),
         date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`
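
To make the failure mode concrete, here is a minimal Python model, assuming the
CBO aggregate estimator behaves roughly like this: it multiplies the distinct
counts of the GROUP BY columns, caps the product at the child's row count, and
returns no estimate when any grouping expression is not a plain column with
column statistics. The classes `Column` and `Substring`, the function
`estimate_group_by_rows`, and the distinct counts below are hypothetical and for
illustration only; this is a sketch of the behavior, not Spark's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Column:
    """A plain column reference such as item.i_brand (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Substring:
    """A derived expression such as substring(item.i_item_desc, 1, 30)."""
    child: Column
    pos: int
    length: int


Expr = Union[Column, Substring]


def estimate_group_by_rows(
    group_by: List[Expr],
    child_row_count: Optional[int],
    distinct_counts: Dict[str, int],
) -> Optional[int]:
    """Simplified (assumed) model of CBO aggregate row-count estimation.

    Returns None (no estimate) unless the child row count is known and every
    GROUP BY expression is a plain column that has a distinct count.
    """
    if child_row_count is None:
        return None
    if not all(isinstance(e, Column) and e.name in distinct_counts for e in group_by):
        # A derived expression such as substring(...) has no column statistics,
        # so in this model the whole estimate is abandoned.
        return None
    if not group_by:
        # A global aggregate always produces exactly one row.
        return 1
    rows = 1
    for e in group_by:
        rows *= distinct_counts[e.name]
    # The number of groups can never exceed the number of input rows.
    return min(rows, child_row_count)


# Hypothetical statistics, not taken from a real TPC-DS run.
stats = {"i_brand": 700, "d_year": 6, "i_item_desc": 10000}
print(estimate_group_by_rows([Column("i_brand"), Column("d_year")], 1_000_000, stats))
# 4200
print(estimate_group_by_rows(
    [Column("i_brand"), Substring(Column("i_item_desc"), 1, 30)], 1_000_000, stats))
# None
```

In this model a single substring(...) in the grouping list is enough to lose the
estimate for the whole aggregate, which matches the symptom reported above.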
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)