GitHub user sujith71955 opened a pull request:

    https://github.com/apache/spark/pull/23197

    [SPARK-26165][Optimizer] Date and Timestamp column expressions are getting 
converted to string type

    Date and Timestamp column expressions are converted to string type in less-than/greater-than filter queries even when a valid date/timestamp string literal is used on the right side of the filter expression.
    e.g.: select to_date('2009-07-30 04:17:52') >= '2009-07-30 04:17:52'
    Internally, the expression is cast as shown below:
    struct<(CAST(to_date('2009-07-30 04:17:52') AS STRING) >= 2009-07-30 04:17:52):boolean>
    This can also reduce query performance, because every single row is cast to String type.
    
    After the fix, the above expression is optimized as shown below, because the right-side literal value is a valid date string literal:
    struct<(to_date('2009-07-30 04:17:52') >= CAST(2009-07-30 04:17:52 AS DATE)):boolean>
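
To make the performance argument concrete, here is a minimal plain-Python model of the two plan shapes. It is not Spark code: the function names, the cast counter, and the use of a date-only literal are assumptions made purely for illustration.

```python
from datetime import date


def filter_cast_column(rows, literal):
    """Pre-fix shape: each date value is rendered as a string, then compared."""
    casts = 0
    kept = []
    for d in rows:
        casts += 1  # one column-side cast per row
        if d.isoformat() >= literal:
            kept.append(d)
    return kept, casts


def filter_cast_literal(rows, literal):
    """Post-fix shape: the literal is converted to a date once, up front."""
    bound = date.fromisoformat(literal[:10])  # assumption: keep only the date part
    kept = [d for d in rows if d >= bound]
    return kept, 1  # a single literal-side cast, independent of row count


rows = [date(2009, 7, 29), date(2009, 7, 30), date(2009, 7, 31)]
print(filter_cast_column(rows, "2009-07-30"))
print(filter_cast_literal(rows, "2009-07-30"))
```

In this model the number of casts grows with the number of rows when the column side is cast, and stays at one when the literal side is cast.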
    
    ## What changes were proposed in this pull request?
    Date and Timestamp columns are converted to string in less-than/greater-than filter queries even when the literal is a date string that contains a time, like '2018-03-18 12:39:40', which is a valid date format string.
    
    I think we should avoid casting the column to String type if the date/timestamp string literal can be converted to a valid date or timestamp, and cast the filter's column expression to string type only if the string literal in the filter expression cannot be converted to date/timestamp.
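
The proposed rule can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the patch itself: `parse_literal`, `plan_comparison`, and the list of accepted formats are hypothetical, and the returned strings merely imitate the shape of the expressions shown earlier.

```python
from datetime import datetime

# Assumption: the literal formats treated as valid for this illustration.
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_literal(text):
    """Return a datetime if the literal matches an accepted format, else None."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def plan_comparison(column_name, literal):
    """Describe which side of `column >= literal` receives the cast."""
    if parse_literal(literal) is not None:
        # The literal is a valid date/timestamp string: cast the literal once.
        return f"{column_name} >= CAST('{literal}' AS TIMESTAMP)"
    # Otherwise fall back to comparing as strings.
    return f"CAST({column_name} AS STRING) >= '{literal}'"


print(plan_comparison("ts", "2018-03-18 12:39:40"))
print(plan_comparison("ts", "not a date"))
```

The fallback branch keeps the existing string comparison for literals that cannot be parsed, so only parseable literals change the plan.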
    
    
    ## How was this patch tested?
    Using existing unit test cases and manual testing.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujith71955/spark master_filter_perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23197.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23197
    
----
commit 9f1b2493d92c4340d02d4958da38d9ff8c1ab612
Author: s71955 <sujithchacko.2010@...>
Date:   2018-11-25T14:14:28Z

    [SPARK-26165][Optimizer] Date and Timestamp column is getting converted to 
string in less than/greater than filter query even though valid date/timestamp 
string literal is used in the right side filter expression
    
    ## What changes were proposed in this pull request?
    Date and Timestamp columns are converted to string in less-than/greater-than filter queries even when the literal is a date string that contains a time, like '2018-03-18 12:39:40'.
    Besides, it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp.
    I think we should avoid casting if the date/timestamp string can be converted to a valid date or timestamp; we can convert the filter's column expression to string type only if the string literal in the filter expression cannot be converted to date/timestamp.
    
    ## How was this patch tested?
    Using existing unit test cases and manual testing.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
