[ https://issues.apache.org/jira/browse/SPARK-32611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sumeet updated SPARK-32611:
---------------------------
Description:

*How to reproduce this behavior?*
 * TZ="America/Los_Angeles" ./bin/spark-shell
 * sql("set spark.sql.hive.convertMetastoreOrc=true")
 * sql("set spark.sql.orc.impl=hive")
 * sql("create table t_spark(col timestamp) stored as orc;")
 * sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));")
 * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)

*This returns an empty result, which is incorrect.*
 * sql("set spark.sql.orc.impl=native")
 * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)

*This returns 1 row, which is the expected output.*

The same query with (spark.sql.hive.convertMetastoreOrc=true, spark.sql.orc.impl=hive) returns *correct results if filter pushdown is turned off*:
 * sql("set spark.sql.orc.filterPushdown=false")
 * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)

*This returns 1 row, which is the expected output.*
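For convenience, the steps above can be run as a single spark-shell session. The following is a minimal consolidated sketch, not taken verbatim from the report: it assumes the shell was started with TZ="America/Los_Angeles" on a Hive-enabled Spark 3.0.x build, reuses the t_spark table name and timestamp literal from the steps above, and drops the trailing semicolons inside sql().

{code:scala}
// Sketch of the repro (assumes: TZ="America/Los_Angeles" ./bin/spark-shell on a
// Hive-enabled Spark 3.0.x build; `sql` is imported by spark-shell as spark.sql).
sql("set spark.sql.hive.convertMetastoreOrc=true")
sql("set spark.sql.orc.impl=hive")

sql("create table t_spark(col timestamp) stored as orc")
sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp))")

val q = """select col, date_format(col, 'DD') from t_spark
           where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp)"""

// hive reader with filter pushdown on: returns no rows (incorrect, per the report)
sql(q).show(false)

// native reader: returns the expected single row
sql("set spark.sql.orc.impl=native")
sql(q).show(false)

// hive reader with pushdown disabled: also returns the expected single row
sql("set spark.sql.orc.impl=hive")
sql("set spark.sql.orc.filterPushdown=false")
sql(q).show(false)
{code}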
* sql("set spark.sql.orc.filterPushdown=false") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* > Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect > when timestamp is present in predicate > -------------------------------------------------------------------------------------------------------------------- > > Key: SPARK-32611 > URL: https://issues.apache.org/jira/browse/SPARK-32611 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.0.0, 3.0.1 > Reporter: Sumeet > Priority: Major > > *How to reproduce this behavior?* > * TZ="America/Los_Angeles" ./bin/spark-shell > * sql("set spark.sql.hive.convertMetastoreOrc=true") > * sql("set spark.sql.orc.impl=hive") > * sql("create table t_spark(col timestamp) stored as orc;") > * sql("insert into t_spark values (cast('2100-01-01 > 01:33:33.123America/Los_Angeles' as timestamp));") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return empty results, which is incorrect.* > * sql("set spark.sql.orc.impl=native") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return 1 row, which is the expected output.* > > The above query using (True, hive) returns *correct results if pushdown > filters are turned off*. > * sql("set spark.sql.orc.filterPushdown=false") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return 1 row, which is the expected output.* -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org