chenliang created SPARK-33721:
---------------------------------

             Summary: Support to use Hive build-in functions by configuration
                 Key: SPARK-33721
                 URL: https://issues.apache.org/jira/browse/SPARK-33721
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.3, 3.2.0
            Reporter: chenliang

The Hive and Spark SQL engines differ in the behavior of many built-in functions. Several of the differences are shown below:

||Built-in function||SQL||Result of Hive SQL||Result of Spark SQL||
|unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
|to_date|{{select to_date('0000-00-00');}}|0002-11-30|NULL|
|datediff|{{select datediff(CURRENT_DATE, '0000-00-00');}}|737986|NULL|
|collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}} where {{bigdata_offline.test_collect_set}} contains the rows {{(1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')}}|{{c1 c2 c3_set}} {{1 1 2##3##4##5##1}}|{{c1 c2 c3_set}} {{1 1 3##1##2##5##4}}|

There is no consensus on which engine's results are more accurate. Users would prefer to choose the behavior that suits their own production environment, so I think Spark should let them select Hive's built-in functions through configuration.
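
The unix_timestamp row is consistent with one engine parsing the timestamp leniently (hour 24 rolls over to midnight of the next day) and the other parsing it strictly (hour 24 is rejected, giving NULL). The sketch below illustrates that difference in plain Python; it is not code from either engine. The function names are hypothetical, and the UTC+8 session time zone is an assumption: the issue does not state it, but it is the offset under which the lenient result equals 1591027200.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: session time zone is UTC+8 (not stated in the issue).
ASSUMED_TZ = timezone(timedelta(hours=8))


def strict_unix_timestamp(text):
    """Hypothetical strict parser: out-of-range fields yield None (NULL-like)."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return int(parsed.replace(tzinfo=ASSUMED_TZ).timestamp())


def lenient_unix_timestamp(text):
    """Hypothetical lenient parser: time-of-day fields may overflow and roll over."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", text)
    if match is None:
        return None
    year, month, day, hour, minute, second = map(int, match.groups())
    try:
        midnight = datetime(year, month, day, tzinfo=ASSUMED_TZ)
    except ValueError:
        # This sketch only rolls over time-of-day fields, not the date fields.
        return None
    rolled = midnight + timedelta(hours=hour, minutes=minute, seconds=second)
    return int(rolled.timestamp())


value = "2020-06-01" + " 24:00:00"
print(strict_unix_timestamp(value))   # None
print(lenient_unix_timestamp(value))  # 1591027200
```

For in-range input such as '2020-06-01 00:00:00' both functions return the same value, so the two policies only diverge on inputs like those in the table.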