Correctness Issue for UDT Support in PySpark

2021-04-24 Thread Darcy Shen
There is a correctness issue in the following code snippet (https://issues.apache.org/jira/browse/SPARK-35211): ``` spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false") from pyspark.testing.sqlutils import ExamplePoint import pandas as pd pdf = pd.DataFrame({'point':

Re: Scala 2.11 support removed for Spark 3.0.0

2019-03-25 Thread Darcy Shen
Cool, Scala 2.12 compiles faster than Scala 2.11. But it runs slower than Scala 2.11 by default. We may enable some compiler optimization options. On Mon, 25 Mar 2019 23:53:18 +0800 Sean Owen wrote: I merged https://github.com/apache/spark/pull/23098

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-19 Thread Darcy Shen
-1. Please backport SPARK-27160, a correctness issue in the ORC native reader. See https://github.com/apache/spark/pull/24092 On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai wrote: Please vote on releasing the following candidate as Apache Spark version 2.4.1. The vote is

Re: Compatibility on build-in DateTime functions with Hive/Presto

2019-02-17 Thread Darcy Shen
s silent. Is this actually defined behavior in a SQL standard, or, what does MySQL do? On Fri, Feb 15, 2019 at 2:07 AM Darcy Shen <mailto:sad...@zoho.com.invalid> wrote: > > See https://issues.apache.org/jira/browse/SPARK-26885 and > https://github.com/apache/spark/blob/71

Compatibility on build-in DateTime functions with Hive/Presto

2019-02-15 Thread Darcy Shen
See https://issues.apache.org/jira/browse/SPARK-26885 and  https://github.com/apache/spark/blob/71170e74df5c7ec657f61154212d1dc2ba7d0613/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala stringToTimestamp, stringToDate support , as a result: select

Re: Time to cut an Apache 2.4.1 release?

2019-02-14 Thread Darcy Shen
upgrading Hive. On Thu, 14 Feb 2019 21:36:08 +0800 Wenchen Fan wrote Do you know which bug ORC 1.5.2 introduced? Or is it because Hive uses a legacy version of ORC which has a bug? On Thu, Feb 14, 2019 at 2:35 PM Darcy Shen <mailto:sad...@zoho.com.invalid> wrote:

Re: Time to cut an Apache 2.4.1 release?

2019-02-13 Thread Darcy Shen
We found that an ORC table created by Spark 2.4 cannot be read by Hive 2.1.1. spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;' hive -e 'select * from tmp.orcTable2' The error messages from Hive: Failed with exception

Re: removing most of the config functions in SQLConf?

2018-12-14 Thread Darcy Shen
I agree with the CatalystConf idea. On Fri, 14 Dec 2018 18:40:26 +0800 Wenchen Fan wrote: IIRC, the reason we did it is: `SQLConf` was in SQL core module. So we need to create methods in `CatalystConf`, and `SQLConf` implements `CatalystConf`. Now the

Why not setup a Gitter chatroom for Spark contributors

2018-12-09 Thread Darcy Shen
Gitter is cool and convenient.

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-22 Thread Darcy Shen
+1 On Tue, 23 Oct 2018 01:42:06 +0800 Wenchen Fan wrote: Please vote on releasing the following candidate as Apache Spark version 2.4.0. The vote is open until October 26 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ]

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Darcy Shen
with scala 2.12.6, right? On Fri, Sep 28, 2018 at 4:22 PM Darcy Shen wrote: -1 see: https://github.com/apache/spark/pull/22577 We should make sure that Spark works with Scala 2.12.7. https://github.com/scala/bug/issues/11123 This resolved bug of Scala 2.12.6 is severe and related to correctness. We should

Upgrade SBT to the latest

2018-08-31 Thread Darcy Shen
SBT 1.x has been ready for a long time. We may spare some time to upgrade sbt for Spark. An umbrella JIRA, like the one for Scala 2.12, should be created.