[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-11 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20163 +1

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20163 +1

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 One more SGTM

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20163 SGTM

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-10 Thread rednaxelafx
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20163 Given the above discussion, do we have consensus on all of the following: - Update the documentation for PySpark UDFs to warn about the behavior of mismatched declared `returnType` vs actual

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 We should probably consider catching and setting nulls in `pandas_udf`, if possible, to match the behaviour of `udf` ...

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-09 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20163 I investigated the behavior differences between `udf` and `pandas_udf` for wrong return types and found that there are actually many differences. Basically `udf`s return `null` as @HyukjinKwon ment

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20163 The current behavior looks weird. We should either throw an exception and ask users to give a correct return type, or fix it via proposal 2.

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 @cloud-fan, actually I have a similar question too: https://github.com/apache/spark/pull/20163#discussion_r160017637. I tend to agree, and I think we should disallow this and document it.

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 @ueshin @icexelloss @cloud-fan @rednaxelafx, which one would you prefer? I like 1 the most. If the perf diff is trivial, 2 is also fine. If 3 works fine, I think I am also fine w

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 Hey @rednaxelafx, that's fine. We all make mistakes, and I think trying is always better than not trying. I also made a mistake the first time. It was easier to debug this with your comments

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-05 Thread rednaxelafx
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20163 Thanks for all of your comments, @HyukjinKwon and @icexelloss! I'd like to wait for more discussion and suggestions on whether or not we want a behavior change that makes this reproducer work

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-05 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20163 I ran some experiments: ``` py_date = udf(datetime.date, DateType()) py_timestamp = udf(datetime.datetime, TimestampType()) ``` This works correctly ``` spark.range(1).s

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 The problem here seems to be that `returnType` does not match the returned value. In the case of `DateType`, the value needs an explicit conversion into integers: https://github.com/apache/spark/blob/1c9f95cb771ac
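
The explicit date-to-integer conversion mentioned above can be illustrated with a minimal pure-Python sketch. It assumes the internal representation of a date is the number of days since the Unix epoch; the helper names `date_to_days` and `days_to_date` are hypothetical and are not part of the PySpark API.

```python
import datetime

# Assumption: a date is stored internally as days since 1970-01-01.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def date_to_days(d):
    """Hypothetical helper: convert a datetime.date to days since the epoch."""
    if d is None:
        return None  # nulls pass through unchanged
    return d.toordinal() - EPOCH_ORDINAL


def days_to_date(days):
    """Hypothetical helper: convert days since the epoch back to a datetime.date."""
    if days is None:
        return None
    return datetime.date.fromordinal(days + EPOCH_ORDINAL)


if __name__ == "__main__":
    d = datetime.date(2018, 1, 5)
    days = date_to_days(d)
    print(days, days_to_date(days))
```

If a UDF declared with a date return type hands back a value that skips this kind of conversion, the declared type and the actual value no longer agree, which is the mismatch the comment describes.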

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85709/

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20163 Merged build finished. Test PASSed.

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20163 **[Test build #85709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85709/testReport)** for PR 20163 at commit [`ca026d3`](https://github.com/apache/spark/commit/c

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 Wait ... isn't this because we failed to call `toInternal` according to the return type? Please give me a few days ... I will double check tonight.

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20163 I think Scalar and Group map UDFs expect a pandas Series of datetime64[ns] (the native pandas timestamp type) instead of a pandas Series of datetime.date and datetime.datetime objects. I don't think it's
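
The distinction drawn above between a Series of Python date objects and a Series with the native pandas timestamp dtype can be seen in a small sketch. It assumes only that pandas is installed; the variable names are illustrative, and nothing here reproduces what Spark itself does with the Series.

```python
import datetime

import pandas as pd

# A Series built from datetime.date objects is stored with the generic object dtype.
raw = pd.Series([datetime.date(2018, 1, 4), datetime.date(2018, 1, 5)])

# pd.to_datetime converts it to the native pandas datetime64 dtype.
converted = pd.to_datetime(raw)

if __name__ == "__main__":
    print(raw.dtype)
    print(converted.dtype)
```

Returning the converted Series rather than the object-typed one is one way a pandas UDF author could line the result up with the expectation described in the comment.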

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20163 LGTM. cc @ueshin @icexelloss: is this behavior consistent with pandas UDF?

[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...

2018-01-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20163 **[Test build #85709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85709/testReport)** for PR 20163 at commit [`ca026d3`](https://github.com/apache/spark/commit/ca