[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 Thank you @gatorsmile. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 Cleanup can be done in a separate PR. I am still on a personal trip. This weekend, I will update the PR https://github.com/apache/spark/pull/20171 based on the current discussion.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 Sounds good. Then, how about we add this UDF support into `registerFunction` here, and do the cleanup / deprecation / moving things separately? If you guys wouldn't mind, I would like to

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20217 > Like other people mentioned before, it's really confusing to have so many ways to register a UDF in PySpark, while Java/Scala API is cleaner. I agree with it. When I looked at this change,

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20217 Both are fine with me, and it seems that reusing `registerFunction` is more Pythonic (correct me if I am wrong). My suggestion is to put the UDF registration interface in `UDFRegistration`,

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-12 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20217 @HyukjinKwon you convinced me. Let's try not to add new APIs that are not in Java/Scala. I also agree with @cloud-fan.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 @cloud-fan, do you prefer to have a new API just to be clear, BTW?

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20217 To be consistent with the Java/Scala API, I think we should only add this new API to `UDFRegistration`. We should also move `registerFunction` there and deprecate it in `Catalog`.
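A minimal sketch of the "move and deprecate" pattern suggested in the comment above, in plain Python. The `UDFRegistration` and `Catalog` classes here are hypothetical toy stand-ins written for illustration, not PySpark's real classes, and the warning text is an assumption.

```python
import warnings


class UDFRegistration:
    """Hypothetical stand-in for the registration interface."""

    def __init__(self):
        self.functions = {}

    def register(self, name, f):
        # The single place where functions are actually recorded.
        self.functions[name] = f
        return f


class Catalog:
    """Hypothetical stand-in that keeps the old entry point as a thin wrapper."""

    def __init__(self, udf_registration):
        self._udf = udf_registration

    def registerFunction(self, name, f):
        # Old entry point: warn, then delegate so existing callers keep working.
        warnings.warn(
            "registerFunction is deprecated; use UDFRegistration.register instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._udf.register(name, f)
```

Delegating rather than duplicating keeps one code path for registration, so the old name can be removed later without changing behavior.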

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 Yes, @gatorsmile's comment (https://github.com/apache/spark/pull/20217#issuecomment-357131129) was exactly what was on my mind. I have a few arguments to

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 Deprecating `registerFunction(name, f, returnType)` is not acceptable in Spark 2.x releases. They still call `registerFunction(name, f)` when `f` is a UDF based on my above comment.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #86011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86011/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86011/ Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Merged build finished. Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 Another way is to change the default value of `returnType` of `registerFunction` to None. To avoid the behavior change, we can set `returnType` for Python functions to `StringType` internally.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20217 My two cents: I kind of like the PySpark API `registerUDF`. I think the API is simple and clear, compared to the alternative (merging this API to `registerFunction` and throwing an

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 Also, I am fine with having a discussion later within the 2.3.0 timeline and doing that in a follow-up if needed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 I just wanted to make sure and check other possibilities (which is usually not good to do this late ... ). I am not against this implementation and am fine with it as is too.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20217 Yup, I am looking at it. `def register(name: String, udf: UserDefinedFunction)` and `def register[RT: TypeTag](name: String, func: Function0[RT])`, etc. The usual way to resemble

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #86011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86011/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 `registerUDF` is more consistent with the Scala APIs defined in `UDFRegistration`. We do not need to provide `returnType`.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 Any more comments? @icexelloss @HyukjinKwon @ueshin @viirya

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85971/ Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Merged build finished. Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85971/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85971/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85964/ Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Merged build finished. Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85964/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85964/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20217 cc @cloud-fan @ueshin @icexelloss @HyukjinKwon

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85912/ Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20217 Merged build finished. Test PASSed.

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85912/testReport)** for PR 20217 at commit

[GitHub] spark issue #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20217 **[Test build #85912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85912/testReport)** for PR 20217 at commit