Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20163
+1
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20163
+1
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
One more SGTM
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20163
SGTM
---
Github user rednaxelafx commented on the issue:
https://github.com/apache/spark/pull/20163
Given the above discussion, do we have consensus on all of the following:
- Update the documentation for PySpark UDFs to warn about the behavior of
mismatched declared `returnType` vs actual
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
We could probably consider catching exceptions and setting nulls in `pandas_udf`, if possible, to
match the behaviour of `udf` ...
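The "catch and set nulls" idea could be sketched in plain Python as a wrapper that mimics `udf`'s null-on-failure behaviour (a hypothetical illustration only — `null_on_error` is not part of PySpark's API, and the real code path lives in Spark's serialization layer):

```python
def null_on_error(f):
    """Wrap a UDF body so that any failure yields None (SQL null)."""
    def wrapped(*args):
        try:
            return f(*args)
        except Exception:
            # Mimic `udf`'s behaviour: swallow the error and emit null
            return None
    return wrapped

# A conversion that fails for some inputs
to_int = null_on_error(int)
print(to_int("3"))    # 3
print(to_int("abc"))  # None
```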
---
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20163
I investigated the behavior differences between `udf` and `pandas_udf` for
wrong return types, and found that there are actually many differences.
Basically, `udf`s return `null`, as @HyukjinKwon mentioned
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20163
The current behavior looks weird; we should either throw an exception and ask
users to give a correct return type, or fix it via proposal 2.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
@cloud-fan, actually I had a similar question too -
https://github.com/apache/spark/pull/20163#discussion_r160017637. I tend to
agree with it, and I think we should disallow this and document it.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
@ueshin @icexelloss @cloud-fan @rednaxelafx, which one would you prefer?
I like 1 the most. If the perf diff is trivial, 2 is also fine. If
3 works fine, I think I am also fine with it
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
Hey @rednaxelafx, that's fine. We all make mistakes, and I think trying is
always better than not trying. I also made a mistake the first time. It was
easier to debug this with your comments
Github user rednaxelafx commented on the issue:
https://github.com/apache/spark/pull/20163
Thanks for all of your comments, @HyukjinKwon and @icexelloss !
I'd like to wait for more discussions / suggestions on whether or not we
want a behavior change that makes this reproducer work
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20163
I ran some experiments:
```
py_date = udf(datetime.date, DateType())
py_timestamp = udf(datetime.datetime, TimestampType())
```
This works correctly
```
spark.range(1).s
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
The problem here seems to be that `returnType` is mismatched with the value. In the case of
`DateType`, it needs an explicit conversion into integers:
https://github.com/apache/spark/blob/1c9f95cb771ac
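For context, the integer conversion `DateType` needs could be sketched like this (illustrative only — `date_to_internal` and `EPOCH` are made-up names; PySpark's actual `DateType.toInternal` uses day ordinals, with equivalent results):

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def date_to_internal(d):
    # Spark stores a DateType value internally as an integer:
    # the number of days since the Unix epoch.
    return (d - EPOCH).days

print(date_to_internal(date(1970, 1, 2)))  # 1
```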
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20163
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85709/
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20163
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20163
**[Test build #85709 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85709/testReport)**
for PR 20163 at commit
[`ca026d3`](https://github.com/apache/spark/commit/c
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20163
Wait .. isn't this because we failed to call `toInternal` for the return
type? Please give me a few days .. I will double-check tonight.
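For `TimestampType`, the `toInternal` conversion targets microseconds since the Unix epoch; a simplified, UTC-only sketch of that conversion (the function name is illustrative, and PySpark's real implementation also handles naive, local-time datetimes):

```python
from datetime import datetime, timezone

def timestamp_to_internal(dt):
    # Spark stores a TimestampType value internally as microseconds
    # since the Unix epoch. Exact integer arithmetic avoids float
    # rounding on the microsecond component.
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

print(timestamp_to_internal(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000000
```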
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20163
I think Scalar and Group map UDFs expect a pandas Series of `datetime64[ns]`
(the native pandas timestamp type) instead of a pandas Series of `datetime.date` and
`datetime.datetime` objects. I don't think it's
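The dtype difference can be seen directly in pandas (assuming pandas is available; this only illustrates the two representations, not Spark's Arrow conversion itself):

```python
import datetime
import pandas as pd

# datetime.date objects land in a generic object-dtype Series
s = pd.Series([datetime.date(2018, 1, 1), datetime.date(2018, 1, 2)])
print(s.dtype)  # object

# The native pandas timestamp representation is datetime64[ns]
converted = pd.to_datetime(s)
print(converted.dtype)  # datetime64[ns]
```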
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20163
LGTM, cc @ueshin @icexelloss is this behavior consistent with pandas UDFs?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20163
**[Test build #85709 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85709/testReport)**
for PR 20163 at commit
[`ca026d3`](https://github.com/apache/spark/commit/ca