Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19860
Thanks for your work! A late LGTM
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19860
LGTM, merging to master!
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19860
@kiszk @viirya I made the following performance test:
```
val a = (1 to 10).map(x => 1).toDS
val filtered = a.where($"value".isin((1 to 10): _*))
(1 to 20).map(x=>time(fi
```
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19860
I am also interested in how much this PR can improve performance.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19860
OK. I see the intention here now. I'm not sure it has a considerable
impact, especially since smaller functions will be inlined, IIUC.
If the impact is not negligible, it should be worth doing.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19860
@viirya sorry, I don't understand your question.
In Coalesce, we need to find the first non-null element. As soon as we find
one, we don't need to evaluate anything else. Previously, the code ge
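A minimal sketch of the point under discussion, not Spark's generated code: if each split function only guards its body with an `isNull` check, the later functions do no real work but are still invoked, whereas an early exit skips the invocations themselves. All names here (`State`, `splitFunction`, `guardedOnly`, `earlyExit`) are hypothetical and exist only for illustration.

```scala
object CoalesceCallSketch {
  // Mutable result holder, loosely mirroring an isNull/value pair.
  final class State {
    var isNull: Boolean = true
    var value: Int = 0
    var calls: Int = 0 // number of split-function invocations
  }

  // One hypothetical "split function": it is charged a call whenever it is
  // invoked, and only does real work while no non-null value was found yet.
  def splitFunction(candidate: Option[Int]): State => Unit = { s =>
    s.calls += 1
    if (s.isNull) candidate.foreach { v =>
      s.isNull = false
      s.value = v
    }
  }

  // Guard inside each function only: every function is still invoked.
  def guardedOnly(children: Seq[Option[Int]]): State = {
    val s = new State
    children.map(splitFunction).foreach(f => f(s))
    s
  }

  // Early exit: stop invoking functions once a non-null value is found.
  def earlyExit(children: Seq[Option[Int]]): State = {
    val s = new State
    val it = children.map(splitFunction).iterator
    while (s.isNull && it.hasNext) it.next()(s)
    s
  }

  def main(args: Array[String]): Unit = {
    val children = Seq(None, Some(7), Some(9), None)
    val a = guardedOnly(children)
    val b = earlyExit(children)
    println(s"guarded only: value=${a.value}, calls=${a.calls}")
    println(s"early exit:   value=${b.value}, calls=${b.calls}")
  }
}
```

Both variants return the same value; they differ only in how many functions are invoked, which is the overhead being debated in this thread.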
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19860
I'm not sure I currently follow this. For example, in Coalesce, don't the
conditions on ev.isNull already guarantee that the later functions won't be
called? Why do we need to apply this do loop?
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19860
@kiszk of course it depends on each specific case, but on average, after this
PR we make only 50% of the function calls. Thus, on average, the overhead
caused by the many function calls is reduced by 50%.
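One way to read the 50% figure, as a rough sketch rather than a measurement: assume evaluation stops at a position that is uniformly distributed over the `n` split functions. That uniformity is an assumption made here for illustration, and the names below are hypothetical.

```scala
object AverageCallsSketch {
  // Average number of functions invoked when the stopping position is
  // uniformly distributed over 1..n (an assumption, not a measurement).
  def averageCalls(n: Int): Double = (1 to n).sum.toDouble / n

  // Fraction of calls remaining compared with always invoking all n functions.
  def remainingFraction(n: Int): Double = averageCalls(n) / n

  def main(args: Array[String]): Unit = {
    Seq(10, 100, 1000).foreach { n =>
      println(s"n=$n: average calls=${averageCalls(n)}, fraction=${remainingFraction(n)}")
    }
  }
}
```

Under that assumption the average is (n + 1) / 2 calls, which approaches half of `n` as `n` grows; a workload where the first child is usually non-null, or usually all children are null, would behave differently.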
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19860
> this is a not negligible overhead which can be avoided.
How much can this PR reduce this overhead?
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19860
cc @gatorsmile @kiszk @cloud-fan @viirya
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19860
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19860
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84376/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19860
**[Test build #84376 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84376/testReport)**
for PR 19860 at commit
[`ce74fb8`](https://github.com/apache/spark/commit/c
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19860
**[Test build #84376 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84376/testReport)**
for PR 19860 at commit
[`ce74fb8`](https://github.com/apache/spark/commit/ce