Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
Yeah! Let me close this for now. We can discuss this further in the Jira
ticket then.
---
-
To unsubscribe, e-mail: reviews-unsub
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20806
great! maybe we can hold this PR for a real SQL tree aggregate in the
future, with some proper design and discussion.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@cloud-fan @WeichenXu123 OK. I've set up a Spark cluster with 5 nodes for
the benchmark.
The data used:
```
val r = new Random
val ds = (0 to 1).map { _ =>
val a = Array
```
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@WeichenXu123 As discussed with @cloud-fan at
https://github.com/apache/spark/pull/20806#discussion_r174277864, I'd like to
see some performance gain from it, but I need to run a benchmark to see if th
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20806
@viirya OK, but there is already a class in ML that uses
`TypedImperativeAggregator`; see `Summarizer`.
Also, did you benchmark this PR against `df.rdd.treeAggregate`?
Seems they're
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@WeichenXu123 The `seqOp`/`combOp` can be arbitrary and work on domain
objects, so I'm not sure built-in agg functions can cover all of its uses.
For example, it seems hard to express `IDF.Docume
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20806
But I haven't benchmarked it. Maybe it is not worth doing codegen for
treeAggregate.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20806
@viirya Yes. `treeAggregate` should only apply to global aggregation.
But in this PR the API has to use `seqOp`/`combOp`.
What I expect is that the DataFrame version of treeAggregate can expl
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@WeichenXu123 I feel `groupBy` is the more SQL-like aggregation, where we can
specify a key to group by. At least `rdd.treeAggregate` does not support
key-specified aggregation.
For typed g
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20806
The API does not seem DataFrame-style. What I expect is something like:
```
dataset.groupBy().setAggregateLevel(2).agg(Map("age" -> "max", "salary" -> "avg"))
```
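To make the `seqOp`/`combOp` contract and the notion of an aggregation level concrete, here is a minimal, Spark-free sketch in plain Scala. It is neither the API proposed in this PR nor Spark's implementation: `treeAggregateLocal`, `Person`, `Stats` and `demo` are hypothetical names, partitions are simulated as nested `Seq`s, and the grouping rule is an assumption chosen for illustration. It computes the same max-age and average-salary result as the snippet above.

```scala
// Illustrative only: a local simulation of tree aggregation, not Spark code.
object TreeAggregateSketch {
  // Hypothetical domain objects for the example.
  final case class Person(age: Int, salary: Double)
  final case class Stats(maxAge: Int, salarySum: Double, count: Long) {
    def avgSalary: Double = if (count == 0L) Double.NaN else salarySum / count
  }

  // Assumption: `zero` is a neutral element for both seqOp and combOp.
  def treeAggregateLocal[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int = 2): U = {
    require(depth >= 1, "depth must be at least 1")
    // Step 1: fold each simulated partition independently with seqOp.
    var partials: Seq[U] = partitions.map(_.foldLeft(zero)(seqOp))
    // Step 2: merge partial results in groups of `scale` until few remain,
    // so that no single step has to combine every partial at once.
    val scale =
      math.max(math.ceil(math.pow(partials.size.toDouble, 1.0 / depth)).toInt, 2)
    while (partials.size > scale) {
      partials = partials.grouped(scale).map(_.reduce(combOp)).toVector
    }
    // Step 3: final combine of the remaining partials.
    partials.foldLeft(zero)(combOp)
  }

  // Runs the example over four simulated partitions (one of them empty).
  def demo(depth: Int): Stats = {
    val partitions: Seq[Seq[Person]] = Seq(
      Seq(Person(30, 100.0), Person(45, 200.0)),
      Seq(Person(28, 300.0)),
      Seq.empty[Person],
      Seq(Person(52, 400.0))
    )
    treeAggregateLocal(partitions, Stats(Int.MinValue, 0.0, 0L))(
      // seqOp: fold one row into the running statistics.
      (s, p) => Stats(math.max(s.maxAge, p.age), s.salarySum + p.salary, s.count + 1),
      // combOp: merge two partial statistics.
      (a, b) => Stats(math.max(a.maxAge, b.maxAge), a.salarySum + b.salarySum, a.count + b.count),
      depth
    )
  }

  def main(args: Array[String]): Unit = {
    val stats = demo(depth = 2)
    println(s"max age = ${stats.maxAge}, avg salary = ${stats.avgSalary}")
  }
}
```

With a real RDD the corresponding call has the shape `df.rdd.treeAggregate(zero)(seqOp, combOp, depth)`. The sketch only shows why `seqOp` and `combOp` can operate on arbitrary domain objects, which is the point of contention with a purely column-expression API.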
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88200/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20806
**[Test build #88200 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88200/testReport)**
for PR 20806 at commit
[`a254d15`](https://github.com/apache/spark/commit/a
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20806
**[Test build #88200 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88200/testReport)**
for PR 20806 at commit
[`a254d15`](https://github.com/apache/spark/commit/a2
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1482/
Tes
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
retest this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88194/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20806
**[Test build #88194 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88194/testReport)**
for PR 20806 at commit
[`a254d15`](https://github.com/apache/spark/commit/a
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20806
**[Test build #88194 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88194/testReport)**
for PR 20806 at commit
[`a254d15`](https://github.com/apache/spark/commit/a2
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1478/
Tes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20806
Merged build finished. Test PASSed.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
cc @dbtsai @cloud-fan