Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/12067
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is ena
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-210276781
@davies thanks for your review! merging to master!
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59824988
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spa
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59788972
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-210134175
**[Test build #2788 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2788/consoleFull)**
for PR 12067 at commit
[`4ee5ac1`](https://
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59770733
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spa
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-210095306
LGTM, will merge this once it passes the tests.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-210095212
**[Test build #2788 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2788/consoleFull)**
for PR 12067 at commit
[`4ee5ac1`](https://g
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59769933
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59674017
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spa
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59673617
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spa
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59673488
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spa
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59662070
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59662097
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59662054
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59649177
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59648857
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59648732
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59647837
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext =
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59647218
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59617422
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -19,133 +19,153 @@ package org.apache.spark.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r59615600
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -117,30 +160,45 @@ object DatasetBenchmark {
val sparkContext =
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208625402
@davies Can you review?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208203800
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208203803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208203581
**[Test build #55509 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55509/consoleFull)**
for PR 12067 at commit
[`4ee5ac1`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208178674
**[Test build #55509 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55509/consoleFull)**
for PR 12067 at commit
[`4ee5ac1`](https://gi
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208175823
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208175822
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208175816
**[Test build #55508 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55508/consoleFull)**
for PR 12067 at commit
[`050e942`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208175668
**[Test build #55508 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55508/consoleFull)**
for PR 12067 at commit
[`050e942`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208171753
**[Test build #55506 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55506/consoleFull)**
for PR 12067 at commit
[`9e9be45`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208171754
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208171756
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208171517
**[Test build #55506 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55506/consoleFull)**
for PR 12067 at commit
[`9e9be45`](https://gi
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208131549
**[Test build #55499 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55499/consoleFull)**
for PR 12067 at commit
[`045a9be`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208131587
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208131584
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-208129959
**[Test build #55499 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55499/consoleFull)**
for PR 12067 at commit
[`045a9be`](https://gi
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207215899
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207215901
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207215671
**[Test build #55298 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55298/consoleFull)**
for PR 12067 at commit
[`7a136c5`](https://g
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207199587
Well it's not cheating if the user doesn't need to explicitly reuse.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207195627
And I think "reuse a single object" should help, as then we only need to
create one object for one partition. But it's like cheating, because RDD
doesn't reuse the ob
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207195178
In the benchmark, for RDD, we first apply a function to turn a long into a
`Data`, then do aggregate. For Dataset, we first turn a long to a `UTFString`,
then turn th
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207193129
The part I don't get is that even in the RDD case, we'd need to create an
object per row. This is equivalent to the "deserialization" in aggregator,
since they both just c
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207192990
if we can reuse a single object and mutate the object in place, would it be
the same speed?
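
For readers following the object-reuse question, here is a minimal sketch in plain Scala (no Spark dependency) of the two strategies being compared. `MutableData`, `ReuseSketch`, and both method names are hypothetical and are not taken from the PR or from `DatasetBenchmark`; the sketch only shows the allocation pattern, not Spark's actual code path.

```scala
// Hypothetical mutable holder used only for this illustration; not part of Spark.
final class MutableData(var l: Long, var s: String)

object ReuseSketch {
  // Allocates a fresh MutableData for every input value.
  def sumWithAllocation(n: Long): Long = {
    var i = 0L
    var sum = 0L
    while (i < n) {
      val d = new MutableData(i, i.toString)
      sum += d.l
      i += 1
    }
    sum
  }

  // Allocates one MutableData up front and overwrites its fields per input value.
  def sumWithReuse(n: Long): Long = {
    val d = new MutableData(0L, "")
    var i = 0L
    var sum = 0L
    while (i < n) {
      d.l = i
      d.s = i.toString
      sum += d.l
      i += 1
    }
    sum
  }
}
```

Both methods return the same sum; they differ only in how many holder objects are created. Note that `i.toString` still allocates a string per value in either variant, so this sketch isolates only the holder allocation.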
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207191252
@rxin, because the aggregator needs to deserialize the internal row into an
object first, and then call the aggregator methods.
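
The deserialize-then-aggregate step described here can be sketched in plain Scala, without Spark. Everything below is an assumption made for illustration: `SimpleAggregator` is a hypothetical trait loosely modelled on the zero/reduce/finish shape of an aggregator, `Data` is a stand-in case class, and an `Array[Any]` plays the role of an internal row. It is not the code in `TypedAggregateExpression.scala`.

```scala
// Hypothetical stand-ins for illustration; not Spark's actual classes.
final case class Data(l: Long, s: String)

trait SimpleAggregator[IN, BUF, OUT] {
  def zero: BUF
  def reduce(b: BUF, a: IN): BUF
  def finish(b: BUF): OUT
}

// Sums the `l` field of every Data object it is given.
object SumAgg extends SimpleAggregator[Data, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Data): Long = b + a.l
  def finish(b: Long): Long = b
}

object DeserializeThenReduce {
  // An "internal row" is modelled as an untyped array of column values.
  def deserialize(row: Array[Any]): Data =
    Data(row(0).asInstanceOf[Long], row(1).asInstanceOf[String])

  def aggregate(rows: Iterator[Array[Any]]): Long = {
    var buf = SumAgg.zero
    while (rows.hasNext) {
      // One Data object is created per row before reduce is called.
      val obj = deserialize(rows.next())
      buf = SumAgg.reduce(buf, obj)
    }
    SumAgg.finish(buf)
  }
}
```

The point of the sketch is that `reduce` only sees typed objects, so each row pays for a conversion before any aggregation work happens.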
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207184158
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207184159
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207183673
**[Test build #55289 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55289/consoleFull)**
for PR 12067 at commit
[`5f6510e`](https://g
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207179930
Do you know why the aggregator sum is slower than rdd sum? I'd imagine they
are comparable.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207168486
**[Test build #55298 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55298/consoleFull)**
for PR 12067 at commit
[`7a136c5`](https://gi
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207167788
The last commit increases the benchmark data size and reorders the
benchmark to run RDD first (as the baseline), then DataFrame, and finally
Dataset.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207167800
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207167798
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207167246
**[Test build #55279 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55279/consoleFull)**
for PR 12067 at commit
[`ae1bdd1`](https://g
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r58969343
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -135,12 +175,26 @@ object DatasetBenchmark {
benchmark.run()
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207150963
**[Test build #55289 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55289/consoleFull)**
for PR 12067 at commit
[`5f6510e`](https://gi
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207147329
@davies Can you review this?
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r58965247
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -82,5 +123,16 @@ object DatasetBenchmark {
RDD
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207143006
cc @davies
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207142452
**[Test build #55279 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55279/consoleFull)**
for PR 12067 at commit
[`ae1bdd1`](https://gi
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-207142308
retest this please.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206894556
The benchmark result on the master branch is extremely slow:
```
aggregate:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
-
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206805778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206805774
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206805573
**[Test build #55209 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55209/consoleFull)**
for PR 12067 at commit
[`905234e`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206778169
**[Test build #55209 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55209/consoleFull)**
for PR 12067 at commit
[`905234e`](https://gi
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206763439
Generated code snippet from mutable projection codegen for a UDAF with a
complex buffer type:
```
object ComplexResultAgg extends Aggregator[(String, Int), (Long, Long),
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-206755637
Generated code snippet from whole-stage codegen for
`val ds = Seq(("a", 10)).toDS().groupByKey(_._1).agg(typed.sum(_._2))`:
```
/* 095 */ // evaluate ag
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-205415272
@cloud-fan @marmbrus I think we could do the similar trick in MapElements in
TungstenAggregate: evaluate the functions first, then replace them with the
generated variable
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-205406330
It would be awesome to run Spark SQL perf and see what the speed up is here
after the elimination is fixed. You might even be able to do it directly from
the Spark re
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r58413497
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1469,7 +1453,45 @@ class Analyzer(
Project(pro
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r58413353
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
---
@@ -307,3 +307,17 @@ case class UnresolvedAlias(child: Ex
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12067#discussion_r58327554
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala ---
@@ -43,52 +43,52 @@ import
org.apache.spark.sql.execution.aggregate.Typ
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-204665061
cc @davies again - can you take a look at wenchen's question?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-203955133
**[Test build #54621 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54621/consoleFull)**
for PR 12067 at commit
[`25ee508`](https://g
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-203955479
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12067#issuecomment-203955481
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/