[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12067 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is ena

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-210276781 @davies thanks for your review! merging to master!

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59824988 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spa

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59788972 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-210134175 **[Test build #2788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2788/consoleFull)** for PR 12067 at commit [`4ee5ac1`](https://

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59770733 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spa

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-210095306 LGTM, will merge this once it passes the tests.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-210095212 **[Test build #2788 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2788/consoleFull)** for PR 12067 at commit [`4ee5ac1`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59769933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59674017 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spa

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59673617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spa

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59673488 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spa

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59662070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59662097 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59662054 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59649177 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59648857 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59648732 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59647837 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext =

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59647218 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59617422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,133 +19,153 @@ package org.apache.spark.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r59615600 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -117,30 +160,45 @@ object DatasetBenchmark { val sparkContext =

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208625402 @davies Can you review?

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208203800 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208203803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208203581 **[Test build #55509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55509/consoleFull)** for PR 12067 at commit [`4ee5ac1`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208178674 **[Test build #55509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55509/consoleFull)** for PR 12067 at commit [`4ee5ac1`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208175823 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208175822 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208175816 **[Test build #55508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55508/consoleFull)** for PR 12067 at commit [`050e942`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208175668 **[Test build #55508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55508/consoleFull)** for PR 12067 at commit [`050e942`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208171753 **[Test build #55506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55506/consoleFull)** for PR 12067 at commit [`9e9be45`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208171754 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208171756 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208171517 **[Test build #55506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55506/consoleFull)** for PR 12067 at commit [`9e9be45`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208131549 **[Test build #55499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55499/consoleFull)** for PR 12067 at commit [`045a9be`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208131587 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208131584 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-208129959 **[Test build #55499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55499/consoleFull)** for PR 12067 at commit [`045a9be`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207215899 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207215901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207215671 **[Test build #55298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55298/consoleFull)** for PR 12067 at commit [`7a136c5`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207199587 Well it's not cheating if the user doesn't need to explicitly reuse.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207195627 And I think "reuse a single object" should help, as then we only need to create one object for one partition. But it's like cheating, because RDD doesn't reuse the ob

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207195178 In the benchmark, for RDD, we first apply a function to turn a long into a `Data`, then aggregate. For Dataset, we first turn a long into a `UTF8String`, then turn th

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207193129 The part I don't get is that even in the RDD case, we'd need to create an object per row. This is equivalent to the "deserialization" in aggregator, since they both just c

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207192990 If we can reuse a single object and mutate the object in place, would it be the same speed?

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207191252 @rxin, because the aggregator needs to deserialize the internal row to an object first, then call the aggregator methods.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207184158 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207184159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207183673 **[Test build #55289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55289/consoleFull)** for PR 12067 at commit [`5f6510e`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207179930 Do you know why the aggregator sum is slower than rdd sum? I'd imagine they are comparable.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207168486 **[Test build #55298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55298/consoleFull)** for PR 12067 at commit [`7a136c5`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207167788 The last commit increases the benchmark data size and reorders the benchmark to run RDD first (as the baseline), then DataFrame, and finally Dataset.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207167800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207167798 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207167246 **[Test build #55279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55279/consoleFull)** for PR 12067 at commit [`ae1bdd1`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58969343 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -135,12 +175,26 @@ object DatasetBenchmark { benchmark.run()

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207150963 **[Test build #55289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55289/consoleFull)** for PR 12067 at commit [`5f6510e`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207147329 @davies Can you review this?

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58965247 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -82,5 +123,16 @@ object DatasetBenchmark { RDD

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207143006 cc @davies

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207142452 **[Test build #55279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55279/consoleFull)** for PR 12067 at commit [`ae1bdd1`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-207142308 retest this please.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206894556 The benchmark result on the master branch is extremely slow: ``` aggregate: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206805778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206805774 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206805573 **[Test build #55209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55209/consoleFull)** for PR 12067 at commit [`905234e`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206778169 **[Test build #55209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55209/consoleFull)** for PR 12067 at commit [`905234e`](https://gi

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206763439 generated code snippet in mutable projection codegen for complex buffer type UDAF ``` object ComplexResultAgg extends Aggregator[(String, Int), (Long, Long),

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-07 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-206755637 generated code snippet in whole stage codegen for `val ds = Seq(("a", 10)).toDS().groupByKey(_._1).agg(typed.sum(_._2))`: ``` /* 095 */ // evaluate ag

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-205415272 @cloud-fan @marmbrus I think we could do a trick similar to the one in MapElements in TungstenAggregate: evaluate the functions first, then replace them with the generated variable

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-205406330 It would be awesome to run Spark SQL perf and see what the speed up is here after the elimination is fixed. You might even be able to do it directly from the Spark re

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58413497 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1469,7 +1453,45 @@ class Analyzer( Project(pro

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58413353 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -307,3 +307,17 @@ case class UnresolvedAlias(child: Ex

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12067#discussion_r58327554 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala --- @@ -43,52 +43,52 @@ import org.apache.spark.sql.execution.aggregate.Typ

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-04-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-204665061 cc @davies again - can you take a look at wenchen's question?

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-203955133 **[Test build #54621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54621/consoleFull)** for PR 12067 at commit [`25ee508`](https://g

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-203955479 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14275][SQL] Reimplement TypedAggregateE...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12067#issuecomment-203955481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/