[GitHub] spark pull request: [SPARK-4483][SQL]Optimization about reduce mem...

2014-12-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-67251843 Thanks! Merged to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

2014-12-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3375

2014-12-11 Thread tianyi
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-66733251 @marmbrus would you mind reviewing this PR again?

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-66567851 [Test build #24350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24350/consoleFull) for PR 3375 at commit [`72a8aec`](https://gith

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-66567852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-66564123 [Test build #24350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24350/consoleFull) for PR 3375 at commit [`72a8aec`](https://githu

2014-12-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21575519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @t

2014-12-09 Thread tianyi
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21538764 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transient

2014-12-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21529898 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @t

2014-12-09 Thread tianyi
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21519410 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transient

2014-12-09 Thread tianyi
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21519366 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transient

2014-12-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21477718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transie

2014-12-07 Thread tianyi
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21435396 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transient

2014-12-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-65744120 One minor comment, otherwise LGTM. Thanks!

2014-12-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21354115 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -68,62 +68,59 @@ case class HashOuterJoin( @transie

2014-12-04 Thread tianyi
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-65734070 @marmbrus, any suggestions?

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64319669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64319659 [Test build #23819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23819/consoleFull) for PR 3375 at commit [`99c5c97`](https://gith

2014-11-24 Thread tianyi
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64316338 I made some optimizations for performance. Here is the result: the test data is generated by the following script, generator.sh ``` for((i=1;i<=$

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64315173 [Test build #23819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23819/consoleFull) for PR 3375 at commit [`99c5c97`](https://githu

2014-11-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64147857 [Test build #23768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23768/consoleFull) for PR 3375 at commit [`d2f94d7`](https://gith

2014-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64147859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-64145013 [Test build #23768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23768/consoleFull) for PR 3375 at commit [`d2f94d7`](https://githu

2014-11-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63949544 [Test build #23718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23718/consoleFull) for PR 3375 at commit [`1f2c6f1`](https://gith

2014-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63949550 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63945383 [Test build #23718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23718/consoleFull) for PR 3375 at commit [`1f2c6f1`](https://githu

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63931410 [Test build #23709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23709/consoleFull) for PR 3375 at commit [`a676de6`](https://gith

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63931414 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63931128 [Test build #23709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23709/consoleFull) for PR 3375 at commit [`a676de6`](https://githu

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63904511 [Test build #23692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23692/consoleFull) for PR 3375 at commit [`9e7d5b5`](https://gith

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63904518 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63900854 [Test build #23692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23692/consoleFull) for PR 3375 at commit [`9e7d5b5`](https://githu

2014-11-20 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63900462 ok to test

2014-11-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63755562 Generally it looks great to me. My only concern is the memory vs. computation trade-off: in particular, it relies heavily on the Scala iterator; maybe we can optimize that i

2014-11-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r20625403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -72,53 +72,54 @@ case class HashOuterJoin( //

2014-11-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r20625318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala --- @@ -72,53 +72,54 @@ case class HashOuterJoin( //

2014-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3375#issuecomment-63748932 Can one of the admins verify this patch?

2014-11-19 Thread tianyi
GitHub user tianyi opened a pull request: https://github.com/apache/spark/pull/3375 [SPARK-4483][SQL]Optimization about reduce memory costs during the HashOuterJoin In `HashOuterJoin.scala`, Spark reads data from both sides of the join operation before zipping them together. It is a waste fo
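
The archived description is cut off, so the PR's actual change is not visible here. As a rough illustration of the memory concern it raises, the sketch below shows one common way a hash-based left outer join can avoid holding both inputs in memory: build a hash table from one side only, then stream the other side through it row by row. It is written in Python for brevity. The function name `left_outer_hash_join` and its parameters are hypothetical and are not taken from `HashOuterJoin.scala`; treat it as a sketch of the general technique under those assumptions, not as Spark's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple  # a row is modelled as a plain tuple (assumption for this sketch)


def left_outer_hash_join(
    streamed: Iterable[Row],
    built: Iterable[Row],
    streamed_key: Callable[[Row], Optional[Hashable]],
    built_key: Callable[[Row], Optional[Hashable]],
) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Hypothetical helper: left outer join that materializes only one side."""
    # Build phase: only the `built` side is held in memory, grouped by join key.
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in built:
        key = built_key(row)
        if key is not None:  # SQL semantics: a null key never matches
            table[key].append(row)

    # Probe phase: the `streamed` side is consumed one row at a time and
    # joined rows are yielded lazily instead of being collected first.
    for row in streamed:
        key = streamed_key(row)
        matches = table.get(key) if key is not None else None
        if matches:
            for other in matches:
                yield (row, other)
        else:
            # No match: keep the streamed row and pad the other side with None.
            yield (row, None)
```

With this shape, peak memory grows with the size of the built side rather than with both inputs, at the cost of one hash lookup per streamed row. A full outer join would additionally need to track which built rows were matched so the unmatched ones can be emitted at the end.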