[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48591458 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -317,14 +318,39 @@ private[sql] class

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167823882 **[Test build #48415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843318 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843317 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167843065 **[Test build #48415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48415/consoleFull)** for PR 10498 at commit

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -129,6 +129,19 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -129,6 +129,19 @@ final class DataFrameWriter private[sql](df: DataFrame)

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -317,14 +318,39 @@ private[sql] class

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588458 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -129,6 +129,19 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -129,6 +129,19 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -317,14 +318,39 @@ private[sql] class

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588334 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -189,13 +205,44 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588391 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -129,6 +129,19 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -189,13 +205,44 @@ final class DataFrameWriter private[sql](df:

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588795 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread nongli
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10498#discussion_r48588812 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-29 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167783752 This one also includes #10435, we can merge that first.

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167664157 BTW in github you can use square brackets to create a checklist, e.g.
```
- [] item a
- [] item b
```
becomes - [] item a - []

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167581087 cc @yhuai @nongli

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167582335 **[Test build #48367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48367/consoleFull)** for PR 10498 at commit

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/10498
[SPARK-12539][SQL][WIP] support writing bucketed table
Done:
* add bucket info in write path
* support writing bucketed `HadoopFsRelation`
TODO:
* figure out a way
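For readers unfamiliar with the term, the general idea behind a bucketed write can be sketched roughly as follows: each row is assigned to one of a fixed number of buckets by hashing its bucket column, and rows that share a bucket are written together. This is only an illustration in plain Python, not the code in this pull request. The names `bucket_id` and `group_into_buckets`, the `user` column, the bucket count of 4, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def bucket_id(key, num_buckets):
    """Map a key to a bucket index in [0, num_buckets).

    CRC32 is a stand-in hash chosen because it is deterministic across runs;
    it is an assumption, not the hash used by the pull request.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def group_into_buckets(rows, bucket_column, num_buckets):
    """Group rows by the bucket index of their bucket column (hypothetical helper)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[bucket_column], num_buckets)].append(row)
    return dict(buckets)


rows = [
    {"user": "a", "n": 1},
    {"user": "b", "n": 2},
    {"user": "a", "n": 3},
]

# Rows with the same "user" value always end up in the same bucket.
buckets = group_into_buckets(rows, "user", 4)
for index, members in sorted(buckets.items()):
    print(index, members)
```

Because the bucket index depends only on the key and the bucket count, a reader that knows both can locate every row for a given key without scanning the other buckets.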

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167614005 This one also includes https://github.com/apache/spark/pull/10435, right?

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167596712 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167596617 **[Test build #48367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48367/consoleFull)** for PR 10498 at commit

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167596714 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...

2015-12-28 Thread nongli
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167649759 @cloud-fan > currently we don't shuffle before writing partitioned data, which means we will have same partition data in different RDD blocks, and that's why we
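
The quoted observation, that without a shuffle the same partition value can appear in several RDD blocks, can be illustrated with a small plain-Python simulation. This is a sketch under stated assumptions, not Spark code: the blocks are ordinary lists, `blocks_per_partition_value` and `shuffle_by` are hypothetical helpers, the `day` column is invented, and CRC32 stands in for whatever hash a real shuffle would use.

```python
import zlib
from collections import defaultdict


def blocks_per_partition_value(blocks, partition_column):
    """For each partition value, list the indices of the blocks that contain it."""
    seen = defaultdict(set)
    for index, block in enumerate(blocks):
        for row in block:
            seen[row[partition_column]].add(index)
    return {value: sorted(indices) for value, indices in seen.items()}


def shuffle_by(blocks, partition_column, num_blocks):
    """Redistribute rows so that equal partition values land in the same block.

    CRC32 is an assumed stand-in for the hash a real shuffle would use.
    """
    shuffled = [[] for _ in range(num_blocks)]
    for block in blocks:
        for row in block:
            key = str(row[partition_column]).encode("utf-8")
            shuffled[zlib.crc32(key) % num_blocks].append(row)
    return shuffled


# Two input blocks; the value "mon" appears in both of them.
blocks = [
    [{"day": "mon"}, {"day": "tue"}],
    [{"day": "mon"}],
]

before = blocks_per_partition_value(blocks, "day")
print(before)  # {'mon': [0, 1], 'tue': [0]}

# After regrouping by the partition column, each value lives in exactly one block.
after = blocks_per_partition_value(shuffle_by(blocks, "day", 2), "day")
print(all(len(indices) == 1 for indices in after.values()))  # True
```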