[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-14 Thread nongli
Github user nongli closed the pull request at: https://github.com/apache/spark/pull/12017 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-12 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-20662 @nongli I think you can close this in favor of https://github.com/apache/spark/pull/12279

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-11 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12279#discussion_r59166426 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java --- @@ -200,8 +150,26 @@ void readBatch(int

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-10 Thread tedyu
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/12279#discussion_r59152083 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java --- @@ -200,8 +150,26 @@ void readBatch(int

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12279

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12279#issuecomment-207890147 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12279#issuecomment-207890145 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12279#issuecomment-207890068 **[Test build #55452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55452/consoleFull)** for PR 12279 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12279#issuecomment-207867238 **[Test build #55452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55452/consoleFull)** for PR 12279 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-09 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/12279 [SPARK-14217][SQL] Fix bug if parquet data has columns that use dictionary encoding for some of the data ## What changes were proposed in this pull request? This PR is based on #12017

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-08 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-207540219 @davies Can you help us update this patch? Thanks!

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206469530 @nongli Do you need me to take over this patch?

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12017#discussion_r58721198 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java --- @@ -200,9 +150,28 @@ void readBatch(int

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206216112 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206216118 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206215721 **[Test build #55097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55097/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206163567 **[Test build #55097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55097/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206160605 Is this PR good to go? Or we still need to update it?

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-04-06 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-206160674 test this please

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-203072990 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-203072987 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-203072223 **[Test build #54451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54451/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12017#discussion_r5083 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java --- @@ -246,11 +215,48 @@ private void

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-203036983 **[Test build #54451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54451/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread nongli
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202991498 @davies Yea. The format allows at most 1 dictionary per column per row group. Each page can have a different encoding though.
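
The point in the comment above can be illustrated with a minimal sketch, assuming a simplified in-memory model of one column chunk. The names `MixedEncodingSketch`, `Page`, `Encoding`, and `decodeColumnChunk` are hypothetical; they are not taken from Spark's `VectorizedColumnReader` or from the Parquet library. The sketch only shows a reader choosing the decoder per page while all pages share a single dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy model, not the Spark or Parquet implementation.
final class MixedEncodingSketch {
    /** Hypothetical page-level encoding tag; each page carries exactly one. */
    enum Encoding { PLAIN, DICTIONARY }

    /** Hypothetical page: PLAIN holds values, DICTIONARY holds dictionary indices. */
    static final class Page {
        final Encoding encoding;
        final int[] data;

        Page(Encoding encoding, int[] data) {
            this.encoding = encoding;
            this.data = data;
        }
    }

    /**
     * Decodes all pages of one column chunk. The single dictionary is shared
     * by every dictionary-encoded page; plain pages bypass it entirely.
     */
    static List<Integer> decodeColumnChunk(int[] dictionary, List<Page> pages) {
        List<Integer> out = new ArrayList<>();
        for (Page page : pages) {
            for (int v : page.data) {
                if (page.encoding == Encoding.DICTIONARY) {
                    out.add(dictionary[v]); // v is an index into the dictionary
                } else {
                    out.add(v);             // v is already the value
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dictionary = {100, 200, 300};
        List<Page> pages = new ArrayList<>();
        pages.add(new Page(Encoding.DICTIONARY, new int[] {0, 2, 1}));
        pages.add(new Page(Encoding.PLAIN, new int[] {7, 8}));
        System.out.println(decodeColumnChunk(dictionary, pages)); // [100, 300, 200, 7, 8]
    }
}
```

A reader that decides once per column chunk whether to use the dictionary, rather than once per page, would misread the second page in this example.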

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202752870 @nongli Is this because that within the same group, some pages are dictionary encoded, some not?

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12017#discussion_r57681187 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java --- @@ -246,11 +215,48 @@ private void

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202651943 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202651944 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202651742 **[Test build #54378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54378/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202651513 cc @sameeragarwal

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12017#issuecomment-202624157 **[Test build #54378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54378/consoleFull)** for PR 12017 at commit

[GitHub] spark pull request: [SPARK-14217][SQL] Fix bug if parquet data has...

2016-03-28 Thread nongli
GitHub user nongli opened a pull request: https://github.com/apache/spark/pull/12017 [SPARK-14217][SQL] Fix bug if parquet data has columns that use dictionary encoding for some of the data ## What changes were proposed in this pull request? Currently, this causes batches