[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-09-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Please refer to the superset in #17980. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional com

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 BTW, the latest version is maintained in #17980. The Spark Vector format was changed recently. --- If your project is set up for it, you can reply to this email and have your reply appear on G

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Hi, I didn't try that, but that's not a concept of a Spark data source table. Please don't expect that. :)

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-09-03 Thread cenyuhai
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/17924 @dongjoon-hyun I have a question: does this ORC data source reader support a table that contains multiple file formats? For example: table/ day=2017-09-01 RCFile day=2017-0

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80604/ Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #80604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80604/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #80604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80604/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-08-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Retest this please.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76920/ Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Retest this please.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Hi, All. For further discussion and easy comparison, I made another PR (#17943) that excludes `ColumnarBatch`.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Hi, @rxin, @marmbrus, @gatorsmile, @sameeragarwal. Could you also give us your opinions on this approach from the Spark SQL side?

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Yep. Since this approach adds a new dependency on Apache ORC, the non-vectorized PR will also need more support (or approval) from the committers. I'll wait for more opinions at the curr

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17924 @dongjoon-hyun That sounds good to me. It would also reduce the size of this PR and ease the review work.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 @cloud-fan and @viirya. Shall we remove the vectorized part from this PR? - The non-vectorized ORCFileFormat is mandatory, and its performance is also better than the current one.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17924 From the current benchmark, the performance does not seem to improve noticeably compared with the vectorized Hive ORC reader in #13775. Maybe with a more efficient batch approach, as @cloud-fan sugg

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76705/ Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test PASSed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76705/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76705/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76699/ Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76699/testReport)** for PR 17924 at commit [`4607e0e`](https://github.com/apache/spark/commit/4

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76699/testReport)** for PR 17924 at commit [`4607e0e`](https://github.com/apache/spark/commit/46

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Retest this please.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76695/ Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76695/testReport)** for PR 17924 at commit [`4607e0e`](https://github.com/apache/spark/commit/4

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76695/testReport)** for PR 17924 at commit [`4607e0e`](https://github.com/apache/spark/commit/46

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76693/ Test FAILed.

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76693/testReport)** for PR 17924 at commit [`8bfd4bb`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76693/testReport)** for PR 17924 at commit [`8bfd4bb`](https://github.com/apache/spark/commit/8b