[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160077924 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160077928 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160077922 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160077944 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078014 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160077992 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078039 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078163 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078135 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078213 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078262 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078288 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078349 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078425 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078799 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160078819 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160079119 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160079405 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160082708 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160082910 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160086248 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 @henrify and @cloud-fan . I updated the PR with put APIs. You can check the BM result. --- - To unsubscribe, e-mail

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160087436 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160089167 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160089221 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160089474 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160089616 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160090565 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160090596 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160091017 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160093941 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160094622 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160096539 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,503

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Yes. It really does, @henrify . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160117940 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,510

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160117873 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,482

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Could you be more specific, @henrify ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160124013 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,510

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160124118 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/JavaOrcColumnarBatchReader.java --- @@ -0,0 +1,510

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160124592 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160124664 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160133388 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,517 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160174950 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160178751 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160178850 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160190108 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160232913 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160236021 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160236163 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160236240 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 @henrify , @cloud-fan . For @henrify 's question, I got the answer. The answer is negative like the official document. Even ORC reader side, the data for a VectorizedRowBatch comes from

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160251456 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Thanks. I checked the as-is inline behavior. As you told, ORC nextBatch is not inlined so far while Parquet nextBatch does. I'll try to optimize

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 After minimizing `nextBatch`, it becomes smaller than Parquet's `nextBatch`. But, it's inlined only some cases, but mostly not. It's not helpful. For the other technique,

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Oops. Without noticing your comments, I pushed another refactoring which split the functions. --- - To unsubscribe, e

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160311311 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,528

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160311441 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,528

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160311799 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,528

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Thank you for your help, @henrify . I think it's within margin of deviation. Split methods will be better for maintenance. So, I push

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160314342 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,509

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160317512 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,605

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 It's strange. The two tests have been passed in local labtop while jenkins always seems to fail. I'll inve

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160323569 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,605

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160325249 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,435 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19943 Thank you so much, @cloud-fan , @mmccline , @viirya , @henrify , @kiszk , @HyukjinKwon ! I'll proceed to follo

[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160432982 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -0,0 +1,435 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20205: [SPARK-16060][SQL][follow-up] add a wrapper solut...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20205#discussion_r160461130 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -196,17 +234,26 @@ public void

[GitHub] spark issue #20205: [SPARK-16060][SQL][follow-up] add a wrapper solution for...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20205 BTW, if you don't mind, could you update the followings? It's @viirya 's comment, so I made a followup, but we had better have this in your PR. To make another follow

[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-09 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20208 [SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based data sources ## What changes were proposed in this pull request? A schema can evolve in several ways and the

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Hi, @gatorsmile , @cloud-fan , @HyukjinKwon , @viirya . Could you review this PR? --- - To unsubscribe, e-mail

[GitHub] spark issue #18991: [SPARK-21783][SQL][WIP] Turn on ORC filter push-down by ...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 I reopen it to re-test the master branch with this option before Apache Spark 2.3. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #18991: [SPARK-21783][SQL][WIP] Turn on ORC filter push-d...

2018-01-09 Thread dongjoon-hyun
GitHub user dongjoon-hyun reopened a pull request: https://github.com/apache/spark/pull/18991 [SPARK-21783][SQL][WIP] Turn on ORC filter push-down by default ## What changes were proposed in this pull request? ORC filter push-down is disabled by default from the beginning

[GitHub] spark issue #20097: [SPARK-22912] v2 data source support in MicroBatchExecut...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20097 Hi, @tdas . Could you merge this to `branch-2.3` , too? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20097: [SPARK-22912] v2 data source support in MicroBatchExecut...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20097 Thank you, @tdas ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Also, ping @sameeragarwal , too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #20205: [SPARK-16060][SQL][follow-up] add a wrapper solut...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20205#discussion_r160581341 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -196,17 +234,26 @@ public void

[GitHub] spark issue #18991: [SPARK-21783][SQL][WIP] Turn on ORC filter push-down by ...

2018-01-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Ur, originally, it's not accepted by @gatorsmile due to lack of test cases. So, Today, I reopen it for testing purpose. Do you think we can enable it? I think w

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Yes, @cloud-fan . I added the same test coverage for ORC in Apache Spark. Sorry, @gatorsmile . I always turned on PPD, so there is no perf number for PPD=false

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Also, ping @rxin , too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 @gatorsmile . I don't have any numbers for PPD=false. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apach

[GitHub] spark issue #20218: [SPARK-23000] [TEST-HADOOP2.6] Fix Flaky test suite Data...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20218 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20218: [SPARK-23000] [TEST-HADOOP2.6] Fix Flaky test suite Data...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20218 Since this is a flakiness issue, I retriggered it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #20210: [SPARK-23009][PYTHON] Fix for non-str col names to creat...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20210 Hi, @HyukjinKwon . It seems that branch-2.3 doesn't have this. Could you merge to branch-2.3, too? --- ---

[GitHub] spark issue #20226: [SPARK-23034][Hive][UI] Display tablename for `HiveTable...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20226 It looks useful, @tejasapatil . Given that this is one line addition, why don't you handle the others? > For this JIRA, I am scoping those for hive table sc

[GitHub] spark issue #20226: [SPARK-23034][Hive][UI] Display tablename for `HiveTable...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20226 Then, specifically, what happens in this PR for Parquet/ORC table which is converted to data source tables with `convertMetastoreParquet/Orc`? Now, both parameters are `true` by default

[GitHub] spark issue #20210: [SPARK-23009][PYTHON] Fix for non-str col names to creat...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20210 Thank you! @HyukjinKwon . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20226 Thank you, @tejasapatil . I see. Although this is not applicable for ORC/Parquet hive tables, the PR looks useful for me. Could you put the condition and limitation in PR description

[GitHub] spark pull request #20228: [SPARK-23036] Add withGlobalTempView for testing ...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20228#discussion_r160866670 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala --- @@ -140,8 +140,8 @@ class GlobalTempViewSuite extends

[GitHub] spark issue #20228: [SPARK-23036] Add withGlobalTempView for testing and cor...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20228 BTW, please put '[SQL]' into your title. Then, your PR will be listed under SQL category here. - https://spark-prs.appspot.co

[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r160867069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class

[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r160867160 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -814,7 +814,7 @@ abstract class DDLSuite extends

[GitHub] spark pull request #20194: [SPARK-22999][SQL]'show databases like command' c...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20194#discussion_r160867737 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -141,7 +141,7 @@ statement (LIKE? pattern

[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20195 @xubo245. Since it's merged, could you close your PR now? For the PR against old branches like this, we need to close the PR man

[GitHub] spark pull request #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK...

2018-01-10 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20230 [SPARK-23038][TEST] Update docker/spark-test (JDK/OS) ## What changes were proposed in this pull request? This PR aims to update the followings in `docker/spark-test

[GitHub] spark pull request #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20230#discussion_r160880463 --- Diff: external/docker/spark-test/base/Dockerfile --- @@ -15,14 +15,14 @@ # limitations under the License. # -FROM

[GitHub] spark pull request #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20230#discussion_r161016953 --- Diff: external/docker/spark-test/base/Dockerfile --- @@ -15,14 +15,14 @@ # limitations under the License. # -FROM

[GitHub] spark issue #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK/OS)

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20230 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Ur, it's not record-level filtering. Maybe, it's because I explained it too abstractly [here](https://github.com/apache/spark/pull/19943#discussion_r160251456 ). It's s

[GitHub] spark pull request #18991: [SPARK-21783][SQL] Turn on ORC filter push-down b...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/18991 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20227: [SPARK-23035][SQL] Fix warning: TEMPORARY TABLE ....

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20227#discussion_r161022213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala --- @@ -33,6 +33,9 @@ class

<    7   8   9   10   11   12   13   14   15   16   >