[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have some bad news. The methods `testSchemaPruning` and `testMixedCasePruning` do not set the configuration settings as expected. Fixing that reveals 6 failing tests for the mixed case tests. One

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 FYI, the PR I previously mentioned about fixing the use of `withSQLConf` is #22394. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216686762 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -155,6 +163,60 @@ class

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 I'm working on fixing these test failures now. Hopefully I'll have something pushed soon.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, @mallman I'm working on having ParquetFilter to support IsNotNull(employer.id) to be pushed into parquet reader. That would be pretty c

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216683076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216714387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 This LGTM. I'm not going to submit a PR for my approach to this problem. Thanks @viirya!

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-15 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r181575614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -151,6 +151,9 @@ abstract class Optimizer

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223415835 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223424625 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-29 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22880 [SPARK-25407][SQL] Ensure we pass a compatible pruned schema to ParquetRowConverter ## What changes were proposed in this pull request? (Link to Jira issue: https://issues.apache.org/jira

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-10-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > https://issues.apache.org/jira/browse/SPARK-25879 > > If we select a nested field and a top level field, the schema pruning will fail. Here is the reproducible test, > ...

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229654276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22905 [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema (link to Jira: https://issues.apache.org/jira/browse/SPARK-25894) ## What changes were

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r229729687 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229743035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229739407 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229738879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 @gatorsmile @viirya @cloud-fan @dbtsai your thoughts? cc @dongjoon-hyun for ORC file format perspective

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > is there anything blocked by this? I agree this is a good feature, but it asks the data source to provide a new ability, which may become a problem when migrating file sources to data source

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451108 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -182,18 +182,20 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229450720 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -93,13 +141,14 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229449812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -49,34 +49,82 @@ import

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > @mallman Could you run the EXPLAIN with this new changes and post it in the PR description? D

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230433746 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230432336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230442852 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230125133 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230128199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I'm sorry I haven't been more responsive here. I can relate to your frustration, and I do want to help you make progress on this PR and merge it in. I have indeed been busy

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231243760 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231249401 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Jenkins retest please.

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Can someone with Jenkins retest privileges please kick off a retest?

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230914377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 cc @HyukjinKwon Would you like to review this PR? It's a bug fix.

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r230916138 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 @gatorsmile How do you feel about merging this in? Anyone else I should ping for review?

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r233175025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233179076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,8 +204,12 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233180347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -130,8 +130,8 @@ private[parquet

[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r222348674 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r222345462 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223422030 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223425011 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223441744 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-09-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Hi @Gauravshah. That branch has diverged substantially from what’s in master. Right now I’m preparing a PR to address a problem with the current implementation in master, but I’m on holiday

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 I think I've made my case for this patch as best I can. It does not appear this PR has unanimous support, but I continue to believe we should merge it to master. So where do we take it from here

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Hi @dbtsai @HyukjinKwon @gatorsmile @viirya. Can we merge this to master?

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 Retest this please.

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217055207 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217052036 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 > Hey @mallman, let's just target to fix the problem in the JIRA without other refactorings. The refactorings I've made address the problem directly. Hopefully that will be clearer with

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-11 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22394 [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to `withSQLConf` inside calls to `test` (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406) ## What
