[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > @mallman yea that sounds good to me Okay. The testing diff is significantly simpler now.

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Hi @dbtsai @HyukjinKwon @gatorsmile @viirya. Can we merge this to master? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 I think I've made my case for this patch as best I can. It does not appear this PR has unanimous support, but I continue to believe we should merge it to master. So where do we take it from

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Thanks everyone for your contributions, support and patience. It's been a journey and a half, and I'm excited for the future. I will open a follow-on PR to address the current kno

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2018-09-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r216037341 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -586,17 +587,31 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 Hi @viirya, Thanks for this PR! I have an alternative implementation which I'd like to submit for comparison. My implementation was something I removed from my original patch.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have reconstructed my original patch for this issue, but I've discovered it will require more work to complete. However, as part of that reconstruction I've discovered a couple of cases

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > @mallman It will be great that we can have this fix in 2.4 release as this can dramatically reduce the data being read in many applications which is the purpose of the original work.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, per further checking code and discussion with @dbtsai regarding with predicate pushdown, we know that predicate push down only works for primitive types on Parquet datasource. So b

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216545091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216683076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216686762 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -155,6 +163,60 @@ class

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 @viirya Please amend https://github.com/apache/spark/blob/d684a0f30599d50061ef78ec62edcdd3b726e2d9/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have some bad news. The methods `testSchemaPruning` and `testMixedCasePruning` do not set the configuration settings as expected. Fixing that reveals 6 failing tests for the mixed case tests. One

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-11 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22394 [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to `withSQLConf` inside calls to `test` (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406) ## What
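The problem reported for `testSchemaPruning` and `testMixedCasePruning` most likely comes from scoping. A test-registration call such as ScalaTest's `test(name) { ... }` only records its body, so a `withSQLConf` wrapped around the registration has already restored the old settings by the time the body runs. The sketch below reproduces that pattern with hypothetical stand-ins: `ConfScopeDemo`, `withConf`, `test`, `runAll` and the `pruning.enabled` key are all made up for illustration, and this is not Spark's `withSQLConf` or ScalaTest itself.

```scala
import scala.collection.mutable

// Hypothetical toy model: a mutable "session conf" plus a registry that
// defers test bodies until run time, the way a test framework registers
// `test(name)(body)` and only executes the body later.
object ConfScopeDemo {
  val conf = mutable.Map[String, String]("pruning.enabled" -> "false")
  private val registered = mutable.ArrayBuffer.empty[(String, () => String)]

  // Sets `key` for the duration of `body`, then restores the previous value
  // (loosely modelled on a withSQLConf-style helper).
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(v) => conf(key) = v
        case None    => conf.remove(key)
      }
    }
  }

  // Records the body without running it.
  def test(name: String)(body: => String): Unit =
    registered += (name -> (() => body))

  // Runs every registered body and returns the conf value each one observed.
  def runAll(): Map[String, String] =
    registered.map { case (name, body) => name -> body() }.toMap

  // Problematic pattern: the conf is set only while the body is registered.
  withConf("pruning.enabled", "true") {
    test("outer withConf")(conf("pruning.enabled"))
  }

  // Pattern proposed in the PR title: set the conf inside the test body.
  test("inner withConf") {
    withConf("pruning.enabled", "true")(conf("pruning.enabled"))
  }
}
```

Calling `ConfScopeDemo.runAll()` reports `"false"` for the outer-wrapped test and `"true"` for the inner-wrapped one, which is the kind of silently unapplied setting that, once fixed, can unmask previously hidden test failures.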

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 FYI, the PR I previously mentioned about fixing the use of `withSQLConf` is #22394.

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 I'm working on fixing these test failures now. Hopefully I'll have something pushed soon.

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216714387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 This LGTM. I'm not going to submit a PR for my approach to this problem. Thanks @viirya!

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 FYI @viirya @dbtsai @gatorsmile @HyukjinKwon Can I get someone's review of this PR please? The unmasked failures appear to be false positives, so no changes to the tested cod

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 And FYI this is the Jira issue I promised in https://github.com/apache/spark/pull/22357#issuecomment-419940228 yesterday: https://issues.apache.org/jira/browse/SPARK-25407

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, @mallman I'm working on having ParquetFilter to support IsNotNull(employer.id) to be pushed into parquet reader. That would be pre

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217052036 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217055207 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 > Hey @mallman, let's just target to fix the problem in the JIRA without other refactorings. The refactorings I've made address the problem directly. Hopefully that will be cl

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 Retest this please.

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I'm sorry I haven't been more responsive here. I can relate to your frustration, and I do want to help you make progress on this PR and merge it in. I have indeed been

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-29 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22880 [SPARK-25407][SQL] Ensure we pass a compatible pruned schema to ParquetRowConverter ## What changes were proposed in this pull request? (Link to Jira issue: https://issues.apache.org/jira

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > https://issues.apache.org/jira/browse/SPARK-25879 > > If we select a nested field and a top level field, the schema pruning will fail. Here is the reproducible test, > ...
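The failure quoted above involves selecting a nested field together with a top-level field. As a rough illustration of what nested column pruning has to compute in that case, the sketch below restricts a toy nested schema to a set of requested field paths, keeping a whole subtree when a path stops at a struct. `Node`, `Leaf`, `Group` and `NestedPruning.prune` are hypothetical names; this is not Spark's `ParquetSchemaPruning`, and it does not reproduce the SPARK-25879 bug.

```scala
// Hypothetical toy model of a nested schema: leaves and named groups.
sealed trait Node
case object Leaf extends Node
final case class Group(fields: Vector[(String, Node)]) extends Node

object NestedPruning {
  // Restricts `node` to the requested `paths` (each a list of field names).
  // A path that ends at a field keeps that field's whole subtree; field
  // order follows the original schema. Returns None if nothing is kept.
  def prune(node: Node, paths: Seq[List[String]]): Option[Node] =
    if (paths.exists(_.isEmpty)) Some(node) // some request needs all of `node`
    else node match {
      case Leaf => None // a path trying to descend below a leaf matches nothing
      case Group(fields) =>
        val kept = fields.flatMap { case (name, child) =>
          // Remaining path suffixes for requests that go through `name`.
          val childPaths = paths.collect { case `name` :: rest => rest }
          if (childPaths.isEmpty) None
          else prune(child, childPaths).map(name -> _)
        }
        if (kept.isEmpty) None else Some(Group(kept))
    }
}
```

For example, pruning a schema with top-level `id` and a `name` struct of `first` and `last` using `Seq(List("id"), List("name", "first"))` keeps `id` plus a `name` struct containing only `first`.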

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229449812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -49,34 +49,82 @@ import

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229450720 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -93,13 +141,14 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451108 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -182,18 +182,20 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229654276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22905 [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema (link to Jira: https://issues.apache.org/jira/browse/SPARK-25894) ## What changes were
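One way to read "the column count for a given schema" is as the number of primitive leaf fields a read schema touches, since Parquet stores each primitive leaf as its own column chunk. The following is a minimal sketch under that assumption, using a hypothetical toy schema model (`DataType`, `Primitive`, `Struct`, `ArrayOf`, `MapOf`, `ColumnCount.leafColumnCount`). It is not the `ColumnarFileFormat` interface the PR adds, whose method names and exact counting rules may differ.

```scala
// Hypothetical toy schema model (not Spark's StructType).
sealed trait DataType
final case class Primitive(name: String) extends DataType
final case class Struct(fields: Seq[(String, DataType)]) extends DataType
final case class ArrayOf(element: DataType) extends DataType
final case class MapOf(key: DataType, value: DataType) extends DataType

object ColumnCount {
  // Counts primitive leaves, assuming each leaf is stored as a separate
  // column (as in Parquet). Containers contribute only their leaves.
  def leafColumnCount(dt: DataType): Int = dt match {
    case Primitive(_)      => 1
    case Struct(fields)    => fields.map { case (_, t) => leafColumnCount(t) }.sum
    case ArrayOf(element)  => leafColumnCount(element)
    case MapOf(key, value) => leafColumnCount(key) + leafColumnCount(value)
  }
}
```

Under this model, a schema with an `id` int, a `name` struct of two strings, an array of strings and a string-to-int map counts as 6 columns, while a pruned schema keeping only `name.first` counts as 1; comparing the two shows how much a pruned read schema narrows the scan.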

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 @gatorsmile @viirya @cloud-fan @dbtsai your thoughts? cc @dongjoon-hyun for ORC file format perspective.

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r229729687 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > is there anything blocked by this? I agree this is a good feature, but it asks the data source to provide a new ability, which may become a problem when migrating file sources to data source

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229738879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229739407 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229743035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230125133 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230128199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230432336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230433746 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230442852 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > @mallman Could you run the EXPLAIN with this new changes and post it in the PR description? Done.

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230914377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r230916138 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 cc @HyukjinKwon Would you like to review this PR? It's a bug fix.

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231243760 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231249401 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Jenkins retest please.

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Can someone with Jenkins retest privileges please kick off a retest?

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 @gatorsmile How do you feel about merging this in? Anyone else I should ping for review?

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r233175025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233179076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,8 +204,12 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233180347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -130,8 +130,8 @@ private[parquet
