Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
I have some bad news. The methods `testSchemaPruning` and
`testMixedCasePruning` do not set the configuration settings as expected.
Fixing that reveals 6 failing tests for the mixed case tests. One
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
FYI, the PR I previously mentioned about fixing the use of `withSQLConf` is
#22394.
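
A minimal sketch of the likely failure mode, assuming the problem is the usual ordering one: a `withSQLConf` block wrapped around `test` has already restored the old settings by the time the registered test body runs. The truncated snippets here do not confirm this. `ConfOrderingSketch`, `withConf`, `test`, `runAll`, `demo` and the key `pruning.enabled` are hypothetical stand-ins, not Spark's or ScalaTest's real APIs.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's withSQLConf and ScalaTest's test();
// none of these names are the real APIs.
object ConfOrderingSketch {
  val conf: mutable.Map[String, String] = mutable.Map("pruning.enabled" -> "false")
  private val registered = mutable.ArrayBuffer.empty[(String, () => String)]

  // Sets the given keys, runs the body, then restores the previous values.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally saved.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None) => conf.remove(k)
    }
  }

  // Only registers the body; it is executed later by runAll().
  def test(name: String)(body: => String): Unit =
    registered += (name -> (() => body))

  // Runs every registered body and records the value each one observed.
  def runAll(): Map[String, String] =
    registered.map { case (name, body) => name -> body() }.toMap

  def demo(): Map[String, String] = {
    registered.clear()
    // Wrapper outside test: the setting is restored before the body ever runs.
    withConf("pruning.enabled" -> "true") {
      test("outside") { conf("pruning.enabled") }
    }
    // Wrapper inside test: the setting is active while the body runs.
    test("inside") {
      withConf("pruning.enabled" -> "true") { conf("pruning.enabled") }
    }
    runAll()
  }

  def main(args: Array[String]): Unit =
    demo().foreach { case (name, seen) => println(s"$name saw $seen") }
}
```

In this sketch the "outside" body observes `false` and the "inside" body observes `true`, which is consistent with the fix named in the title of #22394: move the calls to `withSQLConf` inside the calls to `test`.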
---
-
To unsubscribe, e-mail: reviews
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216686762
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -155,6 +163,60 @@ class
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
I'm working on fixing these test failures now. Hopefully I'll have
something pushed soon.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
> FYI, @mallman I'm working on having ParquetFilter to support
IsNotNull(employer.id) to be pushed into parquet reader.
That would be pretty c
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216683076
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -110,7 +110,17 @@ private[sql
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216714387
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -110,7 +110,17 @@ private[sql
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
This LGTM. I'm not going to submit a PR for my approach to this problem.
Thanks @viirya!
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r181575614
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -151,6 +151,9 @@ abstract class Optimizer
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r223415835
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r223424625
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22880
[SPARK-25407][SQL] Ensure we pass a compatible pruned schema to
ParquetRowConverter
## What changes were proposed in this pull request?
(Link to Jira issue: https://issues.apache.org/jira
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> https://issues.apache.org/jira/browse/SPARK-25879
>
> If we select a nested field and a top level field, the schema pruning
will fail. Here is the reproducible test,
> ...
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229654276
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22905
[SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column
count for a given schema
(link to Jira: https://issues.apache.org/jira/browse/SPARK-25894)
## What changes were
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r229729687
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229743035
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229739407
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229738879
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
@gatorsmile @viirya @cloud-fan @dbtsai your thoughts?
cc @dongjoon-hyun for ORC file format perspective
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
> is there anything blocked by this? I agree this is a good feature, but it
asks the data source to provide a new ability, which may become a problem when
migrating file sources to data source
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229451108
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -182,18 +182,20 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229450720
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -93,13 +141,14 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229449812
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -49,34 +49,82 @@ import
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229451788
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
> @mallman Could you run the EXPLAIN with this new changes and post it in
the PR description?
D
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230433746
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230432336
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230442852
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230125133
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230128199
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem.
I'm sorry I haven't been more responsive here. I can relate to your
frustration, and I do want to help you make progress on this PR and merge it
in. I have indeed been busy
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r231243760
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r231249401
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Jenkins retest please.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Can someone with Jenkins retest privileges please kick off a retest?
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230914377
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
cc @HyukjinKwon
Would you like to review this PR? It's a bug fix.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r230916138
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
@gatorsmile How do you feel about merging this in? Anyone else I should
ping for review?
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r233175025
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r233179076
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,8 +204,12 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r233180347
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -130,8 +130,8 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r222348674
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r222345462
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r223422030
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r223425011
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r223441744
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Hi @Gauravshah. That branch has diverged substantially from what's in
master. Right now I'm preparing a PR to address a problem with the current
implementation in master, but I'm on holiday
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
I think I've made my case for this patch as best I can. It does not appear
this PR has unanimous support, but I continue to believe we should merge it to
master. So where do we take it from here
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Hi @dbtsai @HyukjinKwon @gatorsmile @viirya. Can we merge this to master?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
Retest this please.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22394#discussion_r217055207
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -245,28 +249,32 @@ class
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22394#discussion_r217052036
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -245,28 +249,32 @@ class
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
> Hey @mallman, let's just target to fix the problem in the JIRA without
other refactorings.
The refactorings I've made address the problem directly. Hopefully that
will be clearer with
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22394
[SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to
`withSQLConf` inside calls to `test`
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406)
## What