Github user mallman commented on the issue:
https://github.com/apache/spark/pull/17633
> @mallman yea that sounds good to me
Okay. The testing diff is significantly simpler now.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Hi @dbtsai @HyukjinKwon @gatorsmile @viirya. Can we merge this to master?
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
I think I've made my case for this patch as best I can. It does not appear
this PR has unanimous support, but I continue to believe we should merge it to
master. So where do we take it from
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Thanks everyone for your contributions, support and patience. It's been a
journey and a half, and I'm excited for the future. I will open a follow-on PR
to address the current kno
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/15673#discussion_r216037341
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -586,17 +587,31 @@ private[client] class Shim_v0_13 extends
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
Hi @viirya,
Thanks for this PR! I have an alternative implementation which I'd like to
submit for comparison. My implementation was something I removed from my
original patch.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
I have reconstructed my original patch for this issue, but I've discovered
it will require more work to complete. However, as part of that reconstruction
I've discovered a couple of cases
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
> @mallman It will be great that we can have this fix in 2.4 release as
this can dramatically reduce the data being read in many applications which is
the purpose of the original work.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
> FYI, per further checking code and discussion with @dbtsai regarding with
predicate pushdown, we know that predicate push down only works for primitive
types on Parquet datasource. So b
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216545091
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -110,7 +110,17 @@ private[sql
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216683076
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -110,7 +110,17 @@ private[sql
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216686762
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -155,6 +163,60 @@ class
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
@viirya Please amend
https://github.com/apache/spark/blob/d684a0f30599d50061ef78ec62edcdd3b726e2d9/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
I have some bad news. The methods `testSchemaPruning` and
`testMixedCasePruning` do not set the configuration settings as expected.
Fixing that reveals 6 failing tests for the mixed case tests. One
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22394
[SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to
`withSQLConf` inside calls to `test`
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406)
## What
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
FYI, the PR I previously mentioned about fixing the use of `withSQLConf` is
#22394.
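
For readers unfamiliar with the `withSQLConf` ordering issue referenced above, here is a minimal sketch of why the nesting order matters. It does not use Spark or ScalaTest: `withConf`, `test`, `runAll`, `conf`, `observed` and the key `hypothetical.flag` are all hypothetical stand-ins, assuming only that a test framework registers test bodies first and runs them later, and that a conf helper restores the previous setting when its block exits.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for a session conf, a withSQLConf-style helper and a
// test-registration method. None of this is Spark's or ScalaTest's actual code.
object WithConfOrdering {
  val conf = mutable.Map[String, String]()
  private val registered = mutable.ArrayBuffer[(String, () => Unit)]()
  // Records the conf value each test body saw when it ran.
  val observed = mutable.LinkedHashMap[String, Option[String]]()

  // Sets `key` for the duration of `body`, then restores the previous state.
  def withConf(key: String, value: String)(body: => Unit): Unit = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(v) => conf(key) = v
        case None => conf.remove(key)
      }
    }
  }

  // Registers a test; the body does not run here, only later in runAll().
  def test(name: String)(body: => Unit): Unit =
    registered += ((name, () => body))

  def runAll(): Unit = registered.foreach { case (_, run) => run() }

  def main(args: Array[String]): Unit = {
    val key = "hypothetical.flag"

    // Conf helper wraps the registration: the setting is restored before
    // the test body ever executes.
    withConf(key, "true") {
      test("outside") { observed("outside") = conf.get(key) }
    }

    // Conf helper inside the test body: the setting is active while the
    // body executes.
    test("inside") {
      withConf(key, "true") { observed("inside") = conf.get(key) }
    }

    runAll()
    observed.foreach { case (name, value) => println(s"$name -> $value") }
  }
}
```

In this sketch the "outside" body sees no value for the key, while the "inside" body sees `true`, which mirrors the reason for moving the conf call inside the test body.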
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
I'm working on fixing these test failures now. Hopefully I'll have
something pushed soon.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22357#discussion_r216714387
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -110,7 +110,17 @@ private[sql
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
This LGTM. I'm not going to submit a PR for my approach to this problem.
Thanks @viirya!
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
FYI @viirya @dbtsai @gatorsmile @HyukjinKwon
Can I get someone's review of this PR please? The unmasked failures appear
to be false positives, so no changes to the tested cod
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
And FYI this is the Jira issue I promised in
https://github.com/apache/spark/pull/22357#issuecomment-419940228
yesterday: https://issues.apache.org/jira/browse/SPARK-25407
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22357
> FYI, @mallman I'm working on having ParquetFilter to support
IsNotNull(employer.id) to be pushed into parquet reader.
That would be pre
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22394#discussion_r217052036
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -245,28 +249,32 @@ class
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22394#discussion_r217055207
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -245,28 +249,32 @@ class
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
> Hey @mallman, let's just target to fix the problem in the JIRA without
other refactorings.
The refactorings I've made address the problem directly. Hopefully that
will be cl
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22394
Retest this please.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem.
I'm sorry I haven't been more responsive here. I can relate to your
frustration, and I do want to help you make progress on this PR and merge it
in. I have indeed been
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22880
[SPARK-25407][SQL] Ensure we pass a compatible pruned schema to
ParquetRowConverter
## What changes were proposed in this pull request?
(Link to Jira issue: https://issues.apache.org/jira
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> https://issues.apache.org/jira/browse/SPARK-25879
>
> If we select a nested field and a top level field, the schema pruning
will fail. Here is the reproducible test,
> ...
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229449812
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -49,34 +49,82 @@ import
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229450720
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -93,13 +141,14 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229451108
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -182,18 +182,20 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229451788
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229654276
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/22905
[SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column
count for a given schema
(link to Jira: https://issues.apache.org/jira/browse/SPARK-25894)
## What changes were
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
@gatorsmile @viirya @cloud-fan @dbtsai your thoughts?
cc @dongjoon-hyun for ORC file format perspective.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r229729687
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
> is there anything blocked by this? I agree this is a good feature, but it
asks the data source to provide a new ability, which may become a problem when
migrating file sources to data source
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229738879
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229739407
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229743035
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230125133
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230128199
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230432336
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230433746
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230442852
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22905
> @mallman Could you run the EXPLAIN with this new changes and post it in
the PR description?
Done.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r230914377
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r230916138
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
cc @HyukjinKwon
Would you like to review this PR? It's a bug fix.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r231243760
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r231249401
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Jenkins retest please.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
Can someone with Jenkins retest privileges please kick off a retest?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/22880
@gatorsmile How do you feel about merging this in? Anyone else I should
ping for review?
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22905#discussion_r233175025
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r233179076
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,8 +204,12 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r233180347
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -130,8 +130,8 @@ private[parquet