[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-19 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile The last build was killed by SIGKILL. Can you start a new build, please? --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I've addressed many of your points in today's commits. Can you please take a look at what I've done so far? I'm still working on the PRs you requested

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-06-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. Thanks for the kind reminder and thanks for your contribution. I'm sorry I did not respond sooner. I no longer work where I regularly used the checkpointing code with large

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I've removed the changes to the files as you requested. This removes support for schema pruning on filters of queries. I've pushed the previous revision to a new branch in our `spark

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199648692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -182,18 +182,20 @@ private[parquet

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I dug deeper and think I understand the problem better. To state the obvious, the periodic checkpointer deletes checkpoint files of RDDs that are potentially still accessible

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199631341 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -47,16 +47,25 @@ import

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199643803 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @DaimonPl I'm going to resolve the merge conflicts shortly. Otherwise, I have no intention of making further modifications to this PR outside of further review

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya I've rebased to resolve conflicts. All tests are passing. Can you take another look and sign off? Cheers

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'd just suggest trying it. Since this PR is a patch for master, please message me personally at m...@allman.ms to discuss progress and questions on a backport to 2.2. If we get it working

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > However, I am -1 on merging a change this large after branch cut. It's disappointing, but I agree we can't merge a change this large into a branch cut. It will have to wait for 2.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @mallman Yes, the issue with window functions is reproducible even with this PR. Can you attach a (small) parquet file I can use to test this scenario

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-28 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results from the following simple projection query, given @jainaks test file: > >``` >select page.url, page fr

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > These test failures are in Spark streaming. Is this just an intermittent test failure or actually caused by this PR? I was able to run the first failing test successfully. Can we

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > I was able to run the first failing test successfully. Can we get a retest, please? @ajacques I just rebased and pushed my branch off of master. Perhaps the easiest thing to do wo

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Anybody else able to reproduce this failure? It succeeded on my developer machine. It worked for me, too. Let's see what a retest d

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Oh dear. I don't know why we're getting all these sigkills. I think we're going to need another retest... --- - To unsubscribe

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Hi @gatorsmile. Where do you see us at this point? Do you still want to get this into Spark 2.4? --- - To unsubscribe, e-mail

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman, if we are all happy here, mind taking a look https://github.com/apache/spark/pull/21320#issuecomment-408271470 and https://github.com/apache/spark/pull/21320#issuecomment-406765

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Sorry it's taken me a couple of days to respond. I needed the time to ruminate (and not). I could write voluminously, but I just want to reply to a couple of points and move on. > I th

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Thanks @jainaks for the sample file and instructions to reproduce the problem. I will investigate and reply. @gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results f

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Thanks @jainaks for the sample file and instructions to reproduce the problem. I will investigate and reply

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Few comments like #21320 (comment) or ^ are not minor or nits. I leave hard -1 if they are not addressed. I'm sorry to say I'm very close to hanging up this PR. I put a lot of care, t

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Where does that leave both of these PRs? Do we still want this one with the code refactoring or to go back to the original? Are there any comments for this PR that would block merging? I've

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Success! Now where do we stand, @gatorsmile @HyukjinKwon ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Are there any other blockers to enabling this by default now that @mallman fixed the currently known broken queries? The functionality exercised by the ignored t

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment? Oh! I didn't notice you chan

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following: That test passes for me locally. Also, I inspected your branch and could not find any err

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r207718734 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r207719862 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following That's the test case that I "unignored". It was passing. There must be some simple e

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Test build #94228 has finished for PR 21889 at commit 92901da. The test failure appears to be unrelated to this PR. Is it just me or has the test suite become flakier in the p

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment? @ajacques Please try this qu

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 I've pushed a commit to restore the original test coverage while also ensuring determinism of the output. Don't ask me how I did it. It's a secret! The test that was failing before

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > select id, name.middle, address from temp - Works > select name.middle, address from temp - Fails > select name.middle from temp - Works > select name.middle, id, addre

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman Is it related to this revert in ParquetReadSupport.scala? I re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed. Yes. That's what we need to w

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > This patch fails Scala style tests. Hi @ajacques. I'm not sure if you're aware of this, but you can run the scalastyle checks locally with ``` sbt scalast

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Alright to make sure we're all on the same page, it sounds like we're ready to merge this PR pending: > > * Successful build by Jenkins > * Any PR comments from

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 >> @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? > I have the hack I used originally, but I have

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? I have the hack I used originally, but I haven't tr

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques Please rebase off my branch. @gatorsmile I don't recall seeing that error before. Any idea for how I can reproduce and debug

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Assuming from #21889 (comment), we shouldn't have any identified bug here. What kind of bugs left to be fixed? That bug was address by b50ddb4. We still need to fix the bug underly

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r208446828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 See https://github.com/apache/spark/pull/21320#issuecomment-406353694 for @gatorsmile's request to move the changes to `ParquetReadSupport.scala` to another PR. There was another

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > just for clarification, so now .. there no outstanding bugs, some tests are ignored per #21320 (comment) and left comments were mostly addressed. Did i understand correctly? The igno

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques I added a commit to enable schema pruning by default. It's a little more complete than your commit to do the same. Please rebase off my branch and remove your commit

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Then should we keep this one or #21889? shall we deduplicate the efforts? I requested to open that because this looks going to be inactive per your comments. As I stated before, I

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman if you're planning on making more code changes, would you be willing to work on a shared branch or something? I've been working to incorporate the CR comments. No, howe

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Due to the urgency of the upcoming 2.4 code freeze, I'm going to open this PR to collect any feedback. This can be closed if you prefer to continue to the work in the original

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> Hello, we've been using your patch at Stripe and we've found something that looks like a new bug: > > Thank you for sharing this, @xinxin-stripe. This is very helpful. I will

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Hello, we've been using your patch at Stripe and we've found something that looks like a new bug: Thank you for sharing this, @xinxin-stripe. This is very helpful. I will investig

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> the window of opportunity to review syntax and style in this PR closed long ago. > Why/when is this window closed? Who closed that? What I wrote above is a coarse approximat

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > I see no point of leaving this PR open. I don't agree with you on that point, and I've expressed my view in https://github.com/apache/spark/pull/21889#issuecomment-413655

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Essentially, this PR was created to take the management of #21320 out of my hands, with a view towards facilitating its incorporation into Spark 2.4. It was my suggestion, one based in frustration

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman, can we close this PR? Are you willing to update here or not? I pushed an update less than a day ago, and I intend to continue pushing updates as nee

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Are we waiting for @gatorsmile's go-ahead and merge? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Try this when spark.sql.nestedSchemaPruning.enabled is on? This is a case-sensitivity issue (obviously). I'll get to the root of it. Tha

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Try this when spark.sql.nestedSchemaPruning.enabled is on? I don't think this will be difficult to fix. I'm working on it now and will add relevant test cover

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r212388958 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r212396370 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-23 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman Could you remove the changes made in ParquetRowConverter.scala and also turn off spark.sql.nestedSchemaPruning.enabled by default in this PR? D

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Thanks everyone for your contributions, support and patience. It's been a journey and a half, and I'm excited for the future. I will open a follow-on PR to address the current known failure

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile Any concerns about merging this PR at this point? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Add some test cases when turning on spark.sql.caseSensitive? Will do. --- - To unsubscribe, e-mail: reviews-unsub

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile How does this look? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > I think if the tests are few, you can make them ignored for now here, and make another PR enabling it back with the changes in ParquetReadSupport.scala. That's the approach I've ta

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > BTW, I am trying to take a look closely. I would appreciate if there are some concrete examples so that I (or other reviewers) can double check along. Parquet is pretty core fix and let's be v

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204208518 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,387 @@ +/* + * Licensed

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman I still think we need to split it to two PRs. To resolve the issues you mentioned above, how about creating a separate PR? Only 10 days left before the code freeze of Spark 2.4. We p

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 The test failure is unrelated to this patch. Shall we retest? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204209033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1298,8 +1298,18 @@ object SQLConf { "issues.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Could we move the changes made in ParquetReadSupport.scala to a separate PR? Then, we can merge this PR very quickly. If I remove the changes to `ParquetReadSupport.scala`, then f

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman I still think we need to split it to two PRs. To resolve the issues you mentioned above, how about creating a separate PR? Only 10 days left before the code freeze of Spark 2.4. We p

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204206072 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205020970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021140 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021282 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021469 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021712 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,388 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022799 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022895 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022974 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,388 @@ +/* + * Licensed

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Regarding #21320 (comment), can you at least set this enable by default and see if some existing tests are broken or not? I have no intention to at this po

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > gentle ping @mallman since the code freeze is close Outside of my primary occupation, my top priority on this PR right now is investigating https://github.com/apache/spark/pull/21

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> Hi @jainaks. Thanks for your report. Do you have the same problem running your test with this PR? > @mallman Yes, the issue with window functions is reproducible even with

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2018-09-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r216037341 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -586,17 +587,31 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 Hi @viirya, Thanks for this PR! I have an alternative implementation which I'd like to submit for comparison. My implementation was something I removed from my original patch. I

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > @mallman It will be great that we can have this fix in 2.4 release as this can dramatically reduce the data being read in many applications which is the purpose of the original work.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, per further checking code and discussion with @dbtsai regarding with predicate pushdown, we know that predicate push down only works for primitive types on Parquet datasource. So b

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216545091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have reconstructed my original patch for this issue, but I've discovered it will require more work to complete. However, as part of that reconstruction I've discovered a couple of cases where our

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I understand you've put a lot of work into this implementation, however I think you should try a simpler approach before we consider something more complicated. I believe

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863353 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863463 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. Thanks for the information regarding disk use for your scenario. What do you think about my second point, using the `ContextCleaner

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 @viirya Please amend https://github.com/apache/spark/blob/d684a0f30599d50061ef78ec62edcdd3b726e2d9/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 FYI @viirya @dbtsai @gatorsmile @HyukjinKwon Can I get someone's review of this PR please? The unmasked failures appear to be false positives, so no changes to the tested code

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 And FYI this is the Jira issue I promised in https://github.com/apache/spark/pull/22357#issuecomment-419940228 yesterday: https://issues.apache.org/jira/browse/SPARK-25407

<    1   2   3   4   5   6   7   >