[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @gatorsmile I can't find your commit: ``` [msa@ip-10-0-8-34 spark-master]$ git fetch origin remote: Counting objects: 114, done. remote: Compressing objects: 100% (53/53), done

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @cloud-fan I'm not familiar enough with that code to be comfortable making that change. Can you submit a PR against `VideoAmp:spark-18572-list_partition_names` with the necessary changes

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @cloud-fan That's unfortunate if it's going to block this PR. How do we proceed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 > @mallman do you know which tests fail the partition spec checking? It looks to me that before we call partition related API in SessionCatalog, the partition column names should be normali

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 I've modified the behavior of the partition spec checking methods in `SessionCatalog` to test for case-sensitive analysis.

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-12-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90788135 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -519,6 +519,26 @@ private[hive] class HiveClientImpl

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 @wangyum Thanks for fixing this. The fact that our tests did not catch this bug means we have a gap in our test coverage. It looks like the test in `HiveClientSuite` is incorrect. Can you fix

[GitHub] spark pull request #16122: [SPARK-18681][SQL] Fix filtering to compatible wi...

2016-12-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16122#discussion_r90781272 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -590,8 +590,10 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #16122: [SPARK-18681][SQL] Fix filtering to compatible wi...

2016-12-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16122#discussion_r90721187 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -600,11 +600,14 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 @wangyum Can you tell me what your underlying metastore database provider is? Postgres? MySQL?

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @ericl @gatorsmile Please see test failure here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69581/testReport/junit/org.apache.spark.sql.hive

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @gatorsmile I enhanced the coverage of `SessionCatalog.listPartitions` and `SessionCatalog.listPartitionNames` to include tests for invalid partial partition specs.

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 > can you also address this comment? #15998 (comment) Addressed.

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-12-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90681188 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -519,6 +519,26 @@ private[hive] class HiveClientImpl

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 > Like the other partition ExternalCatalog APIs, could you also add the negative test cases to ExternalCatalogSuite.scala? I'm sorry, I don't understand what you're asking for. Can

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 The test that failed is definitely related to this PR, however it passes for me locally. I'll investigate...

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 LMK if there's anything else you'd like me to address; otherwise, assuming the tests pass, please merge to master. Also, it would be great if we could backport this into 2.1 as well.

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-12-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90503688 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala --- @@ -155,6 +155,25 @@ private[hive] trait HiveClient

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 Added a couple of unit tests and rebased.

[GitHub] spark issue #16071: [SPARK-18635] [SQL] [WIP] Partition name/values not esca...

2016-11-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16071 I can't vouch for how `Path` and `URI` work together to do the right thing, however the test coverage looks good. LGTM overall.

[GitHub] spark pull request #16071: [SPARK-18635] [SQL] [WIP] Partition name/values n...

2016-11-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16071#discussion_r90303311 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala --- @@ -205,6 +205,58 @@ class

[GitHub] spark pull request #16071: [SPARK-18635] [SQL] [WIP] Partition name/values n...

2016-11-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16071#discussion_r90300900 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala --- @@ -205,6 +205,58 @@ class

[GitHub] spark issue #16080: [SPARK-18647][SQL] do not put provider in table properti...

2016-11-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16080 I built and tested this branch, and it resolves the issue I was having with reading Spark 2.1 tables in earlier versions of Spark. Thanks!

[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...

2016-11-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16080#discussion_r90286503 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -232,17 +233,26 @@ private[spark] class HiveExternalCatalog

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 I will work on additional unit test coverage tomorrow.

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-29 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90164499 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -482,6 +483,19 @@ class InMemoryCatalog

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-29 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90164431 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -730,6 +730,23 @@ class SessionCatalog

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-29 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r90164420 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -189,11 +189,28 @@ abstract class ExternalCatalog

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 Hi Guys, Repeating my comment/query for @ericl. I'm hoping someone can provide affirmation/refutation to my question before I proceed with new unit tests. I've run some tests

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-28 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89909840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -730,6 +730,23 @@ class SessionCatalog

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-28 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 > where is the speed-up come from? Is it because the hive API getPartitionNames is faster than getPartitions? Or is it because we generate the partition string(a=1/b=2/c=3) at hive side and i
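
To make the `a=1/b=2/c=3` format in the quoted question concrete, here is a minimal Scala sketch of joining an ordered partition spec into such a name string. `PartitionNameSketch` and `toPartitionName` are hypothetical names used only for illustration, not Spark or Hive APIs, and the sketch assumes plain values that need no escaping.

```scala
object PartitionNameSketch {
  // Joins ordered (column, value) pairs into a Hive-style partition name.
  // Assumption: values contain no characters that would need escaping.
  def toPartitionName(spec: Seq[(String, String)]): String =
    spec.map { case (col, value) => s"$col=$value" }.mkString("/")

  def main(args: Array[String]): Unit =
    println(toPartitionName(Seq("a" -> "1", "b" -> "2", "c" -> "3")))
}
```

The order of the pairs matters here, which is why the sketch takes a `Seq` rather than a `Map`.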

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-28 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89839435 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -730,6 +730,23 @@ class SessionCatalog

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89680470 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89679682 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89679527 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89679485 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89679477 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -189,6 +189,21 @@ abstract class ExternalCatalog

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89678628 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -482,6 +482,19 @@ class InMemoryCatalog

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89678259 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15998#discussion_r89678233 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -482,6 +482,19 @@ class InMemoryCatalog

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 CC @ericl @cloud-fan

[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-23 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/15998 [SPARK-18572][SQL] Add a method `listPartitionName` to `ExternalCatalog` (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-18572) ## What changes were proposed in this pull

[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...

2016-11-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15978 This patch resolves the performance problem I was seeing. Thank you!

[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...

2016-11-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15978 I'll give this patch a try. Thanks.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 @ericl Does this LGTY?

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Pushed a new version which I think is cleaner than before. I tested all 8 scenarios manually.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I'll work on a revision and try to push something today.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Sorry, I don't see how Java code will make this patch any better. We're not really missing a static initializer. We just need _some_ initializer to run early enough. If we can tolerate

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 > Yeah I see, but this is getting to be quite hacky **just to turn off log messages** This isn't just a few annoying log messages. This is an _avalanche_ of log messages, each of wh

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 @rxin I'm not working on the Parquet upgrade this week. I think we'll have to punt on it.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I've pushed a rebase. I re-tested this PR using the methodology I describe in the description for both local and remote executors.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 > If this is necessary, then isn't it simpler to leave the log configuration call where it was, so that it doesn't depend on the constructor? that wasn't the actual problem was it, just the logg

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-11-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r86837361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -55,6 +56,21 @@ class ParquetFileFormat

[GitHub] spark issue #15797: [SPARK-17990][SPARK-18302][SQL] correct several partitio...

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15797 I tried your patch and it works for me.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Hi Guys, Unfortunately debugging our Spark job sucked up all my Spark time last week, and I still have more to do on it this week. Because of that, it doesn't look like I'll have time

[GitHub] spark issue #15797: [SPARK-17990][SPARK-18302][SQL] correct several partitio...

2016-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15797 @cloud-fan Thanks for taking this on. I will test out a build from your branch to verify that it fixes the problem I was seeing.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 @rxin One of our weekly Spark jobs is choking, and fixing it could take the rest of the week. I suggest someone else take the lead on the Parquet 1.9 upgrade if they can devote their time

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Yes, I am working on it. I'm planning to have a PR in no later than EOW.

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 Happy to help.

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-11-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 Rebased.

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-11-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 @ericl I can do that, yes. I'm currently tied down. I will push a new commit later today or tonight.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-11-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I found two such tickets. How should we organize this in Jira?

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 @rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need to be resolved before I can rebase this PR.

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2016-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r85865327 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2016-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r85864458 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 It looks like all the unit tests passed, however one of the forked test java processes exited with nonzero status for some unknown reason.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 In fact, if no one else is working on the Parquet upgrade it probably makes more sense for me to contribute that than continue working on this PR. I'll check with the dev mailing list

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 @ericl I've pushed a commit with the changes you recommended.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Are we planning to incorporate the Parquet 1.9 libraries into Spark 2.1? If so, then this PR should be unnecessary. Hopefully.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Getting this redirection to work on remote executors was quite involved. The additional complexities derive from the following: 1. Java doesn't call the default (or any) constructor
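
Assuming the truncated first point refers to Java deserialization (the preview cuts off before saying so), the following Scala sketch shows the general JVM behavior: reading a `Serializable` object back from bytes does not run that class's constructor, so setup code placed there does not execute on the receiving side. `ConstructorCalls`, `Configured`, and `DeserializationSketch` are hypothetical names for illustration and are not taken from the PR.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ConstructorCalls {
  // Counts how many times the Configured constructor body has run in this JVM.
  @volatile var count = 0
}

// Hypothetical stand-in for a class whose constructor performs one-time setup.
class Configured extends Serializable {
  ConstructorCalls.count += 1
}

object DeserializationSketch {
  // Serializes the object to bytes and reads it back with standard Java serialization.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val before = ConstructorCalls.count
    val copy = roundTrip(new Configured)
    // Only the explicit `new Configured` ran the constructor; readObject did not.
    println(s"constructor runs during round trip: ${ConstructorCalls.count - before}")
    println(s"copy is a Configured: ${copy.isInstanceOf[Configured]}")
  }
}
```

In a separate JVM that only ever receives the serialized copy, the counter would never be incremented at all, which is one reason constructor-based initialization can appear to work in local mode yet not take effect elsewhere.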

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Actually, this may not be working for remote executors. I tested this patch running in local mode, but in running a version of this with actual remote executors I'm seeing the original parquet log

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 The current merge conflict is from d2d438d1d549628a0183e468ed11d6e85b5d6061, which touches the same code. I'll wait for that to be settled before rebasing.

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 > Could we enable this fallback only when the conf is set to false? Otherwise, it might mask legitimate bugs. Certainly, but my intent with this PR is to prevent a (painful and confus

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 @srowen I ran my manual tests for this build and they worked as expected. Can you merge this PR?

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2016-10-28 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/15673 [SPARK-17992][SQL] Return all partitions from HiveShim when Hive throws a metastore exception when attempting to fetch partitions by filter (Link to Jira issue: https://issues.apache.org/jira
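
The behavior named in this PR title can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `PartitionFallbackSketch`, `MetastoreFilterException`, and `fetchPartitions` are hypothetical names, and the two fetch functions stand in for whatever calls the real shim makes.

```scala
object PartitionFallbackSketch {
  // Hypothetical stand-in for the metastore exception mentioned in the PR title.
  final class MetastoreFilterException(msg: String) extends RuntimeException(msg)

  // Tries the filtered fetch first; if the metastore rejects it, falls back to
  // fetching every partition. The caller would still need to apply the filter
  // to the returned partitions itself.
  def fetchPartitions[P](byFilter: () => Seq[P], fetchAll: () => Seq[P]): Seq[P] =
    try byFilter()
    catch { case _: MetastoreFilterException => fetchAll() }
}
```

Catching only one specific exception type, rather than every `Throwable`, keeps unrelated failures visible, which is the concern about masking legitimate bugs raised in the review.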

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Looks like the test failed for reasons unrelated to this PR. Can someone trigger a retest, please?

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-27 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r85368840 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -718,7 +722,9 @@ object

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r85142206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -718,7 +722,9 @@ object

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I'm still seeing the torrent of `CorruptStatistics` errors in the Jenkins build log, even though I don't see them running the tests locally with sbt. Maybe it's a maven versus sbt build issue

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 So raising the log threshold looks like it didn't do anything for Jenkins, but when I run the tests locally it does just the trick. \*sigh\* Anyway, might as well push a rebase and see what

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I like this test failure: ``` org.apache.spark.sql.sources.CreateTableAsSelectSuite.(It is not a test) ``` Anyway, I don't think this is related to this PR.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 The "CorruptStatistics" stack traces (which I agree are really annoying) are being logged because parquet logs them at the WARN level, and Spark's default logging threshold when run

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I'll take a closer look at that.

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 @ericl What do you mean it's polluting test output?

[GitHub] spark issue #15539: [SPARK-17994] [SQL] Add back a file status cache for cat...

2016-10-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15539 @ericl Great work on this. I don't know how I got an author credit in the commit...

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I've spent a couple hours today working on a unit test which captures stdout during a parquet write operation to validate that it has no parquet logging output. I haven't got it working yet

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r84508947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -42,24 +43,21 @@ class TableFileCatalog

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r84402516 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -42,24 +43,21 @@ class TableFileCatalog

[GitHub] spark issue #15566: [SPARK-18026][SQL] should not always lowercase partition...

2016-10-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15566 I went through the script I wrote for https://issues.apache.org/jira/browse/SPARK-17990. At the point where I ran ``` (0 to 9).foreach { p => sql(s"alter table mixed_case_part

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r84355969 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I pushed a commit to improve the documentation. I also removed a couple of unused imports (boy scout rule).

[GitHub] spark issue #15568: [SPARK-18028][SQL] simplify TableFileCatalog

2016-10-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15568 @cloud-fan, I don't know what you mean in item 4 about `SessionCatalog.listPartitionsByFilter` handling case-sensitivity. What case-sensitivity issue are you referring to, and does this PR handle

[GitHub] spark issue #15569: [SPARK-18029][SQL] PruneFileSourcePartitions should not ...

2016-10-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15569 LGTM.

[GitHub] spark pull request #15569: [SPARK-18029][SQL] PruneFileSourcePartitions shou...

2016-10-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15569#discussion_r84338922 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -102,8 +102,8 @@ case class CatalogTablePartition

[GitHub] spark pull request #15569: [SPARK-18029][SQL] PruneFileSourcePartitions shou...

2016-10-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15569#discussion_r84324810 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -102,8 +102,8 @@ case class CatalogTablePartition

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r84183026 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -718,7 +717,8 @@ object

[GitHub] spark pull request #15539: [SPARK-17994] [SQL] Add back a file status cache ...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15539#discussion_r84161990 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r84157257 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -718,7 +717,8 @@ object

[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-19 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 I could use some advice on writing a unit test for this. Do you guys know if there is a precedent in the codebase that covers a situation like this? I'd like to reuse existing code if possible

[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...

2016-10-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r84129203 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -718,7 +717,8 @@ object
