[GitHub] spark issue #16107: SPARK-18677: Fix parsing ['key'] in JSON path expression...

2016-12-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16107 Thanks for the quick review! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 I don't think a fork is a good idea, nor do I think there is a reasonable need for one. @gatorsmile brought up that the Parquet community refused to build a patch release: "The

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 Great! I'm glad it was just confusion. I completely agree with @srowen that forking should be a last resort. In the future, please reach out to the community, whether it's Parquet or an

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 I think we should move to a 1.8.2 patch release. The reason is that 1.9.0 moved to ByteBuffer-based reads and we've found at least one problem with it. ByteBuffer-based reads also chang

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 Thanks @dongjoon-hyun! Let's get a Parquet 1.8.2 out in January.

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/16281 The improvement is in how row groups are garbage collected. G1GC puts humongous allocations directly into the old generation, so you end up needing a full GC to reclaim the space. That just

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-04-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r60151772 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,8 +62,14 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: [SPARK-14679] [UI] Fix UI DAG visualization OO...

2016-04-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12437#discussion_r60257825 --- Diff: core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala --- @@ -72,6 +72,22 @@ private[ui] class RDDOperationCluster(val id: String

[GitHub] spark pull request: [SPARK-14679] [UI] Fix UI DAG visualization OO...

2016-04-20 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12437#issuecomment-212483444 Thank you @srowen!

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-20 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-212715572 @rxin, @cloud-fan, this PR works for both cases when the table is resolved. I think making MetastoreRelation a CatalogTable would certainly improve things, but it looks

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60612218 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -414,8 +414,42 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60611545 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -414,8 +414,42 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60611716 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -259,4 +261,78 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60676493 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -414,8 +414,42 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60772391 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -414,8 +414,42 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60772367 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -259,4 +261,78 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-213517254 @cloud-fan, I've rebased on master and fixed the two things you pointed out. Let me know if there's anything else. Thanks for reviewing!

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-213611399 @cloud-fan, withSQLConf and withHadoopConf weren't working so I added withSessionConf to set the partitioning property correctly.

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60816222 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -414,8 +414,42 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-213630058 I'm not sure what's going on with the MiMa tests. Looks unrelated to my PR. I think all of the tests are passing again so this should be fine. Jenkins should

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-25 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-214566974 I noticed that there was a conflict so I rebased on master. Tests are still passing.

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-29 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-215860434 @cloud-fan, I rebased on master to avoid the conflicts and tests are all passing. If you have a chance to take another look I'd appreciate it! I think this is

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-03 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-216605985 @rxin, could you take a look at this? I think it's close to being ready and I have a couple of follow-up improvements to Hive/Parquet support that depend on it.

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r62205224 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -212,4 +214,77 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r62215656 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -56,14 +58,30 @@ private[spark] class UnionPartition[T: ClassTag

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r62215672 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -56,14 +58,30 @@ private[spark] class UnionPartition[T: ClassTag

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r62215696 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -56,14 +58,30 @@ private[spark] class UnionPartition[T: ClassTag

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r62228103 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,8 +64,21 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r62238592 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,8 +64,22 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-217251801 Jenkins retest this please.

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-05 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-217275958 @liancheng, I originally had the logic you suggest for the expected columns calculation. But, there were [test failures](https://amplab.cs.berkeley.edu/jenkins/job

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-05 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-217276204 @liancheng and @rxin, thank you for looking at this!

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-05-05 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-217288532 Thank you @andrewor14 and all the reviewers!

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r62400576 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -347,10 +347,23 @@ case class

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-05-09 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-217896699 Thanks for reviewing this, @cloud-fan and @liancheng!

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-09 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-217946972 Retest this please.

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-09 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-217974405 @liancheng, @cloud-fan, this commit is a follow-up to #12239 that fixes column resolution when writing to both Hive MetastoreRelations and HadoopFsRelations. Could you

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r63908465 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -348,28 +348,41 @@ case class

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r63908615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -498,6 +499,117 @@ class Analyzer

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-19 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-220389967 @cloud-fan, @liancheng, thanks for reviewing! I've rebased on master and fixed your comments so far.

[GitHub] spark pull request: [SPARK-15420] [SQL] Add repartition and sort t...

2016-05-19 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/13206 [SPARK-15420] [SQL] Add repartition and sort to prepare output data ## What changes were proposed in this pull request? * WriterContainer detects that the incoming logical plan has been

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r63971653 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -348,28 +348,41 @@ case class

[GitHub] spark pull request #13280: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-06-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13280#discussion_r67006656 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystSchemaConverter.scala --- @@ -538,6 +538,22 @@ private[parquet

[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67749047 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import

[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...

2016-06-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13338 @tgravescs, I've updated it. Sorry about the delay; for some reason the notifications for this issue didn't make it to my inbox, so I wasn't seeing updates.

[GitHub] spark issue #13482: [SPARK-15725][YARN] Ensure ApplicationMaster sleeps for ...

2016-06-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13482 @tgravescs, I've updated it. Sorry about the delay; for some reason the notifications for this issue didn't make it to my inbox, so I wasn't seeing updates.

[GitHub] spark issue #13482: [SPARK-15725][YARN] Ensure ApplicationMaster sleeps for ...

2016-06-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13482 @tgravescs, thanks for reviewing! Sorry about the delay!

[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...

2016-06-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13338 Thanks @tgravescs!

[GitHub] spark pull request #13880: SPARK-16178: Remove unnecessary Hive partition ch...

2016-06-23 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/13880 SPARK-16178: Remove unnecessary Hive partition check. ## What changes were proposed in this pull request? This removes a check that partition names match from the Hive write path, which

[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13701 @gatorsmile, sorry for the delay, I was evidently not getting notifications until I changed some settings yesterday. There are a few tests in Parquet that generate files with test data that

[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13701 Yeah, Parquet doesn't make a distinction for where filters are applied. If you push a filter, then it will be applied to row groups if possible and individual rows after that. But if y
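
The two-stage behaviour described in this comment (a pushed filter is first tried against whole row groups, then against individual rows) can be illustrated with a rough sketch. This is not Parquet's API: `TwoLevelFilter`, `RowGroup`, and `greaterThan` are hypothetical names, and the min/max statistics here stand in for whatever metadata a real reader would consult.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoLevelFilter {
    /** Hypothetical row group: non-empty values plus min/max statistics. */
    static final class RowGroup {
        final int[] values;
        final int min;
        final int max;

        RowGroup(int... values) {
            this.values = values;
            this.min = Arrays.stream(values).min().getAsInt();
            this.max = Arrays.stream(values).max().getAsInt();
        }
    }

    /** Returns every value strictly greater than threshold. */
    public static List<Integer> greaterThan(List<RowGroup> groups, int threshold) {
        List<Integer> result = new ArrayList<>();
        for (RowGroup group : groups) {
            // Row-group level: skip the group when its statistics rule out a match.
            if (group.max <= threshold) {
                continue;
            }
            // Row level: the same predicate is applied to each remaining value.
            for (int value : group.values) {
                if (value > threshold) {
                    result.add(value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = Arrays.asList(
            new RowGroup(1, 2, 3), new RowGroup(4, 9, 5), new RowGroup(10, 11));
        System.out.println(greaterThan(groups, 4)); // prints [9, 5, 10, 11]
    }
}
```

In this sketch the first group is never scanned, because its maximum (3) cannot satisfy the predicate; the row-level pass still runs on the groups that survive.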

[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-06-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13389#discussion_r69158244 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala --- @@ -150,7 +150,8 @@ private[parquet

[GitHub] spark issue #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and binary tes...

2016-06-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13389 Looks fine other than one comment.

[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-07-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13701 @gatorsmile, we've not seen a penalty from running row group level tests when no row groups are filtered and we've decided to turn on dictionary filtering by default. You may see a pe

[GitHub] spark pull request #14093: SPARK-16420: Ensure compression streams are close...

2016-07-07 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/14093 SPARK-16420: Ensure compression streams are closed. ## What changes were proposed in this pull request? This uses the try/finally pattern to ensure streams are closed after use
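
A minimal sketch of the try/finally pattern mentioned in this description, assuming nothing about the actual patch: the class `CloseStreams` and its methods are hypothetical, and `java.util.zip` GZIP streams stand in for Spark's compression codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CloseStreams {
    /** Compresses data, closing the compression stream even if write() throws. */
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new GZIPOutputStream(sink);
        try {
            out.write(data);
        } finally {
            out.close(); // close() also writes the GZIP trailer
        }
        return sink.toByteArray();
    }

    /** Decompresses data, closing the decompression stream on every exit path. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8)); // prints shuffle block
    }
}
```

The point of the pattern is that the `finally` block runs whether the body returns normally or throws, so resources held by the compression stream are released either way.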

[GitHub] spark pull request #14093: SPARK-16420: Ensure compression streams are close...

2016-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/14093#discussion_r69943044 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -349,12 +349,19 @@ void forceSorterToSpill() throws IOException

[GitHub] spark pull request #14093: SPARK-16420: Ensure compression streams are close...

2016-07-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/14093#discussion_r70105908 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -349,12 +349,19 @@ void forceSorterToSpill() throws IOException

[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/14014 +1 Nice work, I think we may be able to do the same thing in parquet-avro as well.

[GitHub] spark pull request #14149: [SPARK-16435][YARN][MINOR] Add warning log if ini...

2016-07-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/14149#discussion_r70469463 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2342,6 +2342,12 @@ private[spark] object Utils extends Logging { * Return the

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-02-17 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/11242 SPARK-9926: Parallelize partition logic in UnionRDD. This patch has the new logic from #8512 that uses a parallel collection to compute partitions in UnionRDD. The rest of #8512 added an
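
As a rough illustration of computing per-parent partitions in parallel and then combining them in order, here is a sketch using Java parallel streams. It is not the patch itself (which uses a Scala parallel collection inside UnionRDD): `ParallelUnionPartitions`, `Source`, and `unionPartitions` are hypothetical names, and `Source` merely stands in for a parent whose partition listing may be slow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelUnionPartitions {
    /** Hypothetical stand-in for a parent dataset whose partition listing may be slow. */
    interface Source {
        List<String> partitions();
    }

    /** Lists each parent's partitions in parallel, then concatenates them in parent order. */
    public static List<String> unionPartitions(List<Source> sources) {
        // A parallel stream over a List is ordered, so collect() keeps parent order.
        List<List<String>> perSource = sources.parallelStream()
            .map(Source::partitions)
            .collect(Collectors.toList());

        // Flattening is cheap, so it is done sequentially.
        List<String> all = new ArrayList<>();
        for (List<String> parts : perSource) {
            all.addAll(parts);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Source> sources = new ArrayList<>();
        sources.add(() -> Arrays.asList("a0", "a1"));
        sources.add(() -> Arrays.asList("b0"));
        System.out.println(unionPartitions(sources)); // prints [a0, a1, b0]
    }
}
```

Only the potentially expensive listing step runs concurrently; the result is the same as a sequential pass, which matters if partition indices are assigned by position.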

[GitHub] spark pull request: SPARK-13403: Pass hadoopConfiguration to HiveC...

2016-02-19 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/11273 SPARK-13403: Pass hadoopConfiguration to HiveConf constructors. This commit updates the HiveContext so that sc.hadoopConfiguration is used to instantiate its internal instances of HiveConf

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

2016-05-23 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221033268 I don't think a map preserves behavior. Hadoop `Configuration` instances have a set of final properties that can't be changed. This loses that informatio

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

2016-05-23 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221063988 @yhuai, Hive uses shims to be compatible with Hadoop 1 and Hadoop 2. I think it would be better to use the existing mechanism in Hive to deal with this. I know

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

2016-05-23 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221074125 Why is Hive's ClassLoader loading Hadoop classes itself rather than delegating to the ClassLoader that is responsible for Hadoop? Hive should be using shims to int

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

2016-05-23 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221083829 @yhuai, I know that the IsolatedClientLoader is used to load multiple versions of Hive, but Hive should be able to use the version of Hadoop that Spark has already

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-24 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-221300255 @liancheng, @cloud-fan, I rebased this on master since it had conflicts.

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-24 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/13280 [SPARK-9876][SQL]: Update Parquet to 1.8.1. ## What changes were proposed in this pull request? This includes minimal changes to get Spark using the current release of Parquet, 1.8.1

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-24 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-221375636 @rxin, I agree that we shouldn't upgrade if there are perf regressions. I would like to know what they are so we can fix them in Parquet upstream though. This shou

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-24 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-221417279 I'm not sure what should be done to fix the dependency test failure. Looks like there's a list of dependencies that needs to be updated. Is that something

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-26 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-222014796 @liancheng, thanks for pointing out that fix, I've added it. I thought that was already committed since it has been a while since we fixed the Parquet

[GitHub] spark pull request: SPARK-13723: Change behavior of --num-executor...

2016-05-26 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/13338 SPARK-13723: Change behavior of --num-executors with dynamic allocation. ## What changes were proposed in this pull request? This changes the behavior of --num-executors and

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-26 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-222027030 @liancheng: rebased. Sorry I missed that earlier.

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-27 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-05450 @yhuai, I'll answer #2 first since it's quick: the column names are used to create a projection of the incoming data frame so any extra columns aren't s

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-27 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-13718 @liancheng, fixed. Yeah, IntelliJ has a few annoyances like that with Scala. Imports are a mess.

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13280#discussion_r64947459 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala --- @@ -1415,6 +1425,18 @@ class

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13280#discussion_r64948705 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala --- @@ -1415,6 +1425,18 @@ class

[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...

2016-05-27 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-31124 @yhuai, I'll add an option for strict checking. I agree with you that we need to have a holistic solution. It's also not ideal that some write metho

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-05-27 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-77478 Thanks @liancheng! It will be great to have predicate push-down for strings in 2.0!

[GitHub] spark pull request: [SPARK-9876][SQL][FOLLOWUP] Enable string and ...

2016-05-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13389#discussion_r65089323 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -50,7 +50,6 @@ private[sql] object

[GitHub] spark pull request: [SPARK-9876][SQL][FOLLOWUP] Enable string and ...

2016-05-30 Thread rdblue
Github user rdblue commented on the pull request: https://github.com/apache/spark/pull/13389#issuecomment-222521471 +1 overall, good catch on those tests.

[GitHub] spark issue #13280: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-06-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13280 @yhuai, what started failing?

[GitHub] spark issue #13280: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-06-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13280 As I said on PR #13445: It sounds reasonable, but we should follow up on this. If we revert the change I suggest that we only revert it in 2.0 or add it to master as soon as 2.0 is branched. That

[GitHub] spark issue #13445: [SPARK-9876] Revert "[SPARK-9876][SQL] Update Parquet to...

2016-06-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13445 Sounds reasonable, but the "regression" wasn't located or even confirmed to exist after this change was reverted the last time. There was also no follow-up on it. If we revert the c

[GitHub] spark issue #13450: [SPARK-9876] [BRANCH-2.0] Revert "[SPARK-9876][SQL] Upda...

2016-06-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13450 +1, looks fine to me assuming tests pass.

[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2016-06-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/12313 @yhuai, I've removed the public API additions so we can get the changes in as you suggest. I also rebased on the current master so it can be merged. I'll fix any test failures that come

[GitHub] spark pull request #13482: SPARK-15725: Ensure ApplicationMaster sleeps for ...

2016-06-02 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/13482 SPARK-15725: Ensure ApplicationMaster sleeps for the min interval. ## What changes were proposed in this pull request? Update `ApplicationMaster` to sleep for at least the minimum

[GitHub] spark issue #13482: SPARK-15725: Ensure ApplicationMaster sleeps for the min...

2016-06-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13482 @yhuai, @rxin, we should consider this work-around for 2.0 if it isn't too late. We see a lot of apps fail because the driver and AM lock up.

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r65743060 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -284,8 +284,128 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r65757163 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -284,8 +284,128 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r65758947 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -284,8 +284,128 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r65767128 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -284,8 +284,128 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r65776253 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -284,8 +284,128 @@ class InsertIntoHiveTableSuite extends

[GitHub] spark issue #13482: SPARK-15725: Ensure ApplicationMaster sleeps for the min...

2016-06-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13482 cc @vanzin @tgravescs

[GitHub] spark pull request #13338: [SPARK-13723] [YARN] Change behavior of --num-exe...

2016-06-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13338#discussion_r66683769 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala --- @@ -519,9 +519,9 @@ object YarnSparkHadoopUtil { conf

[GitHub] spark pull request #13338: [SPARK-13723] [YARN] Change behavior of --num-exe...

2016-06-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13338#discussion_r66685237 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2262,21 +2262,39 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #13338: [SPARK-13723] [YARN] Change behavior of --num-exe...

2016-06-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13338#discussion_r66685637 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2262,21 +2262,39 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #13338: [SPARK-13723] [YARN] Change behavior of --num-exe...

2016-06-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/13338#discussion_r66685722 --- Diff: docs/configuration.md --- @@ -1158,6 +1158,10 @@ Apart from these, the following properties are also available, and may be useful For more

[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...

2016-06-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13338 Thanks for reviewing, everyone! I've made some comments and will update once we have consensus on util methods and semantics.

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2016-06-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/12313#discussion_r66687753 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -505,6 +506,117 @@ class Analyzer
