[GitHub] spark issue #14278: [SPARK-16632][SQL] Use Spark requested schema to guide v...

2016-07-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14278 Also cc @yhuai. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 Opened #14278 for the simpler yet more general fix.

[GitHub] spark issue #14278: [SPARK-16632][SQL] Use Spark requested schema to guide v...

2016-07-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14278 @vanzin Could you please help verify this fix? The reason why #14272 works is that the Parquet requested schema is generated using `clipParquetSchema()`.

[GitHub] spark pull request #14278: [SPARK-16632][SQL] Use Spark requested schema to ...

2016-07-20 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14278 [SPARK-16632][SQL] Use Spark requested schema to guide vectorized Parquet reader initialization ## What changes were proposed in this pull request? In `SpecificParquetRecordReaderBase

[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 After discussing with @yhuai, I'm also merging this to branch-2.0. @vanzin Thanks for fixing this!

[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 Would like to add that AFAIK byte and short are the only problematic types that we didn't handle before this PR. Other Hive-Parquet schema conversion quirks like string (translated into `binary

[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 I'm merging this to master. @yhuai Do we want this in branch-2.0?

[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 This LGTM. Although it's a little bit hacky since technically the fields in requested schema passed to the Parquet record reader may have different original types (`INT_8` and `INT_16`) from

[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

2016-07-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14235 LGTM. One thing is that I feel most of the time the SQL comparison assertion may fail due to reasonable internal changes that somehow affect SQL generation in harmless ways, and can

[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14098 Yea, especially on case-insensitive OSes like Mac and Windows, the doc actually builds successfully even when the cases of the example file names don't match. I guess that's probably why we missed

[GitHub] spark issue #14245: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example u...

2016-07-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14245 Reused JIRA number SPARK-16303 and renamed Scala/Java example file names. Python examples are not being updated to use the `include_example` tag yet. The PR (#14098) is still in WIP status

[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14098 @wangmiao1981 I guess it's not ready yet. You may put a `[WIP]` tag in the PR title when it's in WIP status and remove it when it is ready for review.

[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...

2016-07-18 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14098#discussion_r71103329 --- Diff: docs/sql-programming-guide.md --- @@ -79,7 +79,7 @@ The entry point into all functionality in Spark is the [`SparkSession`](api/java

[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14098 @wangmiao1981 Is this ready for review now? Also, please update the PR title to: ``` [SPARK-16380][SQL][EXAMPLE] Update SQL examples and programming guide for Python language binding

[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...

2016-07-18 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14098#discussion_r71103044 --- Diff: docs/sql-programming-guide.md --- @@ -79,7 +79,7 @@ The entry point into all functionality in Spark is the [`SparkSession`](api/java

[GitHub] spark pull request #14245: [MINOR][DOCS][EXAMPLES] Minor Scala example updat...

2016-07-18 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14245 [MINOR][DOCS][EXAMPLES] Minor Scala example update ## What changes were proposed in this pull request? This PR moves the last hard-coded Scala example snippet from the SQL

[GitHub] spark issue #14184: [SPARK-16529][SQL][TEST] `withTempDatabase` should set `...

2016-07-14 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14184 LGTM, merging to master and branch-2.0. Thanks!

[GitHub] spark issue #14106: [SPARK-16448] RemoveAliasOnlyProject should not remove a...

2016-07-14 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14106 Merging to master.

[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...

2016-07-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14012 Thanks! Merged this to master.

[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...

2016-07-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14098 Thanks for doing this! Overall it's pretty nice. A few high-level comments: 1. It might be better to split the whole example file into several methods, as #14119 did. In this way

[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14098#discussion_r70588560 --- Diff: examples/src/main/python/SparkSQLExample.py --- @@ -0,0 +1,208 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14098#discussion_r70588517 --- Diff: examples/src/main/python/SparkSQLExample.py --- @@ -0,0 +1,208 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14098#discussion_r70588473 --- Diff: examples/src/main/python/SparkSQLExample.py --- @@ -0,0 +1,208 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark issue #14106: [SPARK-16448] RemoveAliasOnlyProject should not remove a...

2016-07-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14106 LGTM except for some minor comments.

[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70585442 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -165,36 +165,48 @@ object PushProjectThroughSample

[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70584787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -165,36 +165,48 @@ object PushProjectThroughSample

[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70584778 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -165,36 +165,48 @@ object PushProjectThroughSample

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 LGTM, I've merged this to master and branch-2.0. Thanks for working on this! I only observed one weird rendering caused by the blank lines before `{% include_example %}`, maybe my local

[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-12 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r70415596 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala --- @@ -260,7 +260,7 @@ private[parquet

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70264424 --- Diff: docs/sql-programming-guide.md --- @@ -1380,17 +949,17 @@ metadata. {% highlight scala %} -// spark is an existing

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 Can we add actual stdout output after each `.show()` call?

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 This looks pretty good! Only found a few minor issues. Thanks for working on it!

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263832 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala --- @@ -41,43 +35,47 @@ object HiveFromSpark

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263783 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263767 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263718 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263743 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263553 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263573 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263657 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java --- @@ -0,0 +1,192 @@ +package

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263693 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/hive/JavaSparkHiveExample.java --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263602 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70263511 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70262914 --- Diff: docs/sql-programming-guide.md --- @@ -679,43 +435,7 @@ a `DataFrame` can be created programmatically with three steps. by `SparkSession

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70262964 --- Diff: docs/sql-programming-guide.md --- @@ -732,62 +452,7 @@ a `Dataset` can be created programmatically with three steps. by `SparkSession

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70261011 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70257720 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SparkSqlExample.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70256850 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java --- @@ -0,0 +1,192 @@ +package

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 Since you've added `JavaSparkSqlExample.java`, we can remove `JavaSparkSQL.java` now. (I guess that file was from my original WIP branch?)

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70256340 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala --- @@ -41,43 +35,47 @@ object HiveFromSpark

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 add to whitelist

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119 test this please

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70245656 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70245539 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala --- @@ -41,43 +35,47 @@ object HiveFromSpark

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70245522 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala --- @@ -41,43 +35,47 @@ object HiveFromSpark

[GitHub] spark issue #14082: [SPARK-16381][SQL][SparkR] Update SQL examples and progr...

2016-07-11 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14082 Since both @shivaram and @felixcheung signed this off, I'm merging this to master and branch-2.0. Thanks @keypointt for working on this and @shivaram and @felixcheung for the review

[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14012#discussion_r70244000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1086,6 +1086,28 @@ object PruneFilters extends Rule

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70241185 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed

[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14014 retest this please

[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14014 @yhuai I found more test cases that may fail even with changes made in this PR. A proper fix is delivered in the last commit. Details about the new test case and the new fix can be found

[GitHub] spark issue #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` optimiz...

2016-07-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13765 Unfortunately I'm having network issues here and failed to fetch the branch from GitHub :( @cloud-fan Could you please help merge this one? Thanks.

[GitHub] spark issue #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` optimiz...

2016-07-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13765 Thanks for the examples. This makes sense and LGTM now. Merging into master.

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...

2016-07-08 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r70033576 --- Diff: python/pyspark/sql/dataframe.py --- @@ -451,10 +451,10 @@ def repartition(self, numPartitions, *cols): +---+-+ |age

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...

2016-07-08 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r70033421 --- Diff: python/pyspark/sql/dataframe.py --- @@ -451,10 +451,10 @@ def repartition(self, numPartitions, *cols): +---+-+ |age

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...

2016-07-08 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r70032947 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala --- @@ -370,8 +370,11 @@ package object dsl { case plan

[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...

2016-07-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14012 One more thing, please complete the PR title.

[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...

2016-07-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14012 LGTM except for some minor comments. Thanks for improving this!

[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...

2016-07-08 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14012#discussion_r70032565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1135,11 +1146,16 @@ object PushDownPredicate

[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...

2016-07-08 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14012#discussion_r70032005 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1106,21 +1106,32 @@ object PushDownPredicate

[GitHub] spark issue #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` optimiz...

2016-07-07 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13765 Under what circumstances will a user use 2 or more adjacent re-partitioning operators?

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...

2016-07-07 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r69930648 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -537,12 +537,19 @@ object CollapseProject extends

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...

2016-07-07 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r69930213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala --- @@ -370,8 +370,11 @@ package object dsl { case plan

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-07 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69928758 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-07 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69928094 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-07 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69927073 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad

[GitHub] spark issue #14076: [SPARK-16400][SQL] Remove InSet filter pushdown from Par...

2016-07-07 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14076 LGTM. Merging to master.

[GitHub] spark issue #14082: [SPARK-16381][SQL][SparkR] Update SQL examples and progr...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14082 @shivaram @mengxr It would be nice if either of you could help review this one, thanks!

[GitHub] spark issue #14028: [SPARK-16351][SQL] Avoid record-per type dispatch in JSO...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14028 LGTM, but I'd like to let @yhuai sign off. Thanks for working on this!

[GitHub] spark issue #14028: [SPARK-16351][SQL] Avoid record-per type dispatch in JSO...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14028 Nit: "record-per" in PR title should be "per-record".

[GitHub] spark issue #14070: [SPARK-15979][SQL] Renames CatalystWriteSupport to Parqu...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14070 cc @rxin

[GitHub] spark pull request #14070: [SPARK-15979][SQL] Renames CatalystWriteSupport t...

2016-07-06 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14070 [SPARK-15979][SQL] Renames CatalystWriteSupport to ParquetWriteSupport ## What changes were proposed in this pull request? PR #13696 renamed various Parquet support classes but left

[GitHub] spark issue #14067: [SPARK-16371][SQL] Do not push down filters incorrectly ...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14067 Yea, currently Spark SQL doesn't support column pruning and/or filter push-down for nested fields.

[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r69707873 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -482,13 +482,105 @@ private[parquet

[GitHub] spark pull request #14067: [SPARK-16371][SQL] Do not push down filters incor...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14067#discussion_r69707725 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -188,7 +188,7 @@ private[sql] object

[GitHub] spark issue #14067: [SPARK-16371][SQL] Do not push down filters incorrectly ...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14067 LGTM pending Jenkins. 2.0.0 RC2 has already been cut. We may have this in 2.0.0 if there is another RC.

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14038 cc @rxin

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706488 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -230,6 +229,15 @@ trait FileFormat

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706413 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -156,3 +162,10 @@ class ListingFileCatalog

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706379 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -30,6 +29,13 @@ import

[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706044 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -30,6 +29,13 @@ import

[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-06 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r69700045 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -482,13 +482,104 @@ private[parquet

[GitHub] spark issue #14061: [SPARK-16388][SQL] Remove spark.sql.nativeView and spark...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14061 LGTM, merging to master.

[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13926 @cloud-fan Probably not? `CreateHiveTableAsSelectCommand` uses `InsertIntoTable`, which is translated into `InsertIntoHiveTable`, which requires a `MetastoreRelation`.

[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13926 LGTM

[GitHub] spark issue #14064: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14064 LGTM pending Jenkins.

[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14014 retest this please

[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14014 The last build failure seems to be caused by flaky tests.

[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r69675124 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -482,13 +482,106 @@ private[parquet

[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13756#discussion_r69541353 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -206,7 +207,39 @@ private[sql] case class PreWriteCheck

[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13756#discussion_r69541383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -206,7 +207,39 @@ private[sql] case class PreWriteCheck
