spark git commit: [SPARK-14447][SQL] Speed up TungstenAggregate w/ keys using VectorizedHashMap

2016-04-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ff9ae61a3 -> b5c60bcdc [SPARK-14447][SQL] Speed up TungstenAggregate w/ keys using VectorizedHashMap ## What changes were proposed in this pull request? This patch speeds up group-by aggregates by around 3-5x by leveraging an in-memory
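The snippet above describes speeding up group-by aggregates with an in-memory hash map keyed on the grouping columns. As a language-neutral sketch of the underlying idea only (Spark's VectorizedHashMap is code-generated Java operating on column batches, not this), a hash-keyed aggregation buffer looks like:

```python
# Sketch of hash-based group-by aggregation: one mutable buffer per key.
# Illustration of the general technique only; not Spark's VectorizedHashMap.
def hash_aggregate(rows, key_fn, zero, update):
    buffers = {}
    for row in rows:
        k = key_fn(row)
        if k not in buffers:
            buffers[k] = zero()          # initialize the buffer on first sight
        buffers[k] = update(buffers[k], row)
    return buffers

# Example: SUM(v) GROUP BY k (hypothetical field names).
rows = [{"k": "a", "v": 1}, {"k": "b", "v": 2}, {"k": "a", "v": 3}]
result = hash_aggregate(rows, lambda r: r["k"], lambda: 0,
                        lambda acc, r: acc + r["v"])
print(result)  # {'a': 4, 'b': 2}
```

The speedup in the patch comes from keeping these buffers in a compact in-memory layout probed directly by generated code, avoiding per-row object allocation.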

spark git commit: [SPARK-14782][SPARK-14778][SQL] Remove HiveConf dependency from HiveSqlAstBuilder

2016-04-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 90933e2af -> 804581411 [SPARK-14782][SPARK-14778][SQL] Remove HiveConf dependency from HiveSqlAstBuilder ## What changes were proposed in this pull request? The patch removes HiveConf dependency from HiveSqlAstBuilder. This is required

spark git commit: [SPARK-14674][SQL] Move HiveContext.hiveconf to HiveSessionState

2016-04-18 Thread yhuai
ted? Existing tests. Closes #12431 Author: Andrew Or <and...@databricks.com> Author: Yin Huai <yh...@databricks.com> Closes #12449 from yhuai/hiveconf. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f1a11976 T

spark git commit: [SPARK-14125][SQL] Native DDL Support: Alter View

2016-04-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f83ba454a -> 0d22092cd [SPARK-14125][SQL] Native DDL Support: Alter View What changes were proposed in this pull request? This PR is to provide a native DDL support for the following three Alter View commands: Based on the Hive DDL

spark git commit: [SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when there is no partitioning scheme in the given paths

2016-05-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 22f9f5f97 -> dc1562e97 [SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when there is no partitioning scheme in the given paths ## What changes were proposed in this pull request? Let's say there are json files in

spark git commit: [SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues related to lead and lag functions

2016-07-25 Thread yhuai
uai <yh...@databricks.com> Closes #14284 from yhuai/lead-lag. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815f3eec Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815f3eec Diff: http://git-wip-us.apache.org/

spark git commit: [SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues related to lead and lag functions

2016-07-25 Thread yhuai
Yin Huai <yh...@databricks.com> Closes #14284 from yhuai/lead-lag. (cherry picked from commit 815f3eece5f095919a329af8cbd762b9ed71c7a8) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing

2016-07-14 Thread yhuai
How was this patch tested? Manually tested. **Note: This is a backport of https://github.com/apache/spark/pull/13987** Author: Yin Huai <yh...@databricks.com> Closes #14139 from yhuai/SPARK-16313-branch-1.6. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable

2016-07-21 Thread yhuai
ore stable. Author: Yin Huai <yh...@databricks.com> Closes #14289 from yhuai/SPARK-16656. (cherry picked from commit 9abd99b3c318d0ec8b91124d40f3ab9e9d835dcf) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wi

spark git commit: [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException

2016-07-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 a32531a72 -> 7d87fc964 [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException ## What changes were proposed in this pull request? We do not want SparkExceptions from job failures in the planning

spark git commit: [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException

2016-07-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2182e4322 -> bbc247548 [SPARK-16748][SQL] SparkExceptions during planning should not be wrapped in TreeNodeException ## What changes were proposed in this pull request? We do not want SparkExceptions from job failures in the planning phase

spark git commit: [SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogColumn

2016-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 064d91ff7 -> 301fb0d72 [SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogColumn ## What changes were proposed in this pull request? `StructField` has very similar semantic with `CatalogColumn`, except that

spark git commit: [SPARK-16805][SQL] Log timezone when query result does not match

2016-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 301fb0d72 -> 579fbcf3b [SPARK-16805][SQL] Log timezone when query result does not match ## What changes were proposed in this pull request? It is useful to log the timezone when query result does not match, especially on build machines

spark git commit: [SPARK-16805][SQL] Log timezone when query result does not match

2016-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 d357ca302 -> c651ff53a [SPARK-16805][SQL] Log timezone when query result does not match ## What changes were proposed in this pull request? It is useful to log the timezone when query result does not match, especially on build

spark git commit: [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type

2016-08-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 639df046a -> b55f34370 [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type ## What changes were proposed in this pull request? Here is a table about the behaviours of
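The commit above makes type coercion for `array`, `map`, `greatest`, and `least` handle decimal inputs. A hedged plain-Python illustration (using the stdlib `decimal` module, not Spark's TypeCoercion rules) of why a widened exact decimal type matters for such comparisons:

```python
from decimal import Decimal

# Why coercing to a common decimal type matters: comparing a decimal with an
# integer must not lose precision by routing through binary floating point.
# Values are illustrative only; this is plain Python, not Spark.
values = [Decimal("1.10"), 1, Decimal("0.99")]

# Python compares int and Decimal exactly, so a greatest-style max() is safe:
print(max(values))  # Decimal('1.10')

# Routing through float instead changes the value being compared:
print(float(Decimal("0.1")) == Decimal("0.1"))  # False: 0.1 is inexact in binary
```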

spark git commit: [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type

2016-08-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 969313bb2 -> 2daab33c4 [SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type ## What changes were proposed in this pull request? Here is a table about the behaviours of

spark git commit: [SPARK-17003][BUILD][BRANCH-1.6] release-build.sh is missing hive-thriftserver for scala 2.11

2016-08-12 Thread yhuai
tps://issues.apache.org/jira/browse/SPARK-8013). So, let's publish scala 2.11 artifacts with the flag of `-Phive-thriftserver`. I am also fixing the doc. Author: Yin Huai <yh...@databricks.com> Closes #14586 from yhuai/SPARK-16453-branch-1.6. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-16482][SQL] Describe Table Command for Tables Requiring Runtime Inferred Schema

2016-07-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fb2e8eeb0 -> c5ec87982 [SPARK-16482][SQL] Describe Table Command for Tables Requiring Runtime Inferred Schema What changes were proposed in this pull request? If we create a table pointing to parquet/json datasets without

spark git commit: [SPARK-16482][SQL] Describe Table Command for Tables Requiring Runtime Inferred Schema

2016-07-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 9e3a59858 -> 550d0e7dc [SPARK-16482][SQL] Describe Table Command for Tables Requiring Runtime Inferred Schema What changes were proposed in this pull request? If we create a table pointing to parquet/json datasets without

spark git commit: [SPARK-16515][SQL] set default record reader and writer for script transformation

2016-07-18 Thread yhuai
ark, but would fail now. ## How was this patch tested? added a test case in SQLQuerySuite. Closes #14169 Author: Daoyuan Wang <daoyuan.w...@intel.com> Author: Yin Huai <yh...@databricks.com> Closes #14249 from yhuai/scriptTransformation. Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-16515][SQL] set default record reader and writer for script transformation

2016-07-18 Thread yhuai
ark, but would fail now. ## How was this patch tested? added a test case in SQLQuerySuite. Closes #14169 Author: Daoyuan Wang <daoyuan.w...@intel.com> Author: Yin Huai <yh...@databricks.com> Closes #14249 from yhuai/scriptTransformation. (cherry pic

spark git commit: [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions

2016-07-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 81004f13f -> a804c9260 [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions Aggregate expressions can only be executed inside `Aggregate`; if we propagate them up with constraints, the parent

spark git commit: [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions

2016-07-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 75a06aa25 -> cfa5ae84e [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions ## What changes were proposed in this pull request? Aggregate expressions can only be executed inside `Aggregate`; if

spark git commit: [SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element"

2016-07-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e3cd5b305 -> e651900bd [SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element" ## What changes were proposed in this pull request? Due to backward-compatibility reasons, the following Parquet schema is

spark git commit: [SPARK-16351][SQL] Avoid per-record type dispatch in JSON when writing

2016-07-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8ea3f4eae -> 2877f1a52 [SPARK-16351][SQL] Avoid per-record type dispatch in JSON when writing ## What changes were proposed in this pull request? Currently, `JacksonGenerator.apply` is doing type-based dispatch for each row to write
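The pattern the commit applies — resolving each column's writer once from the schema instead of dispatching on value types for every row — can be sketched in plain Python (field names and the tiny "schema" below are made up for illustration; Spark's `JacksonGenerator` works on `InternalRow`s, not dicts):

```python
import json

# Build one converter per column from the schema, up front.
def make_converter(field_type):
    # Resolved once per column, not once per value.
    if field_type == "int":
        return lambda v: int(v)
    if field_type == "string":
        return lambda v: str(v)
    raise ValueError(f"unsupported type: {field_type}")

schema = [("id", "int"), ("name", "string")]
converters = [(name, make_converter(t)) for name, t in schema]

def write_row(row):
    # Per row: just apply the prebuilt converters, no type dispatch.
    return json.dumps({name: conv(row[name]) for name, conv in converters})

print(write_row({"id": "7", "name": "a"}))  # {"id": 7, "name": "a"}
```

Hoisting the dispatch out of the per-row loop is what removes the per-record overhead the commit targets.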

spark git commit: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

2016-07-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e5fbb182c -> 1426a0805 [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update ## What changes were proposed in this pull request? This PR moves one and the last hard-coded Scala example snippet from the SQL programming guide into

spark git commit: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

2016-07-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 24ea87519 -> ef2a6f131 [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update ## What changes were proposed in this pull request? This PR moves one and the last hard-coded Scala example snippet from the SQL programming guide

spark git commit: [SPARK-16349][SQL] Fall back to isolated class loader when classes not found.

2016-07-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7f38b9d5f -> b4fbe140b [SPARK-16349][SQL] Fall back to isolated class loader when classes not found. Some Hadoop classes needed by the Hive metastore client jars are not present in Spark's packaging (for example,

spark git commit: [SPARK-12639][SQL] Mark Filters Fully Handled By Sources with *

2016-07-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9cc74f95e -> b1e5281c5 [SPARK-12639][SQL] Mark Filters Fully Handled By Sources with * ## What changes were proposed in this pull request? In order to make it clear which filters are fully handled by the underlying datasource we will mark

spark git commit: [SPARK-16181][SQL] outer join with isNull filter may return wrong result

2016-06-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0923c4f56 -> 1f2776df6 [SPARK-16181][SQL] outer join with isNull filter may return wrong result ## What changes were proposed in this pull request? The root cause is: the output attributes of outer join are derived from its children,
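The nullability issue behind this fix can be shown with toy relations: a left outer join manufactures NULLs on the non-matching side even when the child's column is itself never null, so a downstream `isNull` filter on that column must not be optimized away. A hedged plain-Python sketch (not Spark's optimizer):

```python
# Toy relations; 'v' is never null in the right child itself.
left = [{"id": 1}, {"id": 2}]
right = [{"id": 1, "v": "x"}]

# Left outer join: unmatched left rows get None for the right side's columns.
right_by_id = {r["id"]: r for r in right}
joined = [{"id": l["id"],
           "v": right_by_id[l["id"]]["v"] if l["id"] in right_by_id else None}
          for l in left]

# Filter "v IS NULL" — would return nothing if 'v' were wrongly treated as
# non-nullable, but the unmatched row id=2 must survive:
unmatched = [row for row in joined if row["v"] is None]
print(unmatched)  # [{'id': 2, 'v': None}]
```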

spark git commit: [SPARK-16181][SQL] outer join with isNull filter may return wrong result

2016-06-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 4c5e16f58 -> e68872f2e [SPARK-16181][SQL] outer join with isNull filter may return wrong result ## What changes were proposed in this pull request? The root cause is: the output attributes of outer join are derived from its children,

spark git commit: [SPARK-16453][BUILD] release-build.sh is missing hive-thriftserver for scala 2.10

2016-07-08 Thread yhuai
ted by release-build.sh. Author: Yin Huai <yh...@databricks.com> Closes #14108 from yhuai/SPARK-16453. (cherry picked from commit 60ba436b7010436c77dfe5219a9662accc25bffa) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-16453][BUILD] release-build.sh is missing hive-thriftserver for scala 2.10

2016-07-08 Thread yhuai
ase-build.sh. Author: Yin Huai <yh...@databricks.com> Closes #14108 from yhuai/SPARK-16453. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60ba436b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60ba436b Diff: h

spark git commit: [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values

2016-08-05 Thread yhuai
ich is jdbc:derby:;databaseName=metastore_db;create=true. This issue only shows up when `spark.sql.hive.metastore.jars` is not set to builtin. ## How was this patch tested? New test in HiveSparkSubmitSuite. Author: Yin Huai <yh...@databricks.com> Closes #14497 from yhuai/SPARK-16901. (cher

spark git commit: [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values

2016-08-05 Thread yhuai
ich is jdbc:derby:;databaseName=metastore_db;create=true. This issue only shows up when `spark.sql.hive.metastore.jars` is not set to builtin. ## How was this patch tested? New test in HiveSparkSubmitSuite. Author: Yin Huai <yh...@databricks.com> Closes #14497 from yhuai/SPARK-16901. Proj

spark git commit: [SPARK-16749][SQL] Simplify processing logic in LEAD/LAG processing.

2016-08-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 53d1c7877 -> df1065883 [SPARK-16749][SQL] Simplify processing logic in LEAD/LAG processing. ## What changes were proposed in this pull request? The logic for LEAD/LAG processing is more complex than it needs to be. This PR fixes that. ##

spark git commit: [SPARK-16828][SQL] remove MaxOf and MinOf

2016-08-01 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 03d46aafe -> 2eedc00b0 [SPARK-16828][SQL] remove MaxOf and MinOf ## What changes were proposed in this pull request? These 2 expressions are not needed anymore after we have `Greatest` and `Least`. This PR removes them and related tests.

spark git commit: [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0

2016-06-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d0eddb80e -> 6df8e3886 [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0 ## What changes were proposed in this pull request? Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration

spark git commit: [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0

2016-06-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 54aef1c14 -> 8159da20e [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0 ## What changes were proposed in this pull request? Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0

spark git commit: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up code clean up and improvement

2016-06-19 Thread yhuai
ull/13754/files and https://github.com/apache/spark/pull/13749. I will comment inline to explain my changes. ## How was this patch tested? Existing tests. Author: Yin Huai <yh...@databricks.com> Closes #13766 from yhuai/caseSensitivity. (cherry picked fr

spark git commit: [SPARK-16036][SPARK-16037][SPARK-16034][SQL] Follow up code clean up and improvement

2016-06-19 Thread yhuai
754/files and https://github.com/apache/spark/pull/13749. I will comment inline to explain my changes. ## How was this patch tested? Existing tests. Author: Yin Huai <yh...@databricks.com> Closes #13766 from yhuai/caseSensitivity. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-16656][SQL][BRANCH-1.6] Try to make CreateTableAsSelectSuite more stable

2016-08-16 Thread yhuai
ems it is a flaky test. This PR tries to make this test more stable. Author: Yin Huai <yh...@databricks.com> Closes #14668 from yhuai/SPARK-16656-branch1.6. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5c34029b Tree: http:

spark git commit: [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check

2016-08-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0b0c8b95e -> 928ca1c6d [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check ## What changes were proposed in this pull request? We use reflection to convert `TreeNode` to json string, and currently don't support arbitrary

spark git commit: [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check

2016-08-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 22c7660a8 -> 394d59866 [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check ## What changes were proposed in this pull request? We use reflection to convert `TreeNode` to json string, and currently don't support

spark git commit: [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check

2016-08-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 5c34029b8 -> 60de30faf [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check We use reflection to convert `TreeNode` to json string, and currently don't support arbitrary object. `UserDefinedGenerator` takes a function

spark git commit: [SPARK-19604][TESTS] Log the start of every Python test

2017-02-15 Thread yhuai
lso log the start of a test. So, if a test is hanging, we can tell which test file is running. ## How was this patch tested? This is a change for python tests. Author: Yin Huai <yh...@databricks.com> Closes #16935 from yhuai/SPARK-19604. (cherry picked fr

spark git commit: Update known_translations for contributor names

2017-01-18 Thread yhuai
uai <yh...@databricks.com> Closes #16628 from yhuai/known_translations. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c923185 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c923185 Diff: http://git-wip-us.a

spark git commit: [SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread yhuai
ion of those downloaded jars when `spark.sql.hive.metastore.jars` is set to `maven`. ## How was this patch tested? jenkins Author: Yin Huai <yh...@databricks.com> Closes #16649 from yhuai/SPARK-19295. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

[2/2] spark git commit: [SPARK-16498][SQL] move hive hack for data source table into HiveExternalCatalog

2016-08-21 Thread yhuai
[SPARK-16498][SQL] move hive hack for data source table into HiveExternalCatalog ## What changes were proposed in this pull request? Spark SQL doesn't have its own meta store yet, and uses hive's currently. However, hive's meta store has some limitations (e.g. columns can't be too many, not

[1/2] spark git commit: [SPARK-16498][SQL] move hive hack for data source table into HiveExternalCatalog

2016-08-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 91c239768 -> b2074b664 http://git-wip-us.apache.org/repos/asf/spark/blob/b2074b66/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala -- diff

spark git commit: [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister

2016-08-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 8f4cacd3a -> 45036327f [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister ## What changes were proposed in this pull request? Add an instruction to ask the user to remove or upgrade the

spark git commit: [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister

2016-08-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fffb0c0d1 -> 268b71d0d [SPARK-17065][SQL] Improve the error message when encountering an incompatible DataSourceRegister ## What changes were proposed in this pull request? Add an instruction to ask the user to remove or upgrade the

spark git commit: [MINOR][SQL] Fix some typos in comments and test hints

2016-08-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6f3cd36f9 -> 929cb8bee [MINOR][SQL] Fix some typos in comments and test hints ## What changes were proposed in this pull request? Fix some typos in comments and test hints ## How was this patch tested? N/A. Author: Sean Zhong

spark git commit: Revert "[SPARK-17369][SQL] MetastoreRelation toJSON throws AssertException due to missing otherCopyArgs"

2016-09-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 dd27530c7 -> f56b70fec Revert "[SPARK-17369][SQL] MetastoreRelation toJSON throws AssertException due to missing otherCopyArgs" This reverts commit 7b1aa2153bc6c8b753dba0710fe7b5d031158a34. Project:

spark git commit: [SPARK-17531][BACKPORT] Don't initialize Hive Listeners for the Execution Client

2016-09-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 047bc3f13 -> bf3f6d2f1 [SPARK-17531][BACKPORT] Don't initialize Hive Listeners for the Execution Client ## What changes were proposed in this pull request? If a user provides listeners inside the Hive Conf, the configuration for these

spark git commit: [SPARK-17531] Don't initialize Hive Listeners for the Execution Client

2016-09-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 b17f10ced -> c1426452b [SPARK-17531] Don't initialize Hive Listeners for the Execution Client ## What changes were proposed in this pull request? If a user provides listeners inside the Hive Conf, the configuration for these

spark git commit: [SPARK-17531] Don't initialize Hive Listeners for the Execution Client

2016-09-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4ba63b193 -> 72edc7e95 [SPARK-17531] Don't initialize Hive Listeners for the Execution Client ## What changes were proposed in this pull request? If a user provides listeners inside the Hive Conf, the configuration for these listeners

spark git commit: [SPARK-17652] Fix confusing exception message while reserving capacity

2016-09-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 cf5324127 -> 8a58f2e8e [SPARK-17652] Fix confusing exception message while reserving capacity ## What changes were proposed in this pull request? This minor patch fixes a confusing exception message while reserving additional

spark git commit: [SPARK-17652] Fix confusing exception message while reserving capacity

2016-09-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8135e0e5e -> 7c7586aef [SPARK-17652] Fix confusing exception message while reserving capacity ## What changes were proposed in this pull request? This minor patch fixes a confusing exception message while reserving additional capacity in

spark git commit: [SPARK-17699] Support for parsing JSON string columns

2016-09-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 027dea8f2 -> fe33121a5 [SPARK-17699] Support for parsing JSON string columns Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst others. This is
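The situation the commit addresses — JSON text sitting in one column alongside other columns, with fields to be projected out of it — can be illustrated in plain Python with the stdlib `json` module (column names below are illustrative; in Spark this work surfaced as the `from_json` expression):

```python
import json

# Rows where 'payload' is a JSON-encoded string column among other columns.
rows = [
    {"id": 1, "payload": '{"device": "a", "temp": 20}'},
    {"id": 2, "payload": '{"device": "b", "temp": 25}'},
]

# Parse the JSON column in place, then project a nested field out of it.
parsed = [{**row, "payload": json.loads(row["payload"])} for row in rows]
temps = [r["payload"]["temp"] for r in parsed]
print(temps)  # [20, 25]
```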

spark git commit: [SPARK-17758][SQL] Last returns wrong result in case of empty partition

2016-10-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 b8df2e53c -> 3b6463a79 [SPARK-17758][SQL] Last returns wrong result in case of empty partition ## What changes were proposed in this pull request? The result of the `Last` function can be wrong when the last partition processed is

spark git commit: [SPARK-17758][SQL] Last returns wrong result in case of empty partition

2016-10-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 221b418b1 -> 5fd54b994 [SPARK-17758][SQL] Last returns wrong result in case of empty partition ## What changes were proposed in this pull request? The result of the `Last` function can be wrong when the last partition processed is empty.
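The bug class here — a "last value" computed across partitions being clobbered when the final partition is empty — can be sketched in plain Python (the partition layout is made up; Spark's `Last` is a declarative aggregate, not this loop):

```python
# Computing "last value" across partitions must not let an empty final
# partition clobber the result.
partitions = [[1, 2, 3], [4, 5], []]  # last partition is empty

def last_buggy(parts):
    result = None
    for part in parts:
        result = part[-1] if part else None  # overwrites with None on empty
    return result

def last_correct(parts):
    result = None
    for part in parts:
        if part:                 # only update when the partition has rows
            result = part[-1]
    return result

print(last_buggy(partitions), last_correct(partitions))  # None 5
```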

spark git commit: [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable

2016-09-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d5ec5dbb0 -> eb004c662 [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable ## What changes were proposed in this pull request? Hive confs in hive-site.xml will be loaded in `hadoopConf`, so we should use `hadoopConf` in

spark git commit: [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.

2016-09-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b9323fc93 -> 39e2bad6a [SPARK-17549][SQL] Only collect table size stat in driver for cached relation. The existing code caches all stats for all columns for each partition in the driver; for a large relation, this causes extreme memory

spark git commit: [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.

2016-09-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 5ad4395e1 -> 3fce1255a [SPARK-17549][SQL] Only collect table size stat in driver for cached relation. The existing code caches all stats for all columns for each partition in the driver; for a large relation, this causes extreme memory

spark git commit: [SPARK-17589][TEST][2.0] Fix test case `create external table` in MetastoreDataSourcesSuite

2016-09-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 ac060397c -> c4660d607 [SPARK-17589][TEST][2.0] Fix test case `create external table` in MetastoreDataSourcesSuite ### What changes were proposed in this pull request? This PR is to fix a test failure on the branch 2.0 builds:

spark git commit: [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable

2016-09-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 643f161d5 -> e76f4f47f [SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable ## What changes were proposed in this pull request? Hive confs in hive-site.xml will be loaded in `hadoopConf`, so we should use `hadoopConf`

spark git commit: [SPARK-17549][SQL] Revert "[] Only collect table size stat in driver for cached relation."

2016-09-20 Thread yhuai
ned at https://issues.apache.org/jira/browse/SPARK-17549?focusedCommentId=15505060=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15505060 Author: Yin Huai <yh...@databricks.com> Closes #15157 from yhuai/revert-SPARK-17549. (cherry picked from commit 9ac68dbc5720026ea92acc61d295c

spark git commit: [SPARK-17549][SQL] Revert "[] Only collect table size stat in driver for cached relation."

2016-09-20 Thread yhuai
es.apache.org/jira/browse/SPARK-17549?focusedCommentId=15505060=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15505060 Author: Yin Huai <yh...@databricks.com> Closes #15157 from yhuai/revert-SPARK-17549. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-17270][SQL] Move object optimization rules into its own file

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a6bca3ad0 -> cc0caa690 [SPARK-17270][SQL] Move object optimization rules into its own file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various Dataset object optimization

spark git commit: [SPARK-17266][TEST] Add empty strings to the regressionTests of PrefixComparatorsSuite

2016-08-26 Thread yhuai
y. But, let's add this test case in the regressionTests. Author: Yin Huai <yh...@databricks.com> Closes #14837 from yhuai/SPARK-17266. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6bca3ad Tree: http://git-wip-us.apache.org/r

spark git commit: [SPARK-17187][SQL][FOLLOW-UP] improve document of TypedImperativeAggregate

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 28ab17922 -> 970ab8f6d [SPARK-17187][SQL][FOLLOW-UP] improve document of TypedImperativeAggregate ## What changes were proposed in this pull request? improve the document to make it easier to understand and also mention window operator.

spark git commit: [SPARK-17192][SQL] Issue Exception when Users Specify the Partitioning Columns without a Given Schema

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 188321623 -> fd4ba3f62 [SPARK-17192][SQL] Issue Exception when Users Specify the Partitioning Columns without a Given Schema ### What changes were proposed in this pull request? Address the comments by yhuai in the original PR: ht

spark git commit: [SPARK-17250][SQL] Remove HiveClient and setCurrentDatabase from HiveSessionCatalog

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fd4ba3f62 -> 261c55dd8 [SPARK-17250][SQL] Remove HiveClient and setCurrentDatabase from HiveSessionCatalog ### What changes were proposed in this pull request? This is the first step to remove `HiveClient` from `HiveSessionState`. In the

spark git commit: [SPARK-17260][MINOR] move CreateTables to HiveStrategies

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6063d5963 -> 28ab17922 [SPARK-17260][MINOR] move CreateTables to HiveStrategies ## What changes were proposed in this pull request? `CreateTables` rule turns a general `CreateTable` plan into `CreateHiveTableAsSelectCommand` for hive serde

spark git commit: [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f64a1ddd0 -> 540e91280 [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions ## What changes were proposed in this pull request? Given that non-deterministic expressions can be stateful, pushing them down the query

spark git commit: [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions

2016-08-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 dfdfc3092 -> 9c0ac6b53 [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions ## What changes were proposed in this pull request? Given that non-deterministic expressions can be stateful, pushing them down the

spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

2016-08-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 79195982a -> 94eff0875 [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6 ## What changes were proposed in this pull request? Collect GC discussion in one section, and documenting findings about G1 GC heap region

spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

2016-08-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 209e1b3c0 -> 342278c09 [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6 ## What changes were proposed in this pull request? Collect GC discussion in one section, and document findings about G1 GC heap region
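The kind of tuning the SPARK-16320 doc change discusses can be sketched as a submit-time configuration. The flag names below are real JVM and Spark options; the 32m value and the application jar are illustrative, not a recommendation from the commit itself:

```shell
# Hypothetical spark-submit invocation: enable G1 and enlarge the region size
# so large task buffers are less likely to be treated as "humongous" objects.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:G1HeapRegionSize=32m" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:G1HeapRegionSize=32m" \
  your-app.jar
```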

spark git commit: [SPARK-17187][SQL] Supports using arbitrary Java object as internal aggregation buffer object

2016-08-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9b5a1d1d5 -> d96d15156 [SPARK-17187][SQL] Supports using arbitrary Java object as internal aggregation buffer object ## What changes were proposed in this pull request? This PR introduces an abstract class `TypedImperativeAggregate` so

spark git commit: [SPARK-18132] Fix checkstyle

2016-10-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master dd4f088c1 -> d3b4831d0 [SPARK-18132] Fix checkstyle This PR fixes checkstyle. Author: Yin Huai <yh...@databricks.com> Closes #15656 from yhuai/fix-format. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-18132] Fix checkstyle

2016-10-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 dcf2f090c -> 1a4be51d6 [SPARK-18132] Fix checkstyle This PR fixes checkstyle. Author: Yin Huai <yh...@databricks.com> Closes #15656 from yhuai/fix-format. (cherry picked from commit d3b4831d009905185ad74096ce3ecfa934bc191

[1/2] spark git commit: [SPARK-17970][SQL] store partition spec in metastore for data source table

2016-10-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 79fd0cc05 -> ccb115430 http://git-wip-us.apache.org/repos/asf/spark/blob/ccb11543/sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala

[2/2] spark git commit: [SPARK-17970][SQL] store partition spec in metastore for data source table

2016-10-27 Thread yhuai
[SPARK-17970][SQL] store partition spec in metastore for data source table ## What changes were proposed in this pull request? We should follow hive table and also store partition spec in metastore for data source table. This brings 2 benefits: 1. It's more flexible to manage the table data

spark git commit: [SPARK-18368][SQL] Fix regexp replace when serialized

2016-11-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 47636618a -> d4028de97 [SPARK-18368][SQL] Fix regexp replace when serialized ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then
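The "transient and lazy" pattern named in the SPARK-18368 snippet can be shown with a small Python analogue (the actual fix is a Scala `@transient lazy val`; the class below is illustrative, not Spark's implementation). The compiled regex is dropped when the object is serialized and rebuilt on first use after deserialization, so no stale derived state ships with a task:

```python
# Python analogue of a transient lazy field: drop the cached value on
# serialization, recompute it lazily on first access after deserialization.
import pickle
import re

class RegExpReplace:
    def __init__(self, pattern, replacement):
        self.pattern = pattern
        self.replacement = replacement
        self._compiled = None          # lazily built, never serialized

    @property
    def compiled(self):
        if self._compiled is None:     # lazy: compile on first access
            self._compiled = re.compile(self.pattern)
        return self._compiled

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_compiled"] = None      # transient: drop before pickling
        return state

    def apply(self, s):
        return self.compiled.sub(self.replacement, s)

expr = RegExpReplace(r"\d+", "#")
print(expr.apply("a1b22"))             # a#b#
clone = pickle.loads(pickle.dumps(expr))
print(clone.apply("a1b22"))            # a#b# -- still correct after round-trip
```

The round-trip through `pickle` stands in for Spark shipping the expression to an executor inside a serialized task.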

spark git commit: [SPARK-18368][SQL] Fix regexp replace when serialized

2016-11-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 c8628e877 -> 6e7310590 [SPARK-18368][SQL] Fix regexp replace when serialized ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized

spark git commit: [SPARK-18338][SQL][TEST-MAVEN] Fix test case initialization order under Maven builds

2016-11-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 02c5325b8 -> 205e6d586 [SPARK-18338][SQL][TEST-MAVEN] Fix test case initialization order under Maven builds ## What changes were proposed in this pull request? Test case initialization order under Maven and SBT are different. Maven

spark git commit: [SPARK-18368][SQL] Fix regexp replace when serialized

2016-11-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.1 626f6d6d4 -> 80f58510a [SPARK-18368][SQL] Fix regexp replace when serialized ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized

spark git commit: Revert "[SPARK-18368] Fix regexp_replace with task serialization."

2016-11-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 06a13ecca -> 47636618a Revert "[SPARK-18368] Fix regexp_replace with task serialization." This reverts commit b9192bb3ffc319ebee7dbd15c24656795e454749. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-18167] Re-enable the non-flaky parts of SQLQuerySuite

2016-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 550cd56e8 -> 4cee2ce25 [SPARK-18167] Re-enable the non-flaky parts of SQLQuerySuite ## What changes were proposed in this pull request? It seems the proximate cause of the test failures is that `cast(str as decimal)` in derby will raise

spark git commit: [SPARK-18167] Re-enable the non-flaky parts of SQLQuerySuite

2016-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.1 e51978c3d -> 0a303a694 [SPARK-18167] Re-enable the non-flaky parts of SQLQuerySuite ## What changes were proposed in this pull request? It seems the proximate cause of the test failures is that `cast(str as decimal)` in derby will

spark git commit: [SPARK-18256] Improve the performance of event log replay in HistoryServer

2016-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4cee2ce25 -> 0e3312ee7 [SPARK-18256] Improve the performance of event log replay in HistoryServer ## What changes were proposed in this pull request? This patch significantly improves the performance of event log replay in the

spark git commit: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow users to specify database in destination table name (but it has to be the same as the source table)

2016-10-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2629cd746 -> 4329c5cea [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow users to specify database in destination table name (but it has to be the same as the source table) ## What changes were proposed in this pull request? Unlike Hive, in

spark git commit: [SPARK-17863][SQL] should not add column into Distinct

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 522dd0d0e -> da9aeb0fd [SPARK-17863][SQL] should not add column into Distinct ## What changes were proposed in this pull request? We are trying to resolve the attribute in sort by pulling up some column from grandchild into child, but

spark git commit: [SPARK-17863][SQL] should not add column into Distinct

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 d7fa3e324 -> c53b83749 [SPARK-17863][SQL] should not add column into Distinct ## What changes were proposed in this pull request? We are trying to resolve the attribute in sort by pulling up some column from grandchild into child, but

spark git commit: Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables"

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7ab86244e -> 522dd0d0e Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables" This reverts commit 7ab86244e30ca81eb4fa40ea77b4c2b8881cbab2. Project:

spark git commit: [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types

2016-10-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master c5fe3dd4f -> a21791e31 [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types ## What changes were proposed in this pull request? Binary operator requires its inputs to be of same type, but it

spark git commit: [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types

2016-10-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 1c1e847bc -> 7c8d9a557 [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types ## What changes were proposed in this pull request? Binary operator requires its inputs to be of same type, but it

spark git commit: [SPARK-17926][SQL][STREAMING] Added json for statuses

2016-10-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e371040a0 -> 7a531e305 [SPARK-17926][SQL][STREAMING] Added json for statuses ## What changes were proposed in this pull request? StreamingQueryStatus exposed through StreamingQueryListener often needs to be recorded (similar to

spark git commit: [SPARK-17926][SQL][STREAMING] Added json for statuses

2016-10-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 78458a7eb -> af2e6e0c9 [SPARK-17926][SQL][STREAMING] Added json for statuses ## What changes were proposed in this pull request? StreamingQueryStatus exposed through StreamingQueryListener often needs to be recorded (similar to
