spark git commit: [SPARK-22120][SQL] TestHiveSparkSession.reset() should clean out Hive warehouse directory

2017-09-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 9836ea19f -> b0f30b56a [SPARK-22120][SQL] TestHiveSparkSession.reset() should clean out Hive warehouse directory ## What changes were proposed in this pull request? During TestHiveSparkSession.reset(), which is called after each

spark git commit: [SPARK-22120][SQL] TestHiveSparkSession.reset() should clean out Hive warehouse directory

2017-09-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 038b18573 -> ce204780e [SPARK-22120][SQL] TestHiveSparkSession.reset() should clean out Hive warehouse directory ## What changes were proposed in this pull request? During TestHiveSparkSession.reset(), which is called after each

spark git commit: [SPARK-22103] Move HashAggregateExec parent consume to a separate function in codegen

2017-09-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2c5b9b117 -> 038b18573 [SPARK-22103] Move HashAggregateExec parent consume to a separate function in codegen ## What changes were proposed in this pull request? HashAggregateExec codegen uses two paths for fast hash table and a generic

spark git commit: [SPARK-22100][SQL] Make percentile_approx support date/timestamp type and change the output type to be the same as input type

2017-09-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 20adf9aa1 -> 365a29bdb [SPARK-22100][SQL] Make percentile_approx support date/timestamp type and change the output type to be the same as input type ## What changes were proposed in this pull request? The `percentile_approx` function

spark git commit: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTruncateTable() method in AggregatedDialect

2017-09-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4a8c9e29b -> 2274d84ef [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTruncateTable() method in AggregatedDialect ## What changes were proposed in this pull request? The implemented `isCascadingTruncateTable` in `AggregatedDialect`

spark git commit: [SPARK-22110][SQL][DOCUMENTATION] Add usage and improve documentation with arguments and examples for trim function

2017-09-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c792aff03 -> 4a8c9e29b [SPARK-22110][SQL][DOCUMENTATION] Add usage and improve documentation with arguments and examples for trim function ## What changes were proposed in this pull request? This PR proposes to enhance the documentation

spark git commit: [SPARK-21998][SQL] SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning

2017-09-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5ac96854c -> 5960686e7 [SPARK-21998][SQL] SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning ## What changes were proposed in this pull request? Right now the calculation of SortMergeJoinExec's

spark git commit: [SPARK-22088][SQL] Incorrect scalastyle comment causes wrong styles in stringExpressions

2017-09-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f7ad0dbd5 -> 9cac249fd [SPARK-22088][SQL] Incorrect scalastyle comment causes wrong styles in stringExpressions ## What changes were proposed in this pull request? There is an incorrect `scalastyle:on` comment in

spark git commit: [SPARK-22076][SQL][FOLLOWUP] Expand.projections should not be a Stream

2017-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 55d5fa79d -> 352bea545 [SPARK-22076][SQL][FOLLOWUP] Expand.projections should not be a Stream ## What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/19289 , we missed another place:

spark git commit: [SPARK-22076][SQL] Expand.projections should not be a Stream

2017-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 6764408f6 -> 5d10586a0 [SPARK-22076][SQL] Expand.projections should not be a Stream ## What changes were proposed in this pull request? Spark with Scala 2.10 fails with a group by cube: ``` spark.range(1).select($"id" as "a", $"id" as

spark git commit: [SPARK-22076][SQL] Expand.projections should not be a Stream

2017-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e17901d6d -> ce6a71e01 [SPARK-22076][SQL] Expand.projections should not be a Stream ## What changes were proposed in this pull request? Spark with Scala 2.10 fails with a group by cube: ``` spark.range(1).select($"id" as "a", $"id" as

spark git commit: [SPARK-19318][SPARK-22041][SPARK-16625][BACKPORT-2.1][SQL] Docker test case failure: `: General data types to be mapped to Oracle`

2017-09-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 30ce056d8 -> 56865a1e9 [SPARK-19318][SPARK-22041][SPARK-16625][BACKPORT-2.1][SQL] Docker test case failure: `: General data types to be mapped to Oracle` ## What changes were proposed in this pull request? This PR is backport of

spark git commit: [SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable

2017-09-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d5aefa83a -> ee13f3e3d [SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable ## What changes were proposed in this pull request? Tables in the catalog cache are not invalidated once their statistics are updated. As a

spark git commit: [SPARK-21338][SQL] implement isCascadingTruncateTable() method in AggregatedDialect

2017-09-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2f962422a -> d5aefa83a [SPARK-21338][SQL] implement isCascadingTruncateTable() method in AggregatedDialect ## What changes were proposed in this pull request? org.apache.spark.sql.jdbc.JdbcDialect's method: def

spark git commit: [SPARK-14878][SQL] Trim characters string function support

2017-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3b049abf1 -> c66d64b3d [SPARK-14878][SQL] Trim characters string function support What changes were proposed in this pull request? This PR enhances the TRIM function support in Spark SQL by allowing the specification of trim

spark git commit: [SPARK-22003][SQL] support array column in vectorized reader with UDF

2017-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 894a7561d -> 3b049abf1 [SPARK-22003][SQL] support array column in vectorized reader with UDF ## What changes were proposed in this pull request? The UDF needs to deserialize the `UnsafeRow`. When the column type is Array, the `get`

spark git commit: [SPARK-21987][SQL] fix a compatibility issue of sql event logs

2017-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4decedfdb -> 3c6198c86 [SPARK-21987][SQL] fix a compatibility issue of sql event logs ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/18600 we removed the `metadata` field from `SparkPlanInfo`.

spark git commit: [SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields.

2017-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 22b111ef9 -> 4decedfdb [SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields. ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/18266 added a new feature to support read

spark git commit: [MINOR][SQL] Only populate type metadata for required types such as CHAR/VARCHAR.

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8be7e6bb3 -> dcbb22943 [MINOR][SQL] Only populate type metadata for required types such as CHAR/VARCHAR. ## What changes were proposed in this pull request? When reading column descriptions from hive catalog, we currently populate the

spark git commit: [SPARK-21973][SQL] Add an new option to filter queries in TPC-DS

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 17edfec59 -> 8be7e6bb3 [SPARK-21973][SQL] Add an new option to filter queries in TPC-DS ## What changes were proposed in this pull request? This PR added a new option to filter TPC-DS queries to run in `TPCDSQueryBenchmark`. By default,

spark git commit: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8c7e19a37 -> 17edfec59 [SPARK-20427][SQL] Read JDBC table use custom schema ## What changes were proposed in this pull request? The auto-generated Oracle schema is sometimes not what we expect: - `number(1)` is auto-mapped to BooleanType, sometimes

spark git commit: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 21c4450fb -> 8c7e19a37 [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala ## What changes were proposed in this pull request? The code is already merged to master: https://github.com/apache/spark/pull/18975 This is a following

spark git commit: [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 b606dc177 -> 3a692e355 [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-21980 This PR fixes the

spark git commit: [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals

2017-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b6ef1f57b -> 21c4450fb [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-21980 This PR fixes the

spark git commit: [SPARK-21979][SQL] Improve QueryPlanConstraints framework

2017-09-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c5f9b89dd -> 1a9857476 [SPARK-21979][SQL] Improve QueryPlanConstraints framework ## What changes were proposed in this pull request? Improve the QueryPlanConstraints framework to make it robust and simple. In

spark git commit: [SPARK-17642][SQL] support DESC EXTENDED/FORMATTED table column commands

2017-09-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 957558235 -> 515910e9b [SPARK-17642][SQL] support DESC EXTENDED/FORMATTED table column commands ## What changes were proposed in this pull request? Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. Support DESC EXTENDED |

spark git commit: [SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file

2017-09-10 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 520d92a19 -> 6273a711b [SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file ## What changes were proposed in this pull request? ``` echo '{"field": 1} {"field": 2} {"field": "3"}'

[2/2] spark git commit: [SPARK-4131] Support "Writing data into the filesystem from queries"

2017-09-09 Thread lixiao
[SPARK-4131] Support "Writing data into the filesystem from queries" ## What changes were proposed in this pull request? This PR implements the sql feature: INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] SELECT ... FROM ... ## How was this

[1/2] spark git commit: [SPARK-4131] Support "Writing data into the filesystem from queries"

2017-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e4d8f9a36 -> f76790557 http://git-wip-us.apache.org/repos/asf/spark/blob/f7679055/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala -- diff --git

spark git commit: [MINOR][SQL] Correct DataFrame doc.

2017-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6b45d7e94 -> e4d8f9a36 [MINOR][SQL] Correct DataFrame doc. ## What changes were proposed in this pull request? Correct DataFrame doc. ## How was this patch tested? Only doc change, no tests. Author: Yanbo Liang

spark git commit: [SPARK-21941] Stop storing unused attemptId in SQLTaskMetrics

2017-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 31c74fec2 -> 8a5eb5068 [SPARK-21941] Stop storing unused attemptId in SQLTaskMetrics ## What changes were proposed in this pull request? In a driver heap dump containing 390,105 instances of SQLTaskMetrics this would have saved me

spark git commit: [SPARK-21946][TEST] fix flaky test: "alter table: rename cached table" in InMemoryCatalogedDDLSuite

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 08cb06af2 -> 9ae7c96ce [SPARK-21946][TEST] fix flaky test: "alter table: rename cached table" in InMemoryCatalogedDDLSuite ## What changes were proposed in this pull request? This PR fixes flaky test `InMemoryCatalogedDDLSuite "alter

spark git commit: [SPARK-21946][TEST] fix flaky test: "alter table: rename cached table" in InMemoryCatalogedDDLSuite

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0dfc1ec59 -> 8a4f228dc [SPARK-21946][TEST] fix flaky test: "alter table: rename cached table" in InMemoryCatalogedDDLSuite ## What changes were proposed in this pull request? This PR fixes flaky test `InMemoryCatalogedDDLSuite "alter

spark git commit: [SPARK-21936][SQL][2.2] backward compatibility test framework for HiveExternalCatalog

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 781a1f83c -> 08cb06af2 [SPARK-21936][SQL][2.2] backward compatibility test framework for HiveExternalCatalog backport https://github.com/apache/spark/pull/19148 to 2.2 Author: Wenchen Fan Closes #19163 from

spark git commit: [SPARK-21936][SQL] backward compatibility test framework for HiveExternalCatalog

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6e37524a1 -> dbb824125 [SPARK-21936][SQL] backward compatibility test framework for HiveExternalCatalog ## What changes were proposed in this pull request? `HiveExternalCatalog` is a semi-public interface. When creating tables,

spark git commit: [SPARK-21726][SQL] Check for structural integrity of the plan in Optimzer in test mode.

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f62b20f39 -> 6e37524a1 [SPARK-21726][SQL] Check for structural integrity of the plan in Optimzer in test mode. ## What changes were proposed in this pull request? We have many optimization rules now in `Optimizer`. Right now we don't have

spark git commit: [SPARK-21949][TEST] Tables created in unit tests should be dropped after use

2017-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 57bc1e9eb -> f62b20f39 [SPARK-21949][TEST] Tables created in unit tests should be dropped after use ## What changes were proposed in this pull request? Tables should be dropped after use in unit tests. ## How was this patch tested? N/A

spark git commit: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadata from SQLConf and docs

2017-09-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b9ab791a9 -> e00f1a1da [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadata from SQLConf and docs ## What changes were proposed in this pull request? Since [SPARK-15639](https://github.com/apache/spark/pull/13701),

spark git commit: [SPARK-21912][SQL] ORC/Parquet table should not create invalid column names

2017-09-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ce7293c15 -> eea2b877c [SPARK-21912][SQL] ORC/Parquet table should not create invalid column names ## What changes were proposed in this pull request? Currently, users encounter job aborts while creating or altering ORC/Parquet tables with

spark git commit: [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery should not produce unresolved query plans

2017-09-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master aad212547 -> ce7293c15 [SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery should not produce unresolved query plans ## What changes were proposed in this pull request? This is a follow-up of #19050 to deal with `ExistenceJoin` case.

spark git commit: [SPARK-21835][SQL] RewritePredicateSubquery should not produce unresolved query plans

2017-09-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 64936c14a -> f2e22aebf [SPARK-21835][SQL] RewritePredicateSubquery should not produce unresolved query plans ## What changes were proposed in this pull request? Correlated predicate subqueries are rewritten into `Join` by the rule

spark git commit: [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources

2017-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 1f7c4869b -> 7da8fbf08 [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources ## What changes were proposed in this pull request? All built-in data sources support `Partition Discovery`. We had

spark git commit: [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources

2017-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fd60d4fa6 -> 9e451bcf3 [MINOR][DOC] Update `Partition Discovery` section to enumerate all available file sources ## What changes were proposed in this pull request? All built-in data sources support `Partition Discovery`. We had better

spark git commit: [SPARK-21652][SQL] Fix rule confliction between InferFiltersFromConstraints and ConstantPropagation

2017-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8c954d2cd -> fd60d4fa6 [SPARK-21652][SQL] Fix rule confliction between InferFiltersFromConstraints and ConstantPropagation ## What changes were proposed in this pull request? For the given example below, the predicate added by

spark git commit: [SPARK-21845][SQL][TEST-MAVEN] Make codegen fallback of expressions configurable

2017-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 02a4386ae -> 2974406d1 [SPARK-21845][SQL][TEST-MAVEN] Make codegen fallback of expressions configurable ## What changes were proposed in this pull request? We should make codegen fallback of expressions configurable. So far, it is always

spark git commit: [SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE

2017-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ca59445ad -> 4e7a29efd [SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE ## What changes were proposed in this pull request? Currently, `withDatabase` fails if the database is not empty. It would be great if we

spark git commit: [SPARK-21654][SQL] Complement SQL predicates expression description

2017-09-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 07fd68a29 -> 9f30d9280 [SPARK-21654][SQL] Complement SQL predicates expression description ## What changes were proposed in this pull request? SQL predicates lack complete expression descriptions. This patch complements the

spark git commit: [SPARK-21891][SQL] Add TBLPROPERTIES to DDL statement: CREATE TABLE USING

2017-09-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 900f14f6f -> acb7fed23 [SPARK-21891][SQL] Add TBLPROPERTIES to DDL statement: CREATE TABLE USING ## What changes were proposed in this pull request? Add `TBLPROPERTIES` to the DDL statement `CREATE TABLE USING`. After this change, the DDL

spark git commit: [SPARK-21884][SPARK-21477][BACKPORT-2.2][SQL] Mark LocalTableScanExec's input data transient

2017-09-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 14054ffc5 -> 50f86e1fe [SPARK-21884][SPARK-21477][BACKPORT-2.2][SQL] Mark LocalTableScanExec's input data transient This PR is to backport https://github.com/apache/spark/pull/18686 for resolving the issue in

spark git commit: [SPARK-21895][SQL] Support changing database in HiveClient

2017-09-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 12ab7f7e8 -> aba9492d2 [SPARK-21895][SQL] Support changing database in HiveClient ## What changes were proposed in this pull request? Support moving tables across different databases in HiveClient `alterTable` ## How was this patch

spark git commit: [SPARK-21110][SQL] Structs, arrays, and other orderable datatypes should be usable in inequalities

2017-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7ce110828 -> cba69aeb4 [SPARK-21110][SQL] Structs, arrays, and other orderable datatypes should be usable in inequalities ## What changes were proposed in this pull request? Allows `BinaryComparison` operators to work on any data type

spark git commit: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown rule for Union

2017-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 501370d9d -> 7ce110828 [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown rule for Union ## What changes were proposed in this pull request? Also remove useless function `partitionByDeterministic` after the changes of

spark git commit: [SPARK-21583][HOTFIX] Removed intercept in test causing failures

2017-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fc45c2c88 -> 501370d9d [SPARK-21583][HOTFIX] Removed intercept in test causing failures Removing a check in the ColumnarBatchSuite that depended on a Java assertion. This assertion is being compiled out in the Maven builds causing the

spark git commit: [SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create…

2017-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 19b0240d4 -> 9696580c3 [SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create… … Dataset with LogicalRDD logical operator ## What changes were proposed in this pull request? Reusing

spark git commit: [SPARK-21878][SQL][TEST] Create SQLMetricsTestUtils

2017-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 964b507c7 -> 19b0240d4 [SPARK-21878][SQL][TEST] Create SQLMetricsTestUtils ## What changes were proposed in this pull request? Creates `SQLMetricsTestUtils` for the utility functions of both Hive-specific and the other SQLMetrics test

spark git commit: [MINOR][SQL][TEST] Test shuffle hash join while is not expected

2017-08-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 32d6d9d72 -> 235d28333 [MINOR][SQL][TEST] Test shuffle hash join while is not expected ## What changes were proposed in this pull request? ignore("shuffle hash join") is meant to use shuffle hash join to test _case class ShuffledHashJoinExec_. But

spark git commit: Revert "[SPARK-21845][SQL] Make codegen fallback of expressions configurable"

2017-08-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4133c1b0a -> 32d6d9d72 Revert "[SPARK-21845][SQL] Make codegen fallback of expressions configurable" This reverts commit 3d0e174244bc293f11dff0f11ef705ba6cd5fe3a. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-21845][SQL] Make codegen fallback of expressions configurable

2017-08-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fba9cc846 -> 3d0e17424 [SPARK-21845][SQL] Make codegen fallback of expressions configurable ## What changes were proposed in this pull request? We should make codegen fallback of expressions configurable. So far, it is always on. We might

spark git commit: [SPARK-21255][SQL] simplify encoder for java enum

2017-08-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8fcbda9c9 -> 6327ea570 [SPARK-21255][SQL] simplify encoder for java enum ## What changes were proposed in this pull request? This is a follow-up for https://github.com/apache/spark/pull/18488, to simplify the code. The major change is,

spark git commit: [SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-defined functions

2017-08-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 32fa0b814 -> 8fcbda9c9 [SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-defined functions ## What changes were proposed in this pull request? Add trait UserDefinedExpression to identify user-defined functions. UDF can

spark git commit: [SPARK-21831][TEST] Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1a598d717 -> 522e1f80d [SPARK-21831][TEST] Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite ## What changes were proposed in this pull request? [SPARK-19025](https://github.com/apache/spark/pull/16869) removes

spark git commit: [SPARK-21837][SQL][TESTS] UserDefinedTypeSuite Local UDTs not actually testing what it intends

2017-08-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 51620e288 -> 1a598d717 [SPARK-21837][SQL][TESTS] UserDefinedTypeSuite Local UDTs not actually testing what it intends ## What changes were proposed in this pull request? Adjust Local UDTs test to assert about results, and fix index of

spark git commit: [SPARK-21756][SQL] Add JSON option to allow unquoted control characters

2017-08-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 628bdeabd -> 51620e288 [SPARK-21756][SQL] Add JSON option to allow unquoted control characters ## What changes were proposed in this pull request? This patch adds allowUnquotedControlChars option in JSON data source to allow JSON Strings

spark git commit: [SPARK-21832][TEST] Merge SQLBuilderTest into ExpressionSQLBuilderSuite

2017-08-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master de7af295c -> 1f24ceee6 [SPARK-21832][TEST] Merge SQLBuilderTest into ExpressionSQLBuilderSuite ## What changes were proposed in this pull request? After [SPARK-19025](https://github.com/apache/spark/pull/16869), there is no need to keep

spark git commit: [SPARK-21830][SQL] Bump ANTLR version and fix a few issues.

2017-08-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 763b83ee8 -> 05af2de0f [SPARK-21830][SQL] Bump ANTLR version and fix a few issues. ## What changes were proposed in this pull request? This PR bumps the ANTLR version to 4.7, and fixes a number of small parser related issues uncovered by

spark git commit: [SPARK-21826][SQL][2.1][2.0] outer broadcast hash join should not throw NPE

2017-08-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.0 9f670ce5d -> bf1f30d7d [SPARK-21826][SQL][2.1][2.0] outer broadcast hash join should not throw NPE backport https://github.com/apache/spark/pull/19036 to branch 2.1 and 2.0 Author: Wenchen Fan Closes #19040

spark git commit: [SPARK-21826][SQL][2.1][2.0] outer broadcast hash join should not throw NPE

2017-08-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 3d3be4dca -> 576975356 [SPARK-21826][SQL][2.1][2.0] outer broadcast hash join should not throw NPE backport https://github.com/apache/spark/pull/19036 to branch 2.1 and 2.0 Author: Wenchen Fan Closes #19040

spark git commit: [SPARK-21807][SQL] Override ++ operation in ExpressionSet to reduce clone time

2017-08-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6942aeeb0 -> b8aaef49f [SPARK-21807][SQL] Override ++ operation in ExpressionSet to reduce clone time ## What changes were proposed in this pull request? The getAliasedConstraints function in LogicalPlan.scala will clone the expression

spark git commit: [SPARK-21603][SQL][FOLLOW-UP] Change the default value of maxLinesPerFunction into 4000

2017-08-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1662e9311 -> 6942aeeb0 [SPARK-21603][SQL][FOLLOW-UP] Change the default value of maxLinesPerFunction into 4000 ## What changes were proposed in this pull request? This PR changed the default value of `maxLinesPerFunction` to `4000`. In

spark git commit: [SPARK-21499][SQL] Support creating persistent function for Spark UDAF(UserDefinedAggregateFunction)

2017-08-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3ed1ae100 -> 43d71d965 [SPARK-21499][SQL] Support creating persistent function for Spark UDAF(UserDefinedAggregateFunction) ## What changes were proposed in this pull request? This PR is to enable users to create persistent Scala UDAF

spark git commit: [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore.

2017-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 0f640e96c -> 526087f9e [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore. For Hive tables, the current "replace the schema" code is the correct path, except that an exception in that path should

spark git commit: [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore.

2017-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ba843292e -> 84b5b16ea [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore. For Hive tables, the current "replace the schema" code is the correct path, except that an exception in that path should result

spark git commit: [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdown verification back.

2017-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 988b84d7e -> ba843292e [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdown verification back. ## What changes were proposed in this pull request? The previous PR (https://github.com/apache/spark/pull/19000) removed filter pushdown

spark git commit: [SPARK-21790][TESTS] Fix Docker-based Integration Test errors.

2017-08-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 10be01848 -> 72b738d8d [SPARK-21790][TESTS] Fix Docker-based Integration Test errors. ## What changes were proposed in this pull request?

spark git commit: [SPARK-21743][SQL][FOLLOW-UP] top-most limit should not cause memory leak

2017-08-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 23ea89808 -> 7880909c4 [SPARK-21743][SQL][FOLLOW-UP] top-most limit should not cause memory leak ## What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/18955 , to fix a bug that we

spark git commit: [SPARK-21213][SQL] Support collecting partition-level statistics: rowCount and sizeInBytes

2017-08-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 07a2b8738 -> 23ea89808 [SPARK-21213][SQL] Support collecting partition-level statistics: rowCount and sizeInBytes ## What changes were proposed in this pull request? Added support for ANALYZE TABLE [db_name].tablename PARTITION

spark git commit: [SPARK-21739][SQL] Cast expression should initialize timezoneId when it is called statically to convert something into TimestampType

2017-08-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 2a9697593 -> fdea642db [SPARK-21739][SQL] Cast expression should initialize timezoneId when it is called statically to convert something into TimestampType ## What changes were proposed in this pull request?

spark git commit: [SPARK-21739][SQL] Cast expression should initialize timezoneId when it is called statically to convert something into TimestampType

2017-08-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2caaed970 -> 310454be3 [SPARK-21739][SQL] Cast expression should initialize timezoneId when it is called statically to convert something into TimestampType ## What changes were proposed in this pull request?

spark git commit: [SPARK-21767][TEST][SQL] Add Decimal Test For Avro in VersionSuite

2017-08-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7ab951885 -> 2caaed970 [SPARK-21767][TEST][SQL] Add Decimal Test For Avro in VersionSuite ## What changes were proposed in this pull request? Decimal is a logical type of AVRO. We need to ensure the support of Hive's AVRO serde works well

spark git commit: [SPARK-21677][SQL] json_tuple throws NullPointException when column is null as string type

2017-08-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bfdc361ed -> 7ab951885 [SPARK-21677][SQL] json_tuple throws NullPointException when column is null as string type ## What changes were proposed in this pull request? ``` scala scala> Seq(("""{"Hyukjin": 224, "John":

spark git commit: [SQL][MINOR][TEST] Set spark.unsafe.exceptionOnMemoryLeak to true

2017-08-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b83b502c4 -> ae9e42479 [SQL][MINOR][TEST] Set spark.unsafe.exceptionOnMemoryLeak to true ## What changes were proposed in this pull request? When running IntelliJ, we are unable to capture the exception of memory leak detection. >

spark git commit: [SPARK-21743][SQL] top-most limit should not cause memory leak

2017-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b8ffb5105 -> a45133b82 [SPARK-21743][SQL] top-most limit should not cause memory leak ## What changes were proposed in this pull request? For top-most limit, we will use a special operator to execute it: `CollectLimitExec`.

spark git commit: [SPARK-21738] Thriftserver doesn't cancel jobs when session is closed

2017-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1cce1a3b6 -> 7add4e982 [SPARK-21738] Thriftserver doesn't cancel jobs when session is closed ## What changes were proposed in this pull request? When a session is closed the Thriftserver doesn't cancel the jobs which may still be

spark git commit: [SPARK-18464][SQL][BACKPORT] support old table which doesn't store schema in table properties

2017-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 f5ede0d55 -> 2a9697593 [SPARK-18464][SQL][BACKPORT] support old table which doesn't store schema in table properties backport https://github.com/apache/spark/pull/18907 to branch 2.2 Author: Wenchen Fan

spark git commit: [SPARK-21603][SQL] The wholestage codegen will be much slower then that is closed when the function is too long

2017-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master adf005dab -> 1cce1a3b6 [SPARK-21603][SQL] The wholestage codegen will be much slower then that is closed when the function is too long ## What changes were proposed in this pull request? Close the whole stage codegen when the function

spark git commit: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 07549b20a -> 8c54f1eb7 [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0 ## What changes were proposed in this pull request? Like Parquet, this PR aims to depend on the latest Apache ORC 1.4 for Apache Spark 2.3. There are key benefits for

spark git commit: [MINOR] Fix a typo in the method name `UserDefinedFunction.asNonNullabe`

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3f958a999 -> 42b9eda80 [MINOR] Fix a typo in the method name `UserDefinedFunction.asNonNullabe` ## What changes were proposed in this pull request? The method name `asNonNullabe` should be `asNonNullable`. ## How was this patch tested?

spark git commit: [SPARK-18464][SQL][FOLLOWUP] support old table which doesn't store schema in table properties

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bc9902587 -> 14bdb25fd [SPARK-18464][SQL][FOLLOWUP] support old table which doesn't store schema in table properties ## What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/15900 ,

spark git commit: [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 12411b5ed -> bc9902587 [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it ## What changes were proposed in this pull request? This is a follow-up PR that moves the test case in

spark git commit: [SPARK-21721][SQL][BACKPORT-2.1] Clear FileSystem deleteOnExit cache when paths are successfully removed

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 9b749b6ce -> 6f366fbbf [SPARK-21721][SQL][BACKPORT-2.1] Clear FileSystem deleteOnExit cache when paths are successfully removed ## What changes were proposed in this pull request? Backport SPARK-21721 to branch 2.1: We put staging

spark git commit: [SPARK-21732][SQL] Lazily init hive metastore client

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0422ce06d -> 12411b5ed [SPARK-21732][SQL] Lazily init hive metastore client ## What changes were proposed in this pull request? This PR changes the codes to lazily init hive metastore client so that we can create SparkSession without

spark git commit: [SPARK-21724][SQL][DOC] Adds since information in the documentation of date functions

2017-08-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4c3cf1cc5 -> 0422ce06d [SPARK-21724][SQL][DOC] Adds since information in the documentation of date functions ## What changes were proposed in this pull request? This PR adds `since` annotation in documentation so that this can be

spark git commit: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed

2017-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 48bacd36c -> d9c8e6223 [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed ## What changes were proposed in this pull request? We put staging path to delete into the deleteOnExit cache of

spark git commit: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed

2017-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 282f00b41 -> 4c3cf1cc5 [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed ## What changes were proposed in this pull request? We put staging path to delete into the deleteOnExit cache of

spark git commit: [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it

2017-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0326b69c9 -> fbc269252 [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it ## What changes were proposed in this pull request? Recently, we have also encountered such NPE issues in

spark git commit: [MINOR][SQL][TEST] no uncache table in joinsuite test

2017-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0fcde87aa -> 0326b69c9 [MINOR][SQL][TEST] no uncache table in joinsuite test ## What changes were proposed in this pull request? At present, in test("broadcasted hash outer join operator selection") case, set the testData2 to _CACHE

spark git commit: [MINOR][SQL] Additional test case for CheckCartesianProducts rule

2017-08-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c0e333dbe -> 5596ce83c [MINOR][SQL] Additional test case for CheckCartesianProducts rule ## What changes were proposed in this pull request? While discovering optimization rules and their test coverage, I did not find any tests for

spark git commit: [SPARK-19122][SQL] Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-08-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 94439997d -> 7f16c6910 [SPARK-19122][SQL] Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order ## What changes were proposed in this pull request? Jira :

spark git commit: [SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment

2017-08-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2387f1e31 -> 0377338bf [SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment Add an option to the JDBC data source to initialize the environment of the remote database session ## What changes

spark git commit: [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None

2017-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c06f3f5ac -> 84454d7d3 [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None ## What changes were proposed in this pull request? Currently `df.na.replace("*", Map[String, String]("NULL" -> null))` will produce
