spark git commit: [SPARK-22496][SQL] thrift server adds operation logs

2017-12-10 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ab1b6ee73 -> 4289ac9d8 [SPARK-22496][SQL] thrift server adds operation logs ## What changes were proposed in this pull request? Since Hive 2.0+ upgrades log4j to log4j2, a lot of [changes](https://issues.apache.org/jira/browse/HIVE-11304

spark git commit: [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetastoreOrc by default

2017-12-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 18b75d465 -> aa1764ba1 [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetastoreOrc by default ## What changes were proposed in this pull request? Like Parquet, this PR aims to turn on `spark.sql.hive.convertMetastoreOrc` by default. ##

spark git commit: [SPARK-22719][SQL] Refactor ConstantPropagation

2017-12-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f41c0a93f -> 18b75d465 [SPARK-22719][SQL] Refactor ConstantPropagation ## What changes were proposed in this pull request? The current time complexity of ConstantPropagation is O(n^2), which can be slow when the query is complex. Refactor
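As background, constant propagation substitutes constants established by equality conjuncts into the sibling predicates of the same AND. A minimal Python sketch of the idea, using an invented tuple representation rather than Spark's actual Catalyst expression trees:

```python
# Toy constant propagation over a conjunction of predicates.
# Spark's ConstantPropagation rule does this on Catalyst expression
# trees; the (lhs, op, rhs) tuples here are invented for brevity.

def propagate_constants(conjuncts):
    """conjuncts: list of (lhs, op, rhs); operands are variable
    names (str) or integer constants."""
    # Pass 1: collect bindings from equality predicates var = const.
    bindings = {}
    for lhs, op, rhs in conjuncts:
        if op == "=" and isinstance(lhs, str) and isinstance(rhs, int):
            bindings[lhs] = rhs

    def subst(operand):
        return bindings.get(operand, operand) if isinstance(operand, str) else operand

    # Pass 2: substitute the bindings into the other predicates,
    # keeping each defining equality as-is.
    out = []
    for lhs, op, rhs in conjuncts:
        if op == "=" and isinstance(lhs, str) and lhs in bindings:
            out.append((lhs, op, rhs))
        else:
            out.append((subst(lhs), op, subst(rhs)))
    return out

# a = 1 AND b > a  becomes  a = 1 AND b > 1
print(propagate_constants([("a", "=", 1), ("b", ">", "a")]))
# → [('a', '=', 1), ('b', '>', 1)]
```

Collecting the bindings once and substituting in a single second pass keeps the rewrite linear in the number of conjuncts, which is the gist of avoiding the quadratic behavior the PR describes.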

spark git commit: [SPARK-22688][SQL] Upgrade Janino version to 3.0.8

2017-12-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f110a7f88 -> 8ae004b46 [SPARK-22688][SQL] Upgrade Janino version to 3.0.8 ## What changes were proposed in this pull request? This PR upgrades Janino to version 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html)

spark git commit: [SPARK-22693][SQL] CreateNamedStruct and InSet should not use global variables

2017-12-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9948b860a -> f110a7f88 [SPARK-22693][SQL] CreateNamedStruct and InSet should not use global variables ## What changes were proposed in this pull request? CreateNamedStruct and InSet are using a global variable which is not needed. This ca

spark git commit: [SPARK-22720][SS] Make EventTimeWatermark Extend UnaryNode

2017-12-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 51066b437 -> effca9868 [SPARK-22720][SS] Make EventTimeWatermark Extend UnaryNode ## What changes were proposed in this pull request? Our Analyzer and Optimizer have multiple rules for `UnaryNode`. After making `EventTimeWatermark` extend

spark git commit: [SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function

2017-12-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e98f9647f -> 4286cba7d [SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function ## What changes were proposed in this pull request? I was looking at the config code today and found that configs defined using ConfigBuilder.

spark git commit: [SPARK-20392][SQL] Set barrier to prevent re-entering a tree

2017-12-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 82183f7b5 -> 00d176d2f [SPARK-20392][SQL] Set barrier to prevent re-entering a tree ## What changes were proposed in this pull request? The SQL `Analyzer` traverses the whole query plan even when most of it is already analyzed. This increases th

spark git commit: [SPARK-22662][SQL] Failed to prune columns after rewriting predicate subquery

2017-12-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 132a3f470 -> 1e17ab83d [SPARK-22662][SQL] Failed to prune columns after rewriting predicate subquery ## What changes were proposed in this pull request? As a simple example: ``` spark-sql> create table base (a int, b int) using parquet; Ti

spark git commit: [SPARK-22500][SQL][FOLLOWUP] cast for struct can split code even with whole stage codegen

2017-12-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ced6ccf0d -> 132a3f470 [SPARK-22500][SQL][FOLLOWUP] cast for struct can split code even with whole stage codegen ## What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/19730, we can split th

spark git commit: [SPARK-22701][SQL] add ctx.splitExpressionsWithCurrentInputs

2017-12-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 03fdc92e4 -> ced6ccf0d [SPARK-22701][SQL] add ctx.splitExpressionsWithCurrentInputs ## What changes were proposed in this pull request? This pattern appears many times in the codebase: ``` if (ctx.INPUT_ROW == null || ctx.currentVars != nu

spark git commit: [SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

2017-12-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1d5597b40 -> 3887b7eef [SPARK-22665][SQL] Avoid repartitioning with empty list of expressions ## What changes were proposed in this pull request? Repartitioning by empty set of expressions is currently possible, even though it is a case w

spark git commit: [SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case

2017-12-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e1dd03e42 -> 1d5597b40 [SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case ## What changes were proposed in this pull request? This PR improves documentation for not using zero `numRows` statistics and simplifies the

spark git commit: [SPARK-22489][DOC][FOLLOWUP] Update broadcast behavior changes in migration section

2017-12-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dff440f1e -> 4131ad03f [SPARK-22489][DOC][FOLLOWUP] Update broadcast behavior changes in migration section ## What changes were proposed in this pull request? Update broadcast behavior changes in migration section. ## How was this patch

spark git commit: [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path

2017-11-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 af8a692d6 -> ba00bd961 [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path ## What changes were proposed in this pull request? When a user tries to load data with a non-existent HDFS

spark git commit: [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path

2017-11-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dc365422b -> 16adaf634 [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path ## What changes were proposed in this pull request? When a user tries to load data with a non-existent HDFS file

spark git commit: [SPARK-22614] Dataset API: repartitionByRange(...)

2017-11-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bcceab649 -> f5f8e84d9 [SPARK-22614] Dataset API: repartitionByRange(...) ## What changes were proposed in this pull request? This PR introduces a way to explicitly range-partition a Dataset. So far, only round-robin and hash partitioning
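For context, range partitioning places each row by locating its key among sorted boundary values, so each partition holds a contiguous key range — unlike hash or round-robin partitioning. A hypothetical Python sketch of the assignment rule (Spark actually samples the data to choose the boundaries; names here are illustrative):

```python
import bisect

def range_partition(rows, key, boundaries):
    """Assign each row to a partition by locating its key among the
    sorted boundary values; len(boundaries) + 1 partitions result."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # bisect_right finds the index of the first boundary greater
        # than the key, i.e. the partition this key range maps to.
        parts[bisect.bisect_right(boundaries, key(row))].append(row)
    return parts

rows = [5, 42, 17, 99, 3]
print(range_partition(rows, key=lambda r: r, boundaries=[10, 50]))
# → [[5, 3], [42, 17], [99]]
```

Within each partition the keys fall in a known interval, which is what makes downstream per-partition sorting cheap.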

spark git commit: [SPARK-22489][SQL] Shouldn't change broadcast join buildSide if user clearly specified

2017-11-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6ac57fd0d -> bcceab649 [SPARK-22489][SQL] Shouldn't change broadcast join buildSide if user clearly specified ## What changes were proposed in this pull request? How to reproduce: ```scala import org.apache.spark.sql.execution.joins.Broad

spark git commit: [SPARK-21417][SQL] Infer join conditions using propagated constraints

2017-11-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 999ec137a -> 6ac57fd0d [SPARK-21417][SQL] Infer join conditions using propagated constraints ## What changes were proposed in this pull request? This PR adds an optimization rule that infers join conditions using propagated constraints.

spark git commit: [SPARK-22615][SQL] Handle more cases in PropagateEmptyRelation

2017-11-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 20b239845 -> 57687280d [SPARK-22615][SQL] Handle more cases in PropagateEmptyRelation ## What changes were proposed in this pull request? Currently, in the optimize rule `PropagateEmptyRelation`, the following cases are not handled: 1. em
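One of the simplest cases such a rule covers: an inner join with an empty child can be replaced by an empty result without running the join at all. Sketched in Python with lists standing in for relations (illustrative, not Catalyst code):

```python
def inner_join(left, right, cond):
    """Naive nested-loop inner join over Python lists."""
    return [(l, r) for l in left for r in right if cond(l, r)]

def propagate_empty(left, right, cond):
    # Optimization: if either side is known to be empty, the inner
    # join result is empty; skip evaluating the join entirely.
    if not left or not right:
        return []
    return inner_join(left, right, cond)

print(propagate_empty([1, 2, 3], [], cond=lambda l, r: l == r))  # → []
```

The same reasoning extends to other operators whose output is empty whenever a child is empty (e.g. projections and filters over an empty relation).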

spark git commit: [SPARK-22637][SQL] Only refresh a logical plan once.

2017-11-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 eef72d3f0 -> 38a0532cf [SPARK-22637][SQL] Only refresh a logical plan once. ## What changes were proposed in this pull request? `CatalogImpl.refreshTable` uses `foreach(..)` to refresh all tables in a view. This traverses all nodes in

spark git commit: [SPARK-22637][SQL] Only refresh a logical plan once.

2017-11-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a10b328db -> 475a29f11 [SPARK-22637][SQL] Only refresh a logical plan once. ## What changes were proposed in this pull request? `CatalogImpl.refreshTable` uses `foreach(..)` to refresh all tables in a view. This traverses all nodes in the

spark git commit: [SPARK-22515][SQL] Estimation relation size based on numRows * rowSize

2017-11-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b70e483cb -> da3557429 [SPARK-22515][SQL] Estimation relation size based on numRows * rowSize ## What changes were proposed in this pull request? Currently, relation size is computed as the sum of file size, which is error-prone because s

spark git commit: [SPARK-22602][SQL] remove ColumnVector#loadBytes

2017-11-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d49d9e403 -> 5a02e3a2a [SPARK-22602][SQL] remove ColumnVector#loadBytes ## What changes were proposed in this pull request? `ColumnVector#loadBytes` is only used as an optimization for reading UTF8String in `WritableColumnVector`, this PR

spark git commit: [SPARK-22604][SQL] remove the get address methods from ColumnVector

2017-11-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 70221903f -> e3fd93f14 [SPARK-22604][SQL] remove the get address methods from ColumnVector ## What changes were proposed in this pull request? `nullsNativeAddress` and `valuesNativeAddress` are only used in tests and benchmark, no need to

spark git commit: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport.consume

2017-11-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a1877f45c -> 70221903f [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport.consume ## What changes were proposed in this pull request? `ctx.currentVars` means the input variables for the current operator, which is already decided in

spark git commit: [SPARK-22592][SQL] cleanup filter converting for hive

2017-11-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 42f83d7c4 -> c1217565e [SPARK-22592][SQL] cleanup filter converting for hive ## What changes were proposed in this pull request? We have 2 different methods to convert filters for hive, regarding a config. This introduces duplicated and i

spark git commit: [SPARK-22543][SQL] fix java 64kb compile error for deeply nested expressions

2017-11-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 327d25fe1 -> 0605ad761 [SPARK-22543][SQL] fix java 64kb compile error for deeply nested expressions ## What changes were proposed in this pull request? A frequently reported issue of Spark is the Java 64kb compile error. This is because S

spark git commit: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR 19779 to branch-2.2 - Support writing to Hive table which uses Avro schema url 'avro.schema.url'

2017-11-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 df9228b49 -> b17f4063c [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR 19779 to branch-2.2 - Support writing to Hive table which uses Avro schema url 'avro.schema.url' ## What changes were proposed in this pull request? > Bac

spark git commit: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table which uses Avro schema url 'avro.schema.url'

2017-11-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 881c5c807 -> e0d7665ce [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table which uses Avro schema url 'avro.schema.url' ## What changes were proposed in this pull request? SPARK-19580 Support for avro.schema.url whil

spark git commit: [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source

2017-11-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 7bdad58e2 -> a02a8bd23 [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source ## What changes were proposed in this pull request? Let’s say I have a nested AND expression shown below and p2 can not be pus

spark git commit: [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source

2017-11-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 11a599bac -> df9228b49 [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source ## What changes were proposed in this pull request? Let’s say I have a nested AND expression shown below and p2 can not be pus

spark git commit: [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source

2017-11-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ac10171be -> 881c5c807 [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source ## What changes were proposed in this pull request? Let’s say I have a nested AND expression shown below and p2 can not be pushed
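The unsafe rewrite can be seen with plain boolean logic: if the full predicate is `p1 AND (p2 OR p3)` and `p2` cannot be pushed to the source, pushing `p1 AND p2` down drops rows the original predicate accepts. A small Python check (predicate names are illustrative):

```python
# Full predicate: p1 AND (p2 OR p3), where p2 is not pushable.
def full(p1, p2, p3):
    return p1 and (p2 or p3)

def wrong_pushdown(p1, p2, p3):
    # Incorrectly pushing "p1 AND p2" to the data source filters the
    # row out before p3 is ever considered.
    return p1 and p2

def safe(p1, p2, p3):
    pushed = p1              # only the fully pushable conjunct goes down
    residual = (p2 or p3)    # re-evaluated above the scan
    return pushed and residual

# A row with p1=True, p2=False, p3=True satisfies the full predicate
# but is lost by the incorrect pushdown.
print(full(True, False, True), wrong_pushdown(True, False, True))
# → True False
```

Pushing only `p1` is safe because it can at worst over-fetch rows, never drop qualifying ones.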

spark git commit: [SPARK-22542][SQL] remove unused features in ColumnarBatch

2017-11-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7f2e62ee6 -> b9dcbe5e1 [SPARK-22542][SQL] remove unused features in ColumnarBatch ## What changes were proposed in this pull request? `ColumnarBatch` provides features to do fast filter and project in a columnar fashion; however, this feat

spark git commit: [SPARK-22479][SQL] Exclude credentials from SaveintoDataSourceCommand.simpleString

2017-11-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 39b3f10dd -> 2014e7a78 [SPARK-22479][SQL] Exclude credentials from SaveintoDataSourceCommand.simpleString ## What changes were proposed in this pull request? Do not include jdbc properties which may contain credentials in logging a logic

spark git commit: [SPARK-22469][SQL] Accuracy problem in comparison with string and numeric

2017-11-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master aa88b8dbb -> bc0848b4c [SPARK-22469][SQL] Accuracy problem in comparison with string and numeric ## What changes were proposed in this pull request? This fixes a problem caused by #15880 `select '1.5' > 0.5; // Result is NULL in Spark but i
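The accuracy problem comes down to which common type the comparison is coerced to: coercing the string side to a fractional type preserves `'1.5' > 0.5`, while an integral coercion loses the fraction or fails, surfacing as NULL. A Python sketch of the two coercion choices (not Spark's actual TypeCoercion code):

```python
def compare_as_double(s, n):
    # Coerce the string to double, as Hive does: '1.5' > 0.5 -> True.
    return float(s) > n

def compare_as_int(s, n):
    # Coercing '1.5' to an integer fails; modeling the failed cast as
    # NULL makes the whole comparison NULL, which is the reported bug.
    try:
        return int(s) > n
    except ValueError:
        return None  # stands in for SQL NULL

print(compare_as_double("1.5", 0.5), compare_as_int("1.5", 0.5))
# → True None
```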

spark git commit: [SPARK-22490][DOC] Add PySpark doc for SparkSession.builder

2017-11-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 210f2922b -> 3cefddee5 [SPARK-22490][DOC] Add PySpark doc for SparkSession.builder ## What changes were proposed in this pull request? In PySpark API Document, [SparkSession.build](http://spark.apache.org/docs/2.2.0/api/python/pyspark

spark git commit: [SPARK-22490][DOC] Add PySpark doc for SparkSession.builder

2017-11-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7f99a05e6 -> aa88b8dbb [SPARK-22490][DOC] Add PySpark doc for SparkSession.builder ## What changes were proposed in this pull request? In PySpark API Document, [SparkSession.build](http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql

spark git commit: [SPARK-22487][SQL][FOLLOWUP] still keep spark.sql.hive.version

2017-11-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 176ae4d53 -> f7534b37e [SPARK-22487][SQL][FOLLOWUP] still keep spark.sql.hive.version ## What changes were proposed in this pull request? a followup of https://github.com/apache/spark/pull/19712 , adds back the `spark.sql.hive.version`, s

spark git commit: [SPARK-22472][SQL] add null check for top-level primitive values

2017-11-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 1b70c66c9 -> 755152482 [SPARK-22472][SQL] add null check for top-level primitive values ## What changes were proposed in this pull request? One powerful feature of `Dataset` is that we can easily map SQL rows to Scala/Java objects and do

spark git commit: [SPARK-22472][SQL] add null check for top-level primitive values

2017-11-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b57ed2245 -> 0025ddeb1 [SPARK-22472][SQL] add null check for top-level primitive values ## What changes were proposed in this pull request? One powerful feature of `Dataset` is that we can easily map SQL rows to Scala/Java objects and do runt

spark git commit: [SPARK-22308][TEST-MAVEN] Support alternative unit testing styles in external applications

2017-11-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f5fe63f7b -> b57ed2245 [SPARK-22308][TEST-MAVEN] Support alternative unit testing styles in external applications Continuation of PR#19528 (https://github.com/apache/spark/pull/19529#issuecomment-340252119) The problem with the maven bui

spark git commit: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 73a2ca06b -> efaf73fcd [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests ## What changes were proposed in this pull request? The merge of SPARK-22211 to branch-2.2 dropped a couple of important lines that made sure the tests that c

spark git commit: [SPARK-21127][SQL][FOLLOWUP] fix a config name typo

2017-11-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 160a54061 -> d5202259d [SPARK-21127][SQL][FOLLOWUP] fix a config name typo ## What changes were proposed in this pull request? `spark.sql.statistics.autoUpdate.size` should be `spark.sql.statistics.size.autoUpdate.enabled`. The previous n

spark git commit: [SPARK-21625][DOC] Add incompatible Hive UDF describe to DOC

2017-11-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fe258a796 -> db389f719 [SPARK-21625][DOC] Add incompatible Hive UDF describe to DOC ## What changes were proposed in this pull request? Add incompatible Hive UDF describe to DOC. ## How was this patch tested? N/A Author: Yuming Wang C

spark git commit: [SPARK-22443][SQL] add implementation of quoteIdentifier, getTableExistsQuery and getSchemaQuery in AggregatedDialect

2017-11-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3bba8621c -> 572284c5b [SPARK-22443][SQL] add implementation of quoteIdentifier, getTableExistsQuery and getSchemaQuery in AggregatedDialect … ## What changes were proposed in this pull request? override JDBCDialects methods quoteIdent

spark git commit: [SPARK-22378][SQL] Eliminate redundant null check in generated code for extracting an element from complex types

2017-11-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6c6626614 -> 3bba8621c [SPARK-22378][SQL] Eliminate redundant null check in generated code for extracting an element from complex types ## What changes were proposed in this pull request? This PR eliminates redundant null check in generat

spark git commit: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 4074ed2e1 -> 5e3837380 [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown It's not safe in all cases to push down a LIMIT below a FULL OUTER JOIN. If the limit is pushed to one side of the FOJ, the physical join operator can not tel

spark git commit: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f7f4e9c2d -> 6c6626614 [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown ## What changes were proposed in this pull request? It's not safe in all cases to push down a LIMIT below a FULL OUTER JOIN. If the limit is pushed to one side o

spark git commit: [SPARK-22412][SQL] Fix incorrect comment in DataSourceScanExec

2017-11-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0c2aee69b -> f7f4e9c2d [SPARK-22412][SQL] Fix incorrect comment in DataSourceScanExec ## What changes were proposed in this pull request? Next fit decreasing bin packing algorithm is used to combine splits in DataSourceScanExec but the co

spark git commit: [SPARK-22254][CORE] Fix the arrayMax in BufferHolder

2017-11-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 891588660 -> bc1e10103 [SPARK-22254][CORE] Fix the arrayMax in BufferHolder ## What changes were proposed in this pull request? This PR replaces the old the maximum array size (`Int.MaxValue`) with the new one (`ByteArrayMethods.MAX_ROUND

spark git commit: [SPARK-22418][SQL][TEST] Add test cases for NULL Handling

2017-11-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 51145f137 -> 891588660 [SPARK-22418][SQL][TEST] Add test cases for NULL Handling ## What changes were proposed in this pull request? Added a test class to check NULL handling behavior. The expected behavior is defined as the one of the mos

spark git commit: [SPARK-22333][SQL][BACKPORT-2.2] timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference

2017-10-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 dd69ac620 -> ab87a92a1 [SPARK-22333][SQL][BACKPORT-2.2] timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference ## What changes were proposed in this pull request? This is a backport pr of https://github.c

spark git commit: [SPARK-22400][SQL] rename some APIs and classes to make their meaning clearer

2017-10-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 65338de5f -> 44c400315 [SPARK-22400][SQL] rename some APIs and classes to make their meaning clearer ## What changes were proposed in this pull request? Both `ReadSupport` and `ReadTask` have a method called `createReader`, but they creat

spark git commit: [SPARK-22396][SQL] Better Error Message for InsertIntoDir using Hive format without enabling Hive Support

2017-10-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 079a2609d -> 65338de5f [SPARK-22396][SQL] Better Error Message for InsertIntoDir using Hive format without enabling Hive Support ## What changes were proposed in this pull request? When Hive support is not on, users can hit unresolved plan

spark git commit: Revert "[SPARK-22308] Support alternative unit testing styles in external applications"

2017-10-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bc7ca9786 -> 659acf18d Revert "[SPARK-22308] Support alternative unit testing styles in external applications" This reverts commit 592cfeab9caeff955d115a1ca5014ede7d402907. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commi

spark git commit: [SPARK-19727][SQL][FOLLOWUP] Fix for round function that modifies original column

2017-10-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 cb54f297a -> cac6506ca [SPARK-19727][SQL][FOLLOWUP] Fix for round function that modifies original column ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/17075 , to fix the

spark git commit: [SPARK-19727][SQL][FOLLOWUP] Fix for round function that modifies original column

2017-10-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e80da8129 -> 7fdacbc77 [SPARK-19727][SQL][FOLLOWUP] Fix for round function that modifies original column ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/17075 , to fix the bu

spark git commit: [MINOR] Remove false comment from planStreamingAggregation

2017-10-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4c5269f1a -> e80da8129 [MINOR] Remove false comment from planStreamingAggregation ## What changes were proposed in this pull request? AggUtils.planStreamingAggregation has some comments about DISTINCT aggregates, while streaming aggregatio

spark git commit: [SPARK-21619][SQL] Fail the execution of canonicalized plans explicitly

2017-10-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c42d208e1 -> d28d5732a [SPARK-21619][SQL] Fail the execution of canonicalized plans explicitly ## What changes were proposed in this pull request? Canonicalized plans are not supposed to be executed. I ran into a case in which there's some

spark git commit: [SPARK-22333][SQL] timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference

2017-10-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 01f6ba0e7 -> c42d208e1 [SPARK-22333][SQL] timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-22333 In curren

spark git commit: [SPARK-22181][SQL] Adds ReplaceExceptWithFilter rule

2017-10-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 20eb95e5e -> 01f6ba0e7 [SPARK-22181][SQL] Adds ReplaceExceptWithFilter rule ## What changes were proposed in this pull request? Adds a new optimisation rule 'ReplaceExceptWithNotFilter' that replaces Except logical with Filter operator an

spark git commit: [SPARK-22226][SQL] splitExpression can create too many method calls in the outer class

2017-10-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 36b826f5d -> b3d8fc3dc [SPARK-22226][SQL] splitExpression can create too many method calls in the outer class ## What changes were proposed in this pull request? SPARK-18016 introduced `NestedClass` to avoid that the many methods generate

spark git commit: [TRIVIAL][SQL] Code cleaning in ResolveReferences

2017-10-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 17af727e3 -> 36b826f5d [TRIVIAL][SQL] Code cleaning in ResolveReferences ## What changes were proposed in this pull request? This PR is to clean the related codes majorly based on the today's code review on https://github.com/apache/spark

spark git commit: [SPARK-21375][PYSPARK][SQL] Add Date and Timestamp support to ArrowConverters for toPandas() Conversion

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5c3a1f3fa -> 17af727e3 [SPARK-21375][PYSPARK][SQL] Add Date and Timestamp support to ArrowConverters for toPandas() Conversion ## What changes were proposed in this pull request? Adding date and timestamp support with Arrow for `toPandas(

[spark] Git Push Summary

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/test2.2 [deleted] cb54f297a

spark git commit: [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 2839280ad -> cb54f297a [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema This is a regression introduced by #14207. After Spark 2.1, we store the inferred schema when creating the

spark git commit: [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/test2.2 [created] cb54f297a [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema This is a regression introduced by #14207. After Spark 2.1, we store the inferred schema when creating the table

spark git commit: [SPARK-22355][SQL] Dataset.collect is not threadsafe

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 a607ddc52 -> 2839280ad [SPARK-22355][SQL] Dataset.collect is not threadsafe It's possible that users create a `Dataset`, and call `collect` of this `Dataset` in many threads at the same time. Currently `Dataset#collect` just call `enc

spark git commit: [SPARK-22355][SQL] Dataset.collect is not threadsafe

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9b262f6a0 -> 5c3a1f3fa [SPARK-22355][SQL] Dataset.collect is not threadsafe ## What changes were proposed in this pull request? It's possible that users create a `Dataset`, and call `collect` of this `Dataset` in many threads at the same

spark git commit: [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8e9863531 -> 9b262f6a0 [SPARK-22356][SQL] data source table should support overlapped columns between data and partition schema ## What changes were proposed in this pull request? This is a regression introduced by #14207. After Spark 2.1

spark git commit: [SPARK-22308] Support alternative unit testing styles in external applications

2017-10-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5433be44c -> 592cfeab9 [SPARK-22308] Support alternative unit testing styles in external applications ## What changes were proposed in this pull request? Support unit tests of external code (i.e., applications that use spark) using scalate

spark git commit: [SPARK-13947][SQL] The error message from using an invalid column reference is not clear

2017-10-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 524abb996 -> 427359f07 [SPARK-13947][SQL] The error message from using an invalid column reference is not clear ## What changes were proposed in this pull request? Rewritten error message for clarity. Added extra information in case of

spark git commit: [SPARK-21101][SQL] Catch IllegalStateException when CREATE TEMPORARY FUNCTION

2017-10-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bc1e76632 -> 524abb996 [SPARK-21101][SQL] Catch IllegalStateException when CREATE TEMPORARY FUNCTION ## What changes were proposed in this pull request? It must `override` [`public StructObjectInspector initialize(ObjectInspector[] argOIs

spark git commit: [SPARK-22301][SQL] Add rule to Optimizer for In with not-nullable value and empty list

2017-10-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8beeaed66 -> 3f5ba968c [SPARK-22301][SQL] Add rule to Optimizer for In with not-nullable value and empty list ## What changes were proposed in this pull request? For performance reasons, we should resolve an In operation on an empty list as
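The observation behind the rule: for a non-nullable value, `v IN ()` can never be true, so the optimizer can fold it to a false literal instead of evaluating it per row. Sketched in Python (illustrative names, not Catalyst code):

```python
def optimize_in(value_nullable, in_list):
    """Return a folded literal for an empty IN list, or None meaning
    'keep the original expression'."""
    if not in_list:
        # Empty list: always false for a non-nullable value. For a
        # nullable value the SQL result could be NULL, so that case
        # is deliberately left alone here.
        return False if not value_nullable else None
    return None  # non-empty list: nothing to fold

print(optimize_in(value_nullable=False, in_list=[]))  # folds to False
print(optimize_in(value_nullable=True, in_list=[]))   # left as-is
```

Folding to a literal lets downstream rules (e.g. filter elimination) remove the whole predicate.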

spark git commit: [SPARK-21912][SQL][FOLLOW-UP] ORC/Parquet table should not create invalid column names

2017-10-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f6290aea2 -> 884d4f95f [SPARK-21912][SQL][FOLLOW-UP] ORC/Parquet table should not create invalid column names ## What changes were proposed in this pull request? During [SPARK-21912](https://issues.apache.org/jira/browse/SPARK-21912), we

spark git commit: [SPARK-22303][SQL] Handle Oracle specific jdbc types in OracleDialect

2017-10-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 57accf6e3 -> 5a5b6b785 [SPARK-22303][SQL] Handle Oracle specific jdbc types in OracleDialect TIMESTAMP (-101), BINARY_DOUBLE (101) and BINARY_FLOAT (100) are handled in OracleDialect ## What changes were proposed in this pull request? Wh

spark git commit: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source

2017-10-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ff8de99a1 -> ca2a780e7 [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source ## What changes were proposed in this pull request? When [SPARK-19261](https://issues.apache.org/jira/browse/SPARK-19261) impl

spark git commit: [SPARK-21055][SQL][FOLLOW-UP] replace grouping__id with grouping_id()

2017-10-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d8cada8d1 -> a763607e4 [SPARK-21055][SQL][FOLLOW-UP] replace grouping__id with grouping_id() ## What changes were proposed in this pull request? Simplifies the test cases that were added in the PR https://github.com/apache/spark/pull/18270

spark git commit: [SPARK-20331][SQL][FOLLOW-UP] Add a SQLConf for enhanced Hive partition pruning predicate pushdown

2017-10-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d9f286d26 -> d8cada8d1 [SPARK-20331][SQL][FOLLOW-UP] Add a SQLConf for enhanced Hive partition pruning predicate pushdown ## What changes were proposed in this pull request? This is a follow-up PR of https://github.com/apache/spark/pull/17

spark git commit: [SPARK-22326][SQL] Remove unnecessary hashCode and equals methods

2017-10-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b8624b06e -> d9f286d26 [SPARK-22326][SQL] Remove unnecessary hashCode and equals methods ## What changes were proposed in this pull request? Plan equality should be computed by `canonicalized`, so we can remove unnecessary `hashCode` and
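The idea that "plan equality should be computed by `canonicalized`" can be sketched as follows: equality and hashing delegate to a canonical form, so per-class `hashCode`/`equals` overrides become redundant. This is an illustrative Python analogue, not Spark's actual `QueryPlan` code.

```python
class Plan:
    """Toy plan node whose equality is defined by a canonicalized form,
    so semantically equal plans compare equal without custom overrides."""

    def __init__(self, exprs):
        self.exprs = exprs

    @property
    def canonicalized(self):
        # Illustrative canonicalization: order of expressions is ignored.
        return tuple(sorted(self.exprs))

    def __eq__(self, other):
        return isinstance(other, Plan) and self.canonicalized == other.canonicalized

    def __hash__(self):
        return hash(self.canonicalized)
```

With this in place, any subclass-specific equality logic can be deleted, which is the point of the cleanup.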

spark git commit: [SPARK-20396][SQL][PYSPARK][FOLLOW-UP] groupby().apply() with pandas udf

2017-10-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 568763baf -> b8624b06e [SPARK-20396][SQL][PYSPARK][FOLLOW-UP] groupby().apply() with pandas udf ## What changes were proposed in this pull request? This is a follow-up of #18732. This pr modifies `GroupedData.apply()` method to convert pan

spark git commit: [SPARK-21055][SQL] replace grouping__id with grouping_id()

2017-10-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e2fea8cd6 -> 16c9cc68c [SPARK-21055][SQL] replace grouping__id with grouping_id() ## What changes were proposed in this pull request? Spark does not support grouping__id; it has grouping_id() instead. But this is not convenient for Hive users
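The migration described above, Hive's `grouping__id` virtual column versus Spark's `grouping_id()` function, amounts to a textual substitution in queries. A purely illustrative helper (this is not how Spark performs the replacement internally):

```python
import re

def rewrite_grouping_id(sql):
    """Rewrite Hive's grouping__id virtual column to Spark's
    grouping_id() function. Illustrative only: a real rewrite would
    operate on the parsed plan, not on the raw SQL string."""
    return re.sub(r"\bgrouping__id\b", "grouping_id()", sql)
```

For example, a Hive query using `grouping__id` in a ROLLUP aggregation becomes valid Spark SQL after the rewrite.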

spark git commit: [SQL] Mark strategies with override for clarity.

2017-10-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b034f2565 -> b84f61cd7 [SQL] Mark strategies with override for clarity. ## What changes were proposed in this pull request? This is a very trivial PR, simply marking `strategies` in `SparkPlanner` with the `override` keyword for clarity s

spark git commit: [SPARK-22026][SQL] data source v2 write path

2017-10-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7fae7995b -> b034f2565 [SPARK-22026][SQL] data source v2 write path ## What changes were proposed in this pull request? A working prototype for data source v2 write path. The writing framework is similar to the reading framework. i.e. `Wr

spark git commit: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer

2017-10-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 c42309143 -> 010b50cea [SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer ## What changes were proposed in this pull request? This PR addresses the comments by gatorsmile on [the previous PR](https:/

spark git commit: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer

2017-10-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 72561ecf4 -> 1f25d8683 [SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer ## What changes were proposed in this pull request? This PR addresses the comments by gatorsmile on [the previous PR](https://git

spark git commit: [SPARK-22271][SQL] mean overflows and returns null for some decimal variables

2017-10-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 71d1cb6a4 -> c42309143 [SPARK-22271][SQL] mean overflows and returns null for some decimal variables ## What changes were proposed in this pull request? In Average.scala, it has ``` override lazy val evaluateExpression = child.dataTy

spark git commit: [SPARK-22271][SQL] mean overflows and returns null for some decimal variables

2017-10-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 75d666b95 -> 28f9f3f22 [SPARK-22271][SQL] mean overflows and returns null for some decimal variables ## What changes were proposed in this pull request? In Average.scala, it has ``` override lazy val evaluateExpression = child.dataType m
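The overflow described above can be reproduced in miniature with Python's `decimal` module: summing at a precision too narrow for the intermediate result silently rounds the sum, so the mean drifts (Spark returned null in the analogous decimal case). Widening the intermediate precision before dividing, which is the spirit of the fix, keeps the mean exact. The precisions here are illustrative, not Spark's actual decimal widths.

```python
from decimal import Decimal, localcontext

vals = [Decimal("9999.99")] * 3  # exact sum is 29999.97 (7 significant digits)

# Narrow intermediate precision: the running sum no longer fits and is
# rounded, so the computed mean is wrong.
with localcontext() as ctx:
    ctx.prec = 6
    narrow_mean = sum(vals, Decimal(0)) / len(vals)

# Wider intermediate precision keeps the sum exact before the division,
# analogous to widening the sum's decimal type in the fix.
with localcontext() as ctx:
    ctx.prec = 20
    wide_mean = sum(vals, Decimal(0)) / len(vals)
```

Here `wide_mean` is exactly 9999.99 while `narrow_mean` is not, showing why the sum's type must be wider than the input's.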

spark git commit: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly

2017-10-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 561505e2f -> c09a2a76b [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly ## What changes were proposed in this pull request? This PR aims to improve **StatisticsSuite** to test `convertMetastore` config

spark git commit: [SPARK-22282][SQL] Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION

2017-10-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0fa10666c -> 561505e2f [SPARK-22282][SQL] Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION ## What changes were proposed in this pull request? This PR aims to - Rename `OrcRelation` to `OrcFileFormat` object. - Replace `OrcR

spark git commit: [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.

2017-10-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 920372a19 -> eb00037a7 [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators. ## What changes were proposed in this pull request? When fixing schema field names using escape characters with `addReferenceMinorObj()`

spark git commit: [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.

2017-10-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 30d5c9fd8 -> acbad83ec [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators. ## What changes were proposed in this pull request? When fixing schema field names using escape characters with `addReferenceMinorObj()`

spark git commit: [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.

2017-10-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e3536406e -> e0503a722 [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators. ## What changes were proposed in this pull request? When fixing schema field names using escape characters with `addReferenceMinorObj()` at

spark git commit: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-10-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 06df34d35 -> e3536406e [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible ## What changes were proposed in this pull request? `BasicWriteTaskStatsTracker.getFileSize()`
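The failure mode above, a stats tracker probing a file that the (possibly eventually-consistent) filesystem has not yet made visible, suggests tolerating the missing file rather than failing the whole write task. A hedged sketch of that idea, not the actual `BasicWriteTaskStatsTracker` code:

```python
import os

def safe_file_size(path):
    """Return the size of a just-written file, or 0 if it is not yet
    visible. On object stores a newly created file may not appear
    immediately, so a missing file is treated as a metrics gap rather
    than a task failure."""
    try:
        return os.path.getsize(path)
    except OSError:
        # File not (yet) visible: record zero instead of raising, so
        # metrics collection cannot fail the write.
        return 0
```

A production version would likely log a warning on the fallback path so under-reported metrics remain diagnosable.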

spark git commit: [SPARK-22252][SQL][FOLLOWUP] Command should not be a LeafNode

2017-10-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6412ea175 -> 3823dc88d [SPARK-22252][SQL][FOLLOWUP] Command should not be a LeafNode ## What changes were proposed in this pull request? This is a minor followup of #19474. #19474 partially reverted #18064 but accidentally introduced a

spark git commit: [SPARK-22257][SQL] Reserve all non-deterministic expressions in ExpressionSet

2017-10-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ec122209f -> 2f00a71a8 [SPARK-22257][SQL] Reserve all non-deterministic expressions in ExpressionSet ## What changes were proposed in this pull request? For non-deterministic expressions, they should be considered as not contained in the
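The behavior described above, treating non-deterministic expressions as never contained in the set, can be sketched as a set that de-duplicates deterministic expressions by a canonical key but always keeps non-deterministic ones (two occurrences of `rand()` may yield different values). Illustrative only; not Catalyst's `ExpressionSet`.

```python
class ExpressionSet:
    """Toy expression set: deterministic expressions are de-duplicated,
    non-deterministic ones are always reserved."""

    def __init__(self):
        self._seen = set()
        self.exprs = []

    def add(self, key, deterministic=True):
        if not deterministic:
            # e.g. rand(): never considered "already contained",
            # so every occurrence is kept.
            self.exprs.append(key)
        elif key not in self._seen:
            self._seen.add(key)
            self.exprs.append(key)
```

Adding `a + 1` twice keeps one copy, while adding `rand()` twice keeps both.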

spark git commit: [SPARK-22252][SQL][2.2] FileFormatWriter should respect the input query schema

2017-10-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 cfc04e062 -> c9187db80 [SPARK-22252][SQL][2.2] FileFormatWriter should respect the input query schema ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/18386 fixes SPARK-21165 but breaks SPARK-22

spark git commit: [SPARK-22263][SQL] Refactor deterministic as lazy value

2017-10-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9104add4c -> 3ff766f61 [SPARK-22263][SQL] Refactor deterministic as lazy value ## What changes were proposed in this pull request? The method `deterministic` is frequently called in optimizer. Refactor `deterministic` as lazy value, in ord
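The refactor above turns `deterministic` into a Scala `lazy val` so the recursive walk over the child tree happens once instead of on every optimizer lookup. The same caching shape in Python, using `functools.cached_property` as a stand-in for `lazy val` (illustrative, not Spark's `Expression` class):

```python
from functools import cached_property

class Expression:
    """Toy expression node: `deterministic` recurses over children, so
    caching it after the first access avoids re-walking the tree on
    every optimizer call (the effect of Scala's `lazy val`)."""

    def __init__(self, children, self_deterministic=True):
        self.children = children
        self._self_deterministic = self_deterministic
        self.compute_count = 0  # for demonstration: how often we recompute

    @cached_property
    def deterministic(self):
        self.compute_count += 1
        return self._self_deterministic and all(
            c.deterministic for c in self.children
        )
```

Repeated accesses hit the cached value, so the count of recomputations stays at one per node.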

spark git commit: [SPARK-20055][DOCS] Added documentation for loading csv files into DataFrames

2017-10-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 645e108ee -> ccdf21f56 [SPARK-20055][DOCS] Added documentation for loading csv files into DataFrames ## What changes were proposed in this pull request? Added documentation for loading csv files into Dataframes ## How was this patch test
