spark git commit: [SPARK-22781][SS] Support creating streaming dataset with ORC files

2017-12-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 13268a58f -> 9962390af [SPARK-22781][SS] Support creating streaming dataset with ORC files ## What changes were proposed in this pull request? As with `Parquet`, users can now use `ORC` with Apache Spark Structured Streaming. This PR adds

spark git commit: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dataset API

2017-12-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6e36d8d56 -> 13268a58f [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dataset API ## What changes were proposed in this pull request? This change adds local checkpoint support to Datasets and a corresponding binding for the Python DataFrame
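The idea behind a local checkpoint can be sketched without a Spark cluster. The toy `MiniDataset` class below is purely illustrative (it is not Spark's implementation): a local checkpoint eagerly materializes the current result and truncates the recomputation lineage, trading fault tolerance for speed since the data lives only on local executors.

```python
# Illustrative sketch, not Spark's implementation: a local checkpoint
# materializes the rows computed so far and drops the lineage needed to
# recompute them.

class MiniDataset:
    def __init__(self, compute, lineage_depth=0):
        self._compute = compute            # thunk producing the rows
        self.lineage_depth = lineage_depth

    def map(self, f):
        # Each transformation extends the lineage; nothing runs yet.
        return MiniDataset(lambda: [f(x) for x in self._compute()],
                           self.lineage_depth + 1)

    def local_checkpoint(self):
        # Eagerly materialize, then return a dataset whose "plan" is just
        # the cached rows -- the old lineage is gone.
        rows = self._compute()
        return MiniDataset(lambda: rows, lineage_depth=0)

    def collect(self):
        return self._compute()

base = MiniDataset(lambda: [1, 2, 3])
long_chain = base.map(lambda x: x + 1).map(lambda x: x * 10)
cp = long_chain.local_checkpoint()
print(cp.collect())        # [20, 30, 40]
print(cp.lineage_depth)    # 0 -- lineage truncated
```

In real Spark the call is simply `ds.localCheckpoint()` (and, after this PR, `df.localCheckpoint()` from PySpark); unlike `checkpoint()`, it does not write to a reliable checkpoint directory.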

spark git commit: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3a7494dfe -> 6e36d8d56 [SPARK-22829] Add new built-in function date_trunc() ## What changes were proposed in this pull request? Adding date_trunc() as a built-in function. `date_trunc` is common in other databases, but Spark or Hive does
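The semantics of `date_trunc` can be shown with a small pure-Python stand-in (illustrative only, not Spark's implementation): the function zeroes out every field finer than the requested unit.

```python
from datetime import datetime

def date_trunc(unit, ts):
    """Truncate a timestamp to the given unit, mirroring the general
    SQL date_trunc() semantics (illustrative stand-in)."""
    unit = unit.lower()
    if unit == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0,
                          second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

ts = datetime(2017, 12, 19, 20, 14, 52)
print(date_trunc("month", ts))   # 2017-12-01 00:00:00
```

In Spark SQL the equivalent call is `SELECT date_trunc('month', col)`, which returns a timestamp truncated to the specified unit.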

spark git commit: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in case of exception in spill

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 6129ffa11 -> 3a7494dfe [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in case of exception in spill ## What changes were proposed in this pull request? Currently, the task memory manager throws an OutOfMemoryError when there is an
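The motivation can be sketched in a few lines (the class and function names below are hypothetical, not Spark's): an allocation failure during spill that surfaces as a fatal `OutOfMemoryError` takes down the whole executor JVM, whereas wrapping it in a task-scoped exception lets the scheduler fail and retry just that task.

```python
# Illustrative sketch with hypothetical names: raise a recoverable,
# task-level error instead of a JVM-fatal OutOfMemoryError.

class SparkOutOfMemoryError(RuntimeError):
    """Task-level OOM: fails the task without killing the executor."""

def acquire_memory(requested, available):
    if requested > available:
        # A fatal MemoryError/OutOfMemoryError here would kill the whole
        # executor; this exception only fails the current task.
        raise SparkOutOfMemoryError(
            f"Unable to acquire {requested} bytes ({available} free)")
    return requested

try:
    acquire_memory(1 << 30, 1 << 20)
except SparkOutOfMemoryError as e:
    outcome = f"task failed and can be retried: {e}"

print(outcome)
```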

svn commit: r23809 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_19_12_01-6129ffa-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-19 Thread pwendell
Author: pwendell Date: Tue Dec 19 20:14:52 2017 New Revision: 23809 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_19_12_01-6129ffa docs [This commit notification would consist of 1414 parts, which exceeds the limit of 50, so it was shortened to this summary.]

[2/2] spark git commit: [SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division

2017-12-19 Thread lixiao
[SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division ## What changes were proposed in this pull request? Test coverage for `WidenSetOperationTypes`, `BooleanEquality`, `StackCoercion` and `Division`; this is a sub-task for

[1/2] spark git commit: [SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division

2017-12-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ef10f452e -> 6129ffa11 http://git-wip-us.apache.org/repos/asf/spark/blob/6129ffa1/sql/core/src/test/resources/sql-tests/results/typeCoercion/native/division.sql.out -- diff

spark git commit: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused by InferFiltersFromConstraints

2017-12-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ee56fc343 -> ef10f452e [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused by InferFiltersFromConstraints ## What changes were proposed in this pull request? The optimizer rule `InferFiltersFromConstraints` could trigger our batch

svn commit: r23802 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_19_08_01-b779c93-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-19 Thread pwendell
Author: pwendell Date: Tue Dec 19 16:14:42 2017 New Revision: 23802 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_19_08_01-b779c93 docs [This commit notification would consist of 1414 parts, which exceeds the limit of 50, so it was shortened to this summary.]

[2/2] spark git commit: [SPARK-18016][SQL] Code Generation: Constant Pool Limit - reduce entries for mutable state

2017-12-19 Thread wenchen
[SPARK-18016][SQL] Code Generation: Constant Pool Limit - reduce entries for mutable state ## What changes were proposed in this pull request? This PR is a follow-on to #19518. It tries to reduce the number of constant pool entries used for accessing mutable state. There are two directions:

[1/2] spark git commit: [SPARK-18016][SQL] Code Generation: Constant Pool Limit - reduce entries for mutable state

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master b779c9351 -> ee56fc343 http://git-wip-us.apache.org/repos/asf/spark/blob/ee56fc34/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --

spark git commit: [SPARK-22815][SQL] Keep PromotePrecision in Optimized Plans

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 28315714d -> b779c9351 [SPARK-22815][SQL] Keep PromotePrecision in Optimized Plans ## What changes were proposed in this pull request? We could get incorrect results by running DecimalPrecision twice. This PR resolves the original found

spark git commit: [SPARK-22791][SQL][SS] Redact Output of Explain

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 571aa2755 -> 28315714d [SPARK-22791][SQL][SS] Redact Output of Explain ## What changes were proposed in this pull request? When calling explain on a query, the output can contain sensitive information. We should provide an admin/user to
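The mechanism can be approximated in pure Python (illustrative stand-in, not Spark's code). Spark already has a configurable redaction regex for sensitive configuration keys (e.g. matching "secret" or "password"); the sketch below applies the same idea to the option map that would appear in `explain()` output.

```python
import re

# Illustrative stand-in for redacting sensitive values before a query
# plan is printed. The pattern below is an assumption for the demo; in
# Spark the regex is configurable rather than hard-coded.
REDACTION_REGEX = re.compile(r"(?i)secret|password|token")

def redact(options):
    """Replace values whose key matches the redaction regex."""
    return {k: ("*********(redacted)" if REDACTION_REGEX.search(k) else v)
            for k, v in options.items()}

opts = {"url": "jdbc:postgresql://db:5432/app",
        "user": "analyst",
        "password": "hunter2"}
print(redact(opts))
```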

spark git commit: [SPARK-21984][SQL] Join estimation based on equi-height histogram

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master ab7346f20 -> 571aa2755 [SPARK-21984][SQL] Join estimation based on equi-height histogram ## What changes were proposed in this pull request? The equi-height histogram is a state-of-the-art statistic for cardinality estimation,
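A back-of-the-envelope sketch shows why equi-height histograms help join estimation (this is a simplified textbook formula, not the exact algorithm in the PR). Each bin covers an equal number of rows; for each pair of overlapping bins from the two join sides, we estimate matches assuming values are uniform within a bin and applying the classic `|A| * |B| / max(ndv_A, ndv_B)` rule to the overlap.

```python
# Simplified sketch of histogram-based join cardinality estimation.
# Each histogram is a list of (lo, hi, rows, ndv) bins on the join key.

def estimate_join_rows(hist_a, hist_b):
    total = 0.0
    for lo_a, hi_a, rows_a, ndv_a in hist_a:
        for lo_b, hi_b, rows_b, ndv_b in hist_b:
            overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
            if overlap <= 0:
                continue
            # Fraction of each bin that falls inside the overlapping range.
            fa = overlap / (hi_a - lo_a)
            fb = overlap / (hi_b - lo_b)
            # Textbook estimate on the overlapping portion of the bins.
            total += (rows_a * fa) * (rows_b * fb) / max(ndv_a * fa,
                                                         ndv_b * fb, 1)
    return total

# Two 100-row tables over keys 0..100, two equi-height bins each.
a = [(0, 50, 50, 25), (50, 100, 50, 25)]
b = [(0, 50, 50, 50), (50, 100, 50, 50)]
print(round(estimate_join_rows(a, b)))   # 100
```

Without histograms, the estimator would have to assume a single uniform distribution over the whole key range, which badly over- or under-estimates skewed joins; per-bin estimation confines the uniformity assumption to narrow ranges.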

spark git commit: [SPARK-22673][SQL] InMemoryRelation should utilize existing stats whenever possible

2017-12-19 Thread wenchen
Repository: spark Updated Branches: refs/heads/master d4e69595d -> ab7346f20 [SPARK-22673][SQL] InMemoryRelation should utilize existing stats whenever possible ## What changes were proposed in this pull request? The current implementation of InMemoryRelation always uses the most expensive