spark git commit: [SPARK-13883][SQL] Parquet Implementation of FileFormat.buildReader

2016-03-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 729996165 -> 8014a516d [SPARK-13883][SQL] Parquet Implementation of FileFormat.buildReader This PR implements the new `buildReader` interface for the Parquet `FileFormat`. A simple implementation of `FileScanRDD` is also included.

spark git commit: [SPARK-14016][SQL] Support high-precision decimals in vectorized parquet reader

2016-03-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 43ef1e52b -> 729996165 [SPARK-14016][SQL] Support high-precision decimals in vectorized parquet reader ## What changes were proposed in this pull request? This patch adds support for reading `DecimalTypes` with high (> 18) precision in

spark git commit: Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example"

2016-03-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 3f49e0766 -> 43ef1e52b Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example" This reverts commit 1af8de200c4d3357bcb09e7bbc6deece00e885f2. Project:

spark git commit: [SPARK-13320][SQL] Support Star in CreateStruct/CreateArray and Error Handling when DataFrame/DataSet Functions using Star

2016-03-21 Thread wenchen
Repository: spark Updated Branches: refs/heads/master b5f1ab701 -> 3f49e0766 [SPARK-13320][SQL] Support Star in CreateStruct/CreateArray and Error Handling when DataFrame/DataSet Functions using Star This PR resolves two issues: First, expanding * inside aggregate functions of structs when

spark git commit: [SPARK-13990] Automatically pick serializer when caching RDDs

2016-03-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master b3e5af62a -> b5f1ab701 [SPARK-13990] Automatically pick serializer when caching RDDs Building on the `SerializerManager` introduced in SPARK-13926/ #11755, this patch modifies Spark's BlockManager to use RDD's ClassTags in order to

spark git commit: [SPARK-13898][SQL] Merge DatasetHolder and DataFrameHolder

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5e86e9262 -> b3e5af62a [SPARK-13898][SQL] Merge DatasetHolder and DataFrameHolder ## What changes were proposed in this pull request? This patch merges DatasetHolder and DataFrameHolder. This makes more sense because DataFrame/Dataset are

spark git commit: [SPARK-13916][SQL] Add a metric to WholeStageCodegen to measure duration.

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1af8de200 -> 5e86e9262 [SPARK-13916][SQL] Add a metric to WholeStageCodegen to measure duration. ## What changes were proposed in this pull request? WholeStageCodegen naturally breaks the execution into pipelines that are easier to

spark git commit: [SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example

2016-03-21 Thread meng
Repository: spark Updated Branches: refs/heads/master f3717fc7c -> 1af8de200 [SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example https://issues.apache.org/jira/browse/SPARK-13019 The example code in the user guide is embedded in the markdown and hence it is

spark git commit: [SPARK-14004][FOLLOW-UP] Implementations of NonSQLExpression should not override sql method

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master f35df7d18 -> f3717fc7c [SPARK-14004][FOLLOW-UP] Implementations of NonSQLExpression should not override sql method ## What changes were proposed in this pull request? There is only one exception: `PythonUDF`. However, I don't think the

spark git commit: [SPARK-13805] [SQL] Generate code that get a value in each column from ColumnVector when ColumnarBatch is used

2016-03-21 Thread davies
Repository: spark Updated Branches: refs/heads/master 9b4e15ba1 -> f35df7d18 [SPARK-13805] [SQL] Generate code that get a value in each column from ColumnVector when ColumnarBatch is used ## What changes were proposed in this pull request? This PR generates code that gets a value in each

spark git commit: [SPARK-13456][SQL] fix creating encoders for case classes defined in Spark shell

2016-03-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 060a28c63 -> 43ebf7a9c [SPARK-13456][SQL] fix creating encoders for case classes defined in Spark shell ## What changes were proposed in this pull request? case classes defined in REPL are wrapped by line classes, and we have a trick for

spark git commit: [SPARK-13826][SQL] Ad-hoc Dataset API ScalaDoc fixes

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master a2a907802 -> 060a28c63 [SPARK-13826][SQL] Ad-hoc Dataset API ScalaDoc fixes ## What changes were proposed in this pull request? Ad-hoc Dataset API ScalaDoc fixes ## How was this patch tested? By building and checking ScalaDoc locally.

spark git commit: [SPARK-14039][SQL][MINOR] make SubqueryHolder an inner class

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master df61fbd97 -> a2a907802 [SPARK-14039][SQL][MINOR] make SubqueryHolder an inner class ## What changes were proposed in this pull request? `SubqueryHolder` is only used when generating the SQL string in `SQLBuilder`, so it is clearer to make it an

spark git commit: [SPARK-13986][CORE][MLLIB] Remove `DeveloperApi`-annotations for non-publics

2016-03-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 17a3f0067 -> df61fbd97 [SPARK-13986][CORE][MLLIB] Remove `DeveloperApi`-annotations for non-publics ## What changes were proposed in this pull request? Spark uses `DeveloperApi` annotation, but sometimes it seems to conflict with

spark git commit: [SPARK-14000][SQL] case class with a tuple field can't work in Dataset

2016-03-21 Thread lian
Repository: spark Updated Branches: refs/heads/master 2c5b18fb0 -> 17a3f0067 [SPARK-14000][SQL] case class with a tuple field can't work in Dataset ## What changes were proposed in this pull request? When we validate an encoder, we may call `dataType` on unresolved expressions. This PR fix

spark git commit: [SPARK-12789][SQL] Support Order By Ordinal in SQL

2016-03-21 Thread wenchen
Repository: spark Updated Branches: refs/heads/master c35c60fa9 -> 2c5b18fb0 [SPARK-12789][SQL] Support Order By Ordinal in SQL What changes were proposed in this pull request? This PR is to support order by position in SQL, e.g. ```SQL select c1, c2, c3 from tbl order by 1 desc, 3 ```
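To illustrate what ordering by position means in the query quoted above, here is a minimal sketch. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in engine, because SQLite also treats an integer in `ORDER BY` as a reference to the corresponding select-list column. It does not exercise Spark itself, and the table contents and the function name `order_by_ordinal_demo` are made up for illustration.

```python
import sqlite3


def order_by_ordinal_demo():
    """Run the same ordering twice: by ordinal position and by column name."""
    # In-memory SQLite database used as a stand-in engine (assumption: not Spark).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (c1 INTEGER, c2 TEXT, c3 INTEGER)")
    # Hypothetical sample rows, chosen so that both sort keys matter.
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?)",
        [(1, "a", 30), (2, "b", 20), (2, "c", 10), (1, "d", 5)],
    )
    # "1" refers to the first select-list column (c1), "3" to the third (c3).
    by_ordinal = conn.execute(
        "SELECT c1, c2, c3 FROM tbl ORDER BY 1 DESC, 3"
    ).fetchall()
    # Equivalent query that spells out the column names.
    by_name = conn.execute(
        "SELECT c1, c2, c3 FROM tbl ORDER BY c1 DESC, c3"
    ).fetchall()
    conn.close()
    return by_ordinal, by_name


if __name__ == "__main__":
    ordinal_rows, named_rows = order_by_ordinal_demo()
    print(ordinal_rows)
    print(ordinal_rows == named_rows)
```

Both queries should return the rows in the same order, since each ordinal is simply an alias for the select-list column at that position.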

spark git commit: [SPARK-14028][STREAMING][KINESIS][TESTS] Remove deprecated methods; fix two other warnings

2016-03-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 761c2d1b6 -> c35c60fa9 [SPARK-14028][STREAMING][KINESIS][TESTS] Remove deprecated methods; fix two other warnings ## What changes were proposed in this pull request? - Removed two methods that have been deprecated since 1.4 - Fixed two

spark git commit: [MINOR][DOCS] Add proper periods and spaces for CLI help messages and `config` doc.

2016-03-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 20fd25410 -> 761c2d1b6 [MINOR][DOCS] Add proper periods and spaces for CLI help messages and `config` doc. ## What changes were proposed in this pull request? This PR adds some proper periods and spaces to Spark CLI help messages and

[1/3] spark git commit: [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule

2016-03-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master e47408814 -> 20fd25410 http://git-wip-us.apache.org/repos/asf/spark/blob/20fd2541/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --

spark git commit: [SPARK-13764][SQL] Parse modes in JSON data source

2016-03-21 Thread wenchen
Repository: spark Updated Branches: refs/heads/master f58319a24 -> e47408814 [SPARK-13764][SQL] Parse modes in JSON data source ## What changes were proposed in this pull request? Currently, there is no way to control the behaviour when the JSON data source fails to parse corrupt records.