spark git commit: [SPARK-12610][SQL] Left Anti Join

2016-04-06 Thread rxin
ite` and ported `ExistenceJoinSuite` from https://github.com/apache/spark/pull/10563. cc davies chenghao-intel rxin Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #12214 from hvanhovell/SPARK-12610. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-
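A minimal usage sketch of the new join type (not from the commit itself), assuming the Spark-shell `sqlContext`/implicits and the `"leftanti"` join-type string; names and data are illustrative:
```scala
import sqlContext.implicits._

val customers = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
val orders    = Seq((1, 100)).toDF("cust_id", "amount")

// Rows of `customers` with no matching row in `orders`; the type string may also be "left_anti".
val noOrders = customers.join(orders, customers("id") === orders("cust_id"), "leftanti")
```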

spark git commit: [SPARK-14359] Unit tests for java 8 lambda syntax with typed aggregates

2016-04-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1146c534d -> 7d29c72f6 [SPARK-14359] Unit tests for java 8 lambda syntax with typed aggregates ## What changes were proposed in this pull request? Adds unit tests for java 8 lambda syntax with typed aggregates as a follow-up to #12168
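For illustration only, a hedged Scala counterpart of the typed aggregates being exercised from Java in this follow-up (API shape assumed from the surrounding Dataset work, not taken from the commit):
```scala
import org.apache.spark.sql.expressions.scalalang.typed
import sqlContext.implicits._

case class Sale(city: String, amount: Double)

val ds = Seq(Sale("SF", 10.0), Sale("SF", 5.0), Sale("NY", 7.5)).toDS()
// Sum amounts per city with a typed (Dataset) aggregate rather than an untyped Column.
val totals = ds.groupByKey(_.city).agg(typed.sum[Sale](_.amount))
```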

spark git commit: [HOTFIX] Fix `optional` to `createOptional`.

2016-04-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master d5ee9d5c2 -> 48682f6bf [HOTFIX] Fix `optional` to `createOptional`. ## What changes were proposed in this pull request? This PR fixes the following line. ``` private[spark] val STAGING_DIR = ConfigBuilder("spark.yarn.stagingDir")
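A hedged sketch of the corrected builder chain (internal config API; the `.doc` text below is illustrative, not the actual string from the commit):
```scala
private[spark] val STAGING_DIR = ConfigBuilder("spark.yarn.stagingDir")
  .doc("Staging directory used while submitting applications.")  // illustrative doc text
  .stringConf
  .createOptional   // the hotfix: `optional` renamed to `createOptional`
```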

spark git commit: [SPARK-14359] Create built-in functions for typed aggregates in Java

2016-04-04 Thread rxin
ady exposed in Scala. ## How was this patch tested? Unit tests. rxin Author: Eric Liang <e...@databricks.com> Closes #12168 from ericl/sc-2794. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06462301 Tree: http:

spark git commit: [SPARK-14356] Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3f749f7ed -> 76f3c735a [SPARK-14356] Update spark.sql.execution.debug to work on Datasets ## What changes were proposed in this pull request? Update DebugQuery to work on Datasets of any type, not just DataFrames. ## How was this patch
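A minimal sketch of the debug helpers this change extends to Datasets, assuming the Spark-shell `sqlContext`:
```scala
import org.apache.spark.sql.execution.debug._

val ds = sqlContext.range(0, 10)
ds.debug()   // prints per-operator row counts while the query runs; now works on any Dataset
```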

spark git commit: [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9023015f0 -> 3f749f7ed [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results ## What changes were proposed in this pull request? This PR contains the following 5 types of maintenance fix over 59 files

spark git commit: [HOTFIX] Fix Scala 2.10 compilation

2016-04-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master c2f25b1a1 -> 7be462050 [HOTFIX] Fix Scala 2.10 compilation Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7be46205 Tree:

spark git commit: [SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon

2016-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4a6e78abd -> 03d130f97 [SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon ## What changes were proposed in this pull request? Straggler references to Tachyon were removed: - for docs, `tachyon` has been generalized as

[2/2] spark git commit: [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.

2016-04-02 Thread rxin
[MINOR][DOCS] Use multi-line JavaDoc comments in Scala code. ## What changes were proposed in this pull request? This PR converts all Scala-style multiline comments into Java-style multiline comments in Scala code. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How

[1/2] spark git commit: [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.

2016-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master f70503761 -> 4a6e78abd http://git-wip-us.apache.org/repos/asf/spark/blob/4a6e78ab/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala -- diff

spark git commit: [SPARK-14338][SQL] Improve `SimplifyConditionals` rule to handle `null` in IF/CASEWHEN

2016-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master a3e293542 -> f70503761 [SPARK-14338][SQL] Improve `SimplifyConditionals` rule to handle `null` in IF/CASEWHEN ## What changes were proposed in this pull request? Currently, `SimplifyConditionals` handles `true` and `false` to optimize
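An illustration of the null case the improved rule can now fold, expressed against Catalyst's internal expression API (a sketch, not the rule's test code):
```scala
import org.apache.spark.sql.catalyst.expressions.{If, Literal}
import org.apache.spark.sql.types.BooleanType

// A literal-null predicate can never evaluate to true, so the IF folds to its else branch.
val expr = If(Literal(null, BooleanType), Literal(1), Literal(2))
// After SimplifyConditionals with the new null handling: Literal(2)
```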

spark git commit: [HOTFIX] Disable StateStoreSuite.maintenance

2016-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 06694f1c6 -> a3e293542 [HOTFIX] Disable StateStoreSuite.maintenance Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3e29354 Tree:

spark git commit: [HOTFIX] Fix compilation break.

2016-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master d7982a3a9 -> 67d753516 [HOTFIX] Fix compilation break. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/67d75351 Tree:

spark git commit: [MINOR][SQL] Fix comments styl and correct several styles and nits in CSV data source

2016-04-01 Thread rxin
Repository: spark Updated Branches: refs/heads/master f41415441 -> d7982a3a9 [MINOR][SQL] Fix comments styl and correct several styles and nits in CSV data source ## What changes were proposed in this pull request? While trying to create a PR (which was not an issue at the end), I just

spark git commit: [SPARK-14285][SQL] Implement common type-safe aggregate functions

2016-04-01 Thread rxin
ome as a separate pull request. One challenge there is to resolve the type difference between Scala primitive types and Java boxed types. ## How was this patch tested? Added unit tests for them. Author: Reynold Xin <r...@databricks.com> Closes #12077 from rxin/SPARK-14285. Project: http:

spark git commit: [SPARK-14251][SQL] Add SQL command for printing out generated code for debugging

2016-04-01 Thread rxin
Repository: spark Updated Branches: refs/heads/master 877dc712e -> fa1af0aff [SPARK-14251][SQL] Add SQL command for printing out generated code for debugging ## What changes were proposed in this pull request? This PR implements `EXPLAIN CODEGEN` SQL command which returns generated codes
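A usage sketch of the new command from the shell (query and output handling are illustrative):
```scala
// Prints the whole-stage generated Java code for the physical plan of the query.
sqlContext.sql("EXPLAIN CODEGEN SELECT 1 + 1").collect().foreach(println)
```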

[7/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g -- diff --git

[4/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/CatalystQl.scala -- diff --git

[5/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git

[1/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master 26445c2e4 -> a9b93e073 http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala -- diff --git

[3/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ng/AstBuilder.scala -- diff --git

[6/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

[8/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
rxin andrewor14 yhuai Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #12071 from hvanhovell/SPARK-14211. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9b93e07 Tree: http://git-wip-us.apache.org/repos/asf

[2/8] spark git commit: [SPARK-14211][SQL] Remove ANTLR3 based parser

2016-03-31 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/a9b93e07/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala -- diff --git

spark git commit: [SPARK-14081][SQL] - Preserve DataFrame column types when filling nulls.

2016-03-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 258a24341 -> da54abfd8 [SPARK-14081][SQL] - Preserve DataFrame column types when filling nulls. ## What changes were proposed in this pull request? This change resolves an issue where `DataFrameNaFunctions.fill` changes a `FloatType`
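A sketch of the behaviour being fixed, assuming the Spark-shell implicits (column names and data are illustrative):
```scala
import sqlContext.implicits._

val df = Seq((Some(1.5f), "a"), (None, "b")).toDF("score", "name")
val filled = df.na.fill(0.0)       // replace numeric nulls
filled.schema("score").dataType    // expected to stay FloatType after this fix (was widened to DoubleType)
```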

spark git commit: [SPARK-14282][SQL] CodeFormatter should handle oneline comment with /* */ properly

2016-03-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master dadf0138b -> 258a24341 [SPARK-14282][SQL] CodeFormatter should handle oneline comment with /* */ properly ## What changes were proposed in this pull request? This PR improves `CodeFormatter` to fix the following malformed indentations.

spark git commit: [SPARK-14227][SQL] Add method for printing out generated code for debugging

2016-03-29 Thread rxin
128 */ agg_mapIter.close(); /* 129 */ if (agg_sorter == null) { /* 130 */ agg_hashMap.free(); /* 131 */ } /* 132 */ } /* 133 */ } ``` rxin Author: Eric Liang <e...@databricks.com> Closes #12025 from ericl/spark-14227. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commi

spark git commit: [MINOR][SQL] Fix exception message to print string-array correctly.

2016-03-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master d612228ef -> 838cb4583 [MINOR][SQL] Fix exception message to print string-array correctly. ## What changes were proposed in this pull request? This PR is a simple fix for an exception message to print `string[]` content correctly.

spark git commit: [MINOR][SQL] Fix typos by replacing 'much' with 'match'.

2016-03-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master d26c42982 -> d612228ef [MINOR][SQL] Fix typos by replacing 'much' with 'match'. ## What changes were proposed in this pull request? This PR fixes two trivial typos: 'does not **much**' --> 'does not **match**'. ## How was this patch

spark git commit: [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinit…

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 504b99262 -> a7579444d [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinit… ## What changes were proposed in this pull request? Currently, `GraphOps.pickRandomVertex()` falls into infinite loops for graphs having

spark git commit: [SPARK-13981][SQL] Defer evaluating variables within Filter operator.

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 27d4ef0c6 -> 4a55c3363 [SPARK-13981][SQL] Defer evaluating variables within Filter operator. ## What changes were proposed in this pull request? This improves the Filter codegen for NULLs by deferring loading the values for IsNotNull.

spark git commit: [SPARK-14213][SQL] Migrate HiveQl parsing to ANTLR4 parser

2016-03-28 Thread rxin
tps://github.com/apache/spark/pull/12011, and we should wait with merging until that one is in (hence the WIP tag). As soon as this PR is merged we can start removing much of the old parser infrastructure. ### How was this patch tested? Existing Hive unit tests. cc rxin andrewor14 yhuai Author: Her

spark git commit: [SPARK-14205][SQL] remove trait Queryable

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 289257c4c -> 38326cad8 [SPARK-14205][SQL] remove trait Queryable ## What changes were proposed in this pull request? After DataFrame and Dataset are merged, the trait `Queryable` becomes unnecessary as it has only one implementation. We

spark git commit: [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2bc7c96d6 -> 289257c4c [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex ## What changes were proposed in this pull request? Currently, `GraphOps.pickRandomVertex()` falls into
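A hedged reproduction sketch of the single-vertex case that previously looped forever, assuming the shell-provided `sc`:
```scala
import org.apache.spark.graphx.{Edge, Graph}

val vertices = sc.parallelize(Seq((1L, "only-vertex")))
val edges    = sc.parallelize(Seq.empty[Edge[Int]])
val graph    = Graph(vertices, edges)

graph.pickRandomVertex()   // should now return 1L instead of spinning
```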

spark git commit: [SPARK-14155][SQL] Hide UserDefinedType interface in Spark 2.0

2016-03-28 Thread rxin
API for user-defined type that also works well with column batches as well as encoders (datasets). In Spark 2.0, let's make `UserDefinedType` `private[spark]` first. ## How was this patch tested? Existing unit tests. Author: Reynold Xin <r...@databricks.com> Closes #11955 from rxin/SP

[1/3] spark git commit: [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1528ff4c9 -> 600c0b69c http://git-wip-us.apache.org/repos/asf/spark/blob/600c0b69/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/ExpressionParserSuite.scala

[3/3] spark git commit: [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4

2016-03-28 Thread rxin
[SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4 ### What changes were proposed in this pull request? The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4. This parser is based on the [Presto's SQL

[2/3] spark git commit: [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4

2016-03-28 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/600c0b69/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ng/AstBuilder.scala -- diff --git

spark git commit: [SPARK-14185][SQL][MINOR] Make indentation of debug log for generated code proper

2016-03-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8ef493760 -> aac13fb48 [SPARK-14185][SQL][MINOR] Make indentation of debug log for generated code proper ## What changes were proposed in this pull request? The indentation of debug log output by `CodeGenerator` is weird. The first line

spark git commit: [SPARK-14175][SQL] whole stage codegen interface refactor

2016-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master a91784fb6 -> bd94ea4c8 [SPARK-14175][SQL] whole stage codegen interface refactor ## What changes were proposed in this pull request? 1. merge consumeChild into consume() 2. always generate code for input variables and UnsafeRow, a plan

spark git commit: [SPARK-14135] Add off-heap storage memory bookkeeping support to MemoryManager

2016-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master bd94ea4c8 -> 20c0bcd97 [SPARK-14135] Add off-heap storage memory bookkeeping support to MemoryManager This patch extends Spark's `UnifiedMemoryManager` to add bookkeeping support for off-heap storage memory, a requirement for enabling
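A configuration sketch for exercising off-heap memory; these are the pre-existing off-heap knobs, assumed relevant here rather than introduced by this patch:
```scala
val conf = new org.apache.spark.SparkConf()
  .set("spark.memory.offHeap.enabled", "true")   // off-heap must be explicitly enabled
  .set("spark.memory.offHeap.size", "1g")        // and given an explicit size
```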

spark git commit: [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter

2016-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master 13945dd83 -> d23ad7c1c [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter ## What changes were proposed in this pull request? This PR removes all docs about the old streaming-akka,

[1/2] spark git commit: [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark

2016-03-25 Thread rxin
Repository: spark Updated Branches: refs/heads/master 54d13bed8 -> 24587ce43 http://git-wip-us.apache.org/repos/asf/spark/blob/24587ce4/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala --

[2/2] spark git commit: [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark

2016-03-25 Thread rxin
[SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark ## What changes were proposed in this pull request? This PR moves flume back to Spark as per the discussion in the dev mail-list. ## How was this patch tested? Existing Jenkins tests. Author: Shixiong Zhu

spark git commit: [SPARK-14149] Log exceptions in tryOrIOException

2016-03-25 Thread rxin
gle place. ## How was this patch tested? A logging change with a manual test. Author: Reynold Xin <r...@databricks.com> Closes #11951 from rxin/SPARK-14149. (cherry picked from commit 70a6f0bb57ca2248444157e2707fbcc3cb04e3bc) Signed-off-by: Reynold Xin <r...@databricks.com> Proj

spark git commit: [SPARK-14149] Log exceptions in tryOrIOException

2016-03-25 Thread rxin
gle place. ## How was this patch tested? A logging change with a manual test. Author: Reynold Xin <r...@databricks.com> Closes #11951 from rxin/SPARK-14149. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/70a6f0bb Tree: h

spark git commit: [SPARK-14149] Log exceptions in tryOrIOException

2016-03-25 Thread rxin
gle place. ## How was this patch tested? A logging change with a manual test. Author: Reynold Xin <r...@databricks.com> Closes #11951 from rxin/SPARK-14149. (cherry picked from commit 70a6f0bb57ca2248444157e2707fbcc3cb04e3bc) Signed-off-by: Reynold Xin <r...@databricks.com> Proj

spark git commit: [SPARK-14142][SQL] Replace internal use of unionAll with union

2016-03-24 Thread rxin
all existing tests. Author: Reynold Xin <r...@databricks.com> Closes #11946 from rxin/SPARK-14142. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3619fec1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3619
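A minimal sketch of the preferred call after this change (shell implicits assumed):
```scala
import sqlContext.implicits._

val a = Seq(1, 2, 3).toDF("n")
val b = Seq(3, 4).toDF("n")
val all = a.union(b)   // was: a.unionAll(b); duplicates are kept, as before
```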

spark git commit: [SPARK-14110][CORE] PipedRDD to print the command ran on non zero exit

2016-03-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master c44d140ca -> 01849da08 [SPARK-14110][CORE] PipedRDD to print the command ran on non zero exit ## What changes were proposed in this pull request? In case of failure in subprocess launched in PipedRDD, the failure exception reads

[1/2] spark git commit: [SPARK-14014][SQL] Replace existing catalog with SessionCatalog

2016-03-23 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6bc4be64f -> 5dfc01976 http://git-wip-us.apache.org/repos/asf/spark/blob/5dfc0197/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala -- diff --git

[2/2] spark git commit: [SPARK-14014][SQL] Replace existing catalog with SessionCatalog

2016-03-23 Thread rxin
[SPARK-14014][SQL] Replace existing catalog with SessionCatalog ## What changes were proposed in this pull request? `SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This

spark git commit: [SPARK-14088][SQL] Some Dataset API touch-up

2016-03-23 Thread rxin
tch tested? All changes should be covered by existing tests. Also added couple test cases to cover "name". Author: Reynold Xin <r...@databricks.com> Closes #11908 from rxin/SPARK-14088. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.o

spark git commit: [SPARK-13401][SQL][TESTS] Fix SQL test warnings.

2016-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0d51b6044 -> 75dc29620 [SPARK-13401][SQL][TESTS] Fix SQL test warnings. ## What changes were proposed in this pull request? This fix tries to fix several SQL test warnings under the sql/core/src/test directory. The fixed warnings

spark git commit: [SPARK-14072][CORE] Show JVM/OS version information when we run a benchmark program

2016-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4700adb98 -> 0d51b6044 [SPARK-14072][CORE] Show JVM/OS version information when we run a benchmark program ## What changes were proposed in this pull request? This PR allows us to identify what JVM is used when someone ran a benchmark

spark git commit: [SPARK-14060][SQL] Move StringToColumn implicit class into SQLImplicits

2016-03-22 Thread rxin
<r...@databricks.com> Author: Wenchen Fan <wenc...@databricks.com> Closes #11878 from rxin/SPARK-14060. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2b1ad7d Tree: http://git-wip-us.apache.org/repos/asf/spar
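A usage sketch of the `$"..."` interpolator that `StringToColumn` provides, now reachable through the regular implicits import (data and column names are illustrative):
```scala
import sqlContext.implicits._

val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
df.select($"name", $"age" + 1).show()
```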

spark git commit: [SPARK-14063][SQL] SQLContext.range should return Dataset[java.lang.Long]

2016-03-22 Thread rxin
. I also added a new test case in DatasetSuite for range. Author: Reynold Xin <r...@databricks.com> Closes #11880 from rxin/SPARK-14063. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/297c2022 Tree: http://git-wip-us.a
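A sketch of the changed return type; the explicit annotation is only there to show the new element type:
```scala
val ds: org.apache.spark.sql.Dataset[java.lang.Long] = sqlContext.range(0, 1000)
println(ds.count())   // 1000
```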

spark git commit: [SPARK-14038][SQL] enable native view by default

2016-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8193a266b -> 14464cadb [SPARK-14038][SQL] enable native view by default ## What changes were proposed in this pull request? As we have completed the `SQLBuilder`, we can safely turn on native view by default. ## How was this patch

spark git commit: [SPARK-14058][PYTHON] Incorrect docstring in Window.order

2016-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8014a516d -> 8193a266b [SPARK-14058][PYTHON] Incorrect docstring in Window.order ## What changes were proposed in this pull request? Replaces current docstring ("Creates a :class:`WindowSpec` with the partitioning defined.") with

spark git commit: [SPARK-14058][PYTHON] Incorrect docstring in Window.order

2016-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 022e06d18 -> f9221ad79 [SPARK-14058][PYTHON] Incorrect docstring in Window.order ## What changes were proposed in this pull request? Replaces current docstring ("Creates a :class:`WindowSpec` with the partitioning defined.") with

spark git commit: [SPARK-13898][SQL] Merge DatasetHolder and DataFrameHolder

2016-03-21 Thread rxin
set are now one class. In addition, fixed some minor issues with pull request #11732. ## How was this patch tested? Updated existing unit tests that test these implicits. Author: Reynold Xin <r...@databricks.com> Closes #11737 from rxin/SPARK-13898. Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-13916][SQL] Add a metric to WholeStageCodegen to measure duration.

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1af8de200 -> 5e86e9262 [SPARK-13916][SQL] Add a metric to WholeStageCodegen to measure duration. ## What changes were proposed in this pull request? WholeStageCodegen naturally breaks the execution into pipelines that are easier to

spark git commit: [SPARK-14004][FOLLOW-UP] Implementations of NonSQLExpression should not override sql method

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master f35df7d18 -> f3717fc7c [SPARK-14004][FOLLOW-UP] Implementations of NonSQLExpression should not override sql method ## What changes were proposed in this pull request? There is only one exception: `PythonUDF`. However, I don't think the

spark git commit: [SPARK-13826][SQL] Ad-hoc Dataset API ScalaDoc fixes

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master a2a907802 -> 060a28c63 [SPARK-13826][SQL] Ad-hoc Dataset API ScalaDoc fixes ## What changes were proposed in this pull request? Ad-hoc Dataset API ScalaDoc fixes ## How was this patch tested? By building and checking ScalaDoc locally.

spark git commit: [SPARK-14039][SQL][MINOR] make SubqueryHolder an inner class

2016-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master df61fbd97 -> a2a907802 [SPARK-14039][SQL][MINOR] make SubqueryHolder an inner class ## What changes were proposed in this pull request? `SubqueryHolder` is only used when generate SQL string in `SQLBuilder`, it's more clear to make it an

spark git commit: [SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x

2016-03-20 Thread rxin
Repository: spark Updated Branches: refs/heads/master 27e1f3885 -> 4ce2d24e2 [SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x ## What changes were proposed in this pull request? `Shark` was merged into `Spark SQL` since [July

spark git commit: [SPARK-13826][SQL] Addendum: update documentation for Datasets

2016-03-19 Thread rxin
ion for exchange/broadcast. ## How was this patch tested? Just documentation/api stability update. Author: Reynold Xin <r...@databricks.com> Closes #11814 from rxin/dataset-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit

spark git commit: [SPARK-13897][SQL] RelationalGroupedDataset and KeyValueGroupedDataset

2016-03-19 Thread rxin
now to stabilize it. ## How was this patch tested? This is a rename to improve API understandability. Should be covered by all existing tests. Author: Reynold Xin <r...@databricks.com> Closes #11841 from rxin/SPARK-13897. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-13826][SQL] Revises Dataset ScalaDoc

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master 90a1d8db7 -> 10ef4f3e7 [SPARK-13826][SQL] Revises Dataset ScalaDoc ## What changes were proposed in this pull request? This PR revises Dataset API ScalaDoc. All public methods are divided into the following groups * `groupname basic`:

spark git commit: [MINOR][SQL][BUILD] Remove duplicated lines

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7eef2463a -> c890c359b [MINOR][SQL][BUILD] Remove duplicated lines ## What changes were proposed in this pull request? This PR removes three minor duplicated lines. First one is making the following unreachable code warning. ```

spark git commit: [SPARK-13118][SQL] Expression encoding for optional synthetic classes

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master c100d31dd -> 7eef2463a [SPARK-13118][SQL] Expression encoding for optional synthetic classes ## What changes were proposed in this pull request? Fix expression generation for optional types. Standard Java reflection causes issues when

spark git commit: [SPARK-13948] MiMa check should catch if the visibility changes to private

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5faba9fac -> 82066a166 [SPARK-13948] MiMa check should catch if the visibility changes to private MiMa excludes are currently generated using both the current Spark version's classes and Spark 1.2.0's classes, but this doesn't make sense:

spark git commit: [SPARK-13894][SQL] SqlContext.range return type from DataFrame to DataSet

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master d9e8f26d0 -> d9670f847 [SPARK-13894][SQL] SqlContext.range return type from DataFrame to DataSet ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13894 Change the return type of the

spark git commit: [SPARK-13816][GRAPHX] Add parameter checks for algorithms in Graphx

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master d9670f847 -> 91984978e [SPARK-13816][GRAPHX] Add parameter checks for algorithms in Graphx JIRA: https://issues.apache.org/jira/browse/SPARK-13816 ## What changes were proposed in this pull request? Add parameter checks for algorithms in

spark git commit: [SPARK-14018][SQL] Use 64-bit num records in BenchmarkWholeStageCodegen

2016-03-19 Thread rxin
increase this to 500L << 23 and got negative numbers instead. ## How was this patch tested? I'm only modifying test code. Author: Reynold Xin <r...@databricks.com> Closes #11839 from rxin/SPARK-14018. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apach
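Why the 64-bit literal matters, in one line of arithmetic: the Int shift overflows and goes negative, while the Long shift does not.
```scala
println(500 << 23)    // -100663296: Int overflow
println(500L << 23)   // 4194304000
```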

spark git commit: [SPARK-13926] Automatically use Kryo serializer when shuffling RDDs with simple types

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master d1c193a2f -> de1a84e56 [SPARK-13926] Automatically use Kryo serializer when shuffling RDDs with simple types Because ClassTags are available when constructing ShuffledRDD we can use them to automatically use Kryo for shuffle
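A sketch of the kind of shuffle this affects (shell `sc` assumed): primitive key/value types, for which Kryo can now be chosen automatically from the RDD's ClassTags.
```scala
val counts = sc.parallelize(1 to 100000)
  .map(i => (i % 10, 1))
  .reduceByKey(_ + _)   // shuffle over (Int, Int); no explicit serializer setting needed
counts.collect()
```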

spark git commit: [SPARK-13924][SQL] officially support multi-insert

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master eacd9d8ed -> d9e8f26d0 [SPARK-13924][SQL] officially support multi-insert ## What changes were proposed in this pull request? There is a feature of hive SQL called multi-insert. For example: ``` FROM src INSERT OVERWRITE TABLE dest1
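A hedged completion of the multi-insert example the description starts (the dest1/dest2 tables and predicates are illustrative):
```scala
sqlContext.sql("""
  FROM src
  INSERT OVERWRITE TABLE dest1 SELECT key, value WHERE key < 100
  INSERT OVERWRITE TABLE dest2 SELECT key, value WHERE key >= 100
""")
```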

spark git commit: [SPARK-14012][SQL] Extract VectorizedColumnReader from VectorizedParquetRecordReader

2016-03-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master c11ea2e41 -> b39594472 [SPARK-14012][SQL] Extract VectorizedColumnReader from VectorizedParquetRecordReader ## What changes were proposed in this pull request? This is a minor followup on https://github.com/apache/spark/pull/11799 that

spark git commit: [SPARK-13403][SQL] Pass hadoopConfiguration to HiveConf constructors.

2016-03-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master de1a84e56 -> 5faba9fac [SPARK-13403][SQL] Pass hadoopConfiguration to HiveConf constructors. This commit updates the HiveContext so that sc.hadoopConfiguration is used to instantiate its internal instances of HiveConf. I tested this by

spark git commit: [MINOR][DOCS] Update build descriptions and commands

2016-03-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master f43a26ef9 -> c11ea2e41 [MINOR][DOCS] Update build descriptions and commands ## What changes were proposed in this pull request? This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark`

spark git commit: [SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test

2016-03-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master c890c359b -> d1c193a2f [SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since developer API of plug-able parser has been removed in #10801 , docs should

spark git commit: [SPARK-13899][SQL] Produce InternalRow instead of external Row at CSV data source

2016-03-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3c578c594 -> 92024797a [SPARK-13899][SQL] Produce InternalRow instead of external Row at CSV data source ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13899 This PR makes CSV data source

spark git commit: [SPARK-13920][BUILD] MIMA checks should apply to @Experimental and @DeveloperAPI APIs

2016-03-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3665294d4 -> 3c578c594 [SPARK-13920][BUILD] MIMA checks should apply to @Experimental and @DeveloperAPI APIs ## What changes were proposed in this pull request? We are able to change `Experimental` and `DeveloperAPI` API freely but also

spark git commit: [MINOR][TEST][SQL] Remove wrong "expected" parameter in checkNaNWithoutCodegen

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master bbd887f53 -> 52b6a899b [MINOR][TEST][SQL] Remove wrong "expected" parameter in checkNaNWithoutCodegen ## What changes were proposed in this pull request? Remove the wrong "expected" parameter in MathFunctionsSuite.scala's

spark git commit: [SPARK-13918][SQL] Merge SortMergeJoin and SortMergerOuterJoin

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 643649dcb -> bbd887f53 [SPARK-13918][SQL] Merge SortMergeJoin and SortMergerOuterJoin ## What changes were proposed in this pull request? This PR just moves some code from SortMergeOuterJoin into SortMergeJoin. This is to support codegen

spark git commit: [SPARK-13896][SQL][STRING] Dataset.toJSON should return Dataset

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master d89c71417 -> 50e3644d0 [SPARK-13896][SQL][STRING] Dataset.toJSON should return Dataset ## What changes were proposed in this pull request? Change the return type of toJson in Dataset class ## How was this patch tested? No additional unit
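A sketch of the changed signature; the annotation only documents the new return type (shell implicits assumed):
```scala
import sqlContext.implicits._

val people = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
val json: org.apache.spark.sql.Dataset[String] = people.toJSON
json.show()
```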

spark git commit: [SPARK-13893][SQL] Remove SQLContext.catalog/analyzer (internal method)

2016-03-15 Thread rxin
her than having an internal field. ## How was this patch tested? Existing unit/integration test code. Author: Reynold Xin <r...@databricks.com> Closes #11716 from rxin/SPARK-13893. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-13660][SQL][TESTS] ContinuousQuerySuite floods the logs with garbage

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 99bd2f0e9 -> 10251a745 [SPARK-13660][SQL][TESTS] ContinuousQuerySuite floods the logs with garbage ## What changes were proposed in this pull request? Use method 'testQuietly' to avoid ContinuousQuerySuite flooding the console logs with

spark git commit: [SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning and EliminateOperator

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 276c2d51a -> 99bd2f0e9 [SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning and EliminateOperator What changes were proposed in this pull request? Before this PR, two Optimizer rules `ColumnPruning` and

spark git commit: [SPARK-13890][SQL] Remove some internal classes' dependency on SQLContext

2016-03-15 Thread rxin
How was this patch tested? Existing unit/integration tests. Author: Reynold Xin <r...@databricks.com> Closes #11712 from rxin/sqlContext-planner. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/276c2d51 Tree: http:

spark git commit: [SPARK-13207][SQL][BRANCH-1.6] Make partitioning discovery ignore _SUCCESS files.

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 589d0420a -> 6935b5080 [SPARK-13207][SQL][BRANCH-1.6] Make partitioning discovery ignore _SUCCESS files. If a _SUCCESS appears in the inner partitioning dir, partition discovery will treat that _SUCCESS file as a data file. Then,

spark git commit: [SPARK-13870][SQL] Add scalastyle escaping correctly in CVSSuite.scala

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 43304b175 -> a51f877b5 [SPARK-13870][SQL] Add scalastyle escaping correctly in CVSSuite.scala ## What changes were proposed in this pull request? When initial creating `CVSSuite.scala` in SPARK-12833, there was a typo on `scalastyle:on`:

spark git commit: [SPARK-13888][DOC] Remove Akka Receiver doc and refer to the DStream Akka project

2016-03-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master e64958001 -> 43304b175 [SPARK-13888][DOC] Remove Akka Receiver doc and refer to the DStream Akka project ## What changes were proposed in this pull request? I have copied the docs of Streaming Akka to

spark git commit: [SPARK-13884][SQL] Remove DescribeCommand's dependency on LogicalPlan

2016-03-15 Thread rxin
<r...@databricks.com> Closes #11710 from rxin/SPARK-13884. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e6495800 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e6495800 Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-13353][SQL] fast serialization for collecting DataFrame/Dataset

2016-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9256840cb -> f72743d97 [SPARK-13353][SQL] fast serialization for collecting DataFrame/Dataset ## What changes were proposed in this pull request? When we call DataFrame/Dataset.collect(), Java serializer (or Kryo Serializer) will be used

spark git commit: [SPARK-13661][SQL] avoid the copy in HashedRelation

2016-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master e76679a81 -> 9256840cb [SPARK-13661][SQL] avoid the copy in HashedRelation ## What changes were proposed in this pull request? Avoid the copy in HashedRelation, since most of the HashedRelation are built with Array[Row], added the copy()

[1/2] spark git commit: [SPARK-13882][SQL] Remove org.apache.spark.sql.execution.local

2016-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 17eec0a71 -> 4bf460979 http://git-wip-us.apache.org/repos/asf/spark/blob/4bf46097/sql/core/src/test/scala/org/apache/spark/sql/execution/local/TakeOrderedAndProjectNodeSuite.scala

[1/5] spark git commit: [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages

2016-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8301fadd8 -> 06dec3745 http://git-wip-us.apache.org/repos/asf/spark/blob/06dec374/python/pyspark/streaming/tests.py -- diff --git a/python/pyspark/streaming/tests.py

[5/5] spark git commit: [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages

2016-03-14 Thread rxin
[SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages ## What changes were proposed in this pull request? Currently there are a few sub-projects, each for integrating with different external sources for Streaming.

[3/5] spark git commit: [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages

2016-03-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/06dec374/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala -- diff --git

[4/5] spark git commit: [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages

2016-03-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/06dec374/external/akka/src/main/scala/org/apache/spark/streaming/akka/AkkaUtils.scala -- diff --git a/external/akka/src/main/scala/org/apache/spark/streaming/akka/AkkaUtils.scala
