[1/2] spark git commit: [SPARK-15633][MINOR] Make package name for Java tests consistent

2016-05-27 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 3801fb4f3 -> ada319844 http://git-wip-us.apache.org/repos/asf/spark/blob/ada31984/external/java8-tests/src/test/java/test/org/apache/spark/java8/dstream/Java8APISuite.java

[2/2] spark git commit: [SPARK-15633][MINOR] Make package name for Java tests consistent

2016-05-27 Thread rxin
added "java8" as the package name so we can easily run all the tests related to Java 8. ## How was this patch tested? This is a test only change. Author: Reynold Xin Closes #13364 from rxin/SPARK-15633. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http

[2/2] spark git commit: [SPARK-15633][MINOR] Make package name for Java tests consistent

2016-05-27 Thread rxin
added "java8" as the package name so we can easily run all the tests related to Java 8. ## How was this patch tested? This is a test only change. Author: Reynold Xin Closes #13364 from rxin/SPARK-15633. (cherry picked from commit 73178c75565e20f53e6ee1478f3d976732c64438) Signed-off-b

spark git commit: [SPARK-15553][SQL] Dataset.createTempView should use CreateViewCommand

2016-05-27 Thread rxin
Repository: spark Updated Branches: refs/heads/master 73178c755 -> f1b220eee [SPARK-15553][SQL] Dataset.createTempView should use CreateViewCommand ## What changes were proposed in this pull request? Let `Dataset.createTempView` and `Dataset.createOrReplaceTempView` use `CreateViewCommand`,

spark git commit: [SPARK-15553][SQL] Dataset.createTempView should use CreateViewCommand

2016-05-27 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 ada319844 -> 36045106d [SPARK-15553][SQL] Dataset.createTempView should use CreateViewCommand ## What changes were proposed in this pull request? Let `Dataset.createTempView` and `Dataset.createOrReplaceTempView` use `CreateViewComman

spark git commit: [SPARK-15638][SQL] Audit Dataset, SparkSession, and SQLContext

2016-05-30 Thread rxin
and SQLContext. The patch audits the categorization of experimental APIs, function groups, and deprecations. For the detailed list of changes, please see the diff. ## How was this patch tested? N/A Author: Reynold Xin Closes #13370 from rxin/SPARK-15638. Project: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-15638][SQL] Audit Dataset, SparkSession, and SQLContext

2016-05-30 Thread rxin
and SQLContext. The patch audits the categorization of experimental APIs, function groups, and deprecations. For the detailed list of changes, please see the diff. ## How was this patch tested? N/A Author: Reynold Xin Closes #13370 from rxin/SPARK-15638. (cherry picked from com

spark git commit: [SPARK-15649][SQL] Avoid to serialize MetastoreRelation in HiveTableScanExec

2016-05-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master 95db8a44f -> 2bfc4f152 [SPARK-15649][SQL] Avoid to serialize MetastoreRelation in HiveTableScanExec ## What changes were proposed in this pull request? in HiveTableScanExec, schema is lazy and is related with relation.attributeMap. So it n

spark git commit: [SPARK-15649][SQL] Avoid to serialize MetastoreRelation in HiveTableScanExec

2016-05-31 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 2e3ead20c -> e11046457 [SPARK-15649][SQL] Avoid to serialize MetastoreRelation in HiveTableScanExec ## What changes were proposed in this pull request? in HiveTableScanExec, schema is lazy and is related with relation.attributeMap. So

spark git commit: [SPARK-15680][SQL] Disable comments in generated code in order to avoid perf. issues

2016-05-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master 223f1d58c -> 8ca01a6fe [SPARK-15680][SQL] Disable comments in generated code in order to avoid perf. issues ## What changes were proposed in this pull request? In benchmarks involving tables with very wide and complex schemas (thousands o

spark git commit: [SPARK-15680][SQL] Disable comments in generated code in order to avoid perf. issues

2016-05-31 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 978f54e76 -> f0e8738c1 [SPARK-15680][SQL] Disable comments in generated code in order to avoid perf. issues ## What changes were proposed in this pull request? In benchmarks involving tables with very wide and complex schemas (thousan

spark git commit: [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section

2016-06-01 Thread rxin
Repository: spark Updated Branches: refs/heads/master 07a98ca4c -> 2402b9146 [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section ## What changes were proposed in this pull request? Update document programming-guide accumulator section (scala language) java and

spark git commit: [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section

2016-06-01 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 beb4ea0b4 -> 47902d4bc [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section ## What changes were proposed in this pull request? Update document programming-guide accumulator section (scala language) java

spark git commit: [SPARK-14752][SQL] Explicitly implement KryoSerialization for LazilyGenerateOrdering

2016-06-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7c07d176f -> 09b3c56c9 [SPARK-14752][SQL] Explicitly implement KryoSerialization for LazilyGenerateOrdering ## What changes were proposed in this pull request? This patch fixes a number of `com.esotericsoftware.kryo.KryoException: java.l

spark git commit: [SPARK-14752][SQL] Explicitly implement KryoSerialization for LazilyGenerateOrdering

2016-06-02 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 18d613a4d -> 841523cdc [SPARK-14752][SQL] Explicitly implement KryoSerialization for LazilyGenerateOrdering ## What changes were proposed in this pull request? This patch fixes a number of `com.esotericsoftware.kryo.KryoException: ja

[1/2] spark git commit: [SPARK-15728][SQL] Rename aggregate operators: HashAggregate and SortAggregate

2016-06-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 09b3c56c9 -> 8900c8d8f http://git-wip-us.apache.org/repos/asf/spark/blob/8900c8d8/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala

[2/2] spark git commit: [SPARK-15728][SQL] Rename aggregate operators: HashAggregate and SortAggregate

2016-06-02 Thread rxin
view. This patch renames them HashAggregate and SortAggregate. ## How was this patch tested? Updated test cases. Author: Reynold Xin Closes #13465 from rxin/SPARK-15728. (cherry picked from commit 8900c8d8ff1614b5ec5a2ce213832fa13462b4d4) Signed-off-by: Reynold Xin Project: http://gi

[1/2] spark git commit: [SPARK-15728][SQL] Rename aggregate operators: HashAggregate and SortAggregate

2016-06-02 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 841523cdc -> cd7bf4b8e http://git-wip-us.apache.org/repos/asf/spark/blob/cd7bf4b8/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala

[2/2] spark git commit: [SPARK-15728][SQL] Rename aggregate operators: HashAggregate and SortAggregate

2016-06-02 Thread rxin
view. This patch renames them HashAggregate and SortAggregate. ## How was this patch tested? Updated test cases. Author: Reynold Xin Closes #13465 from rxin/SPARK-15728. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/890

spark git commit: [SPARK-15745][SQL] Use classloader's getResource() for reading resource files in HiveTests

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 76aa45d35 -> f7288e166 [SPARK-15745][SQL] Use classloader's getResource() for reading resource files in HiveTests ## What changes were proposed in this pull request? This is a cleaner approach in general but my motivation behind this chan

spark git commit: [SPARK-15745][SQL] Use classloader's getResource() for reading resource files in HiveTests

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 1e13d09c5 -> 306601282 [SPARK-15745][SQL] Use classloader's getResource() for reading resource files in HiveTests ## What changes were proposed in this pull request? This is a cleaner approach in general but my motivation behind this

spark git commit: [SPARK-15744][SQL] Rename two TungstenAggregation*Suites and update codegen/error messages/comments

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master f7288e166 -> b9fcfb3bd [SPARK-15744][SQL] Rename two TungstenAggregation*Suites and update codegen/error messages/comments ## What changes were proposed in this pull request? For consistency, this PR updates some remaining `TungstenAggreg

spark git commit: [SPARK-15744][SQL] Rename two TungstenAggregation*Suites and update codegen/error messages/comments

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 306601282 -> 3a9ee549c [SPARK-15744][SQL] Rename two TungstenAggregation*Suites and update codegen/error messages/comments ## What changes were proposed in this pull request? For consistency, this PR updates some remaining `TungstenAg

spark git commit: [SPARK-15756][SQL] Support command 'create table stored as orcfile/parquetfile/avrofile'

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 a2540b936 -> cf8782116 [SPARK-15756][SQL] Support command 'create table stored as orcfile/parquetfile/avrofile' ## What changes were proposed in this pull request? Now Spark SQL can support 'create table src stored as orc/parquet/avro'

spark git commit: [SPARK-15756][SQL] Support command 'create table stored as orcfile/parquetfile/avrofile'

2016-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 61d729abd -> 2ca563cc4 [SPARK-15756][SQL] Support command 'create table stored as orcfile/parquetfile/avrofile' ## What changes were proposed in this pull request? Now Spark SQL can support 'create table src stored as orc/parquet/avro' for

spark git commit: [SPARK-15770][ML] Annotation audit for Experimental and DeveloperApi

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4e767d0f9 -> 372fa61f5 [SPARK-15770][ML] Annotation audit for Experimental and DeveloperApi ## What changes were proposed in this pull request? 1, remove comments `:: Experimental ::` for non-experimental API 2, add comments `:: Experimenta

spark git commit: [SPARK-15770][ML] Annotation audit for Experimental and DeveloperApi

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 8c0ec85e6 -> 1ece135b9 [SPARK-15770][ML] Annotation audit for Experimental and DeveloperApi ## What changes were proposed in this pull request? 1, remove comments `:: Experimental ::` for non-experimental API 2, add comments `:: Experim

spark git commit: [SPARK-15748][SQL] Replace inefficient foldLeft() call with flatMap() in PartitionStatistics

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 30c4774f3 -> 26c1089c3 [SPARK-15748][SQL] Replace inefficient foldLeft() call with flatMap() in PartitionStatistics `PartitionStatistics` uses `foldLeft` and list concatenation (`++`) to flatten an iterator of lists, but this is extremely

spark git commit: [SPARK-15748][SQL] Replace inefficient foldLeft() call with flatMap() in PartitionStatistics

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 38a626a54 -> d8370ef11 [SPARK-15748][SQL] Replace inefficient foldLeft() call with flatMap() in PartitionStatistics `PartitionStatistics` uses `foldLeft` and list concatenation (`++`) to flatten an iterator of lists, but this is extre
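
The pattern named in the SPARK-15748 entries above (flattening an iterator of lists with `foldLeft` and `++` versus `flatMap`) can be illustrated with a minimal sketch. This is not the `PartitionStatistics` code; the object name, method names, and sample data are hypothetical and chosen only for illustration.

```scala
// Hypothetical sketch; none of these names come from the Spark source.
object FlattenSketch {
  // Sample input standing in for "an iterator of lists".
  val groups: List[List[Int]] = List(List(1, 2), List(3), List(4, 5, 6))

  // foldLeft with ++: every step copies the accumulated list again,
  // so the total work grows quadratically with the output size.
  def viaFoldLeft(it: Iterator[List[Int]]): List[Int] =
    it.foldLeft(List.empty[Int])((acc, xs) => acc ++ xs)

  // flatMap: each element is visited once while the result is built.
  def viaFlatMap(it: Iterator[List[Int]]): List[Int] =
    it.flatMap(xs => xs).toList
}
```

Both methods return the same elements in the same order; only the amount of copying differs.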

spark git commit: [SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 790de600b -> 9e7e2f916 [SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour ## What changes were proposed in this pull request? This PR fixes the behaviour of `format("csv").option("quote", null)` along with one of spa

spark git commit: [SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 79268aa46 -> b7e8d1cb3 [SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour ## What changes were proposed in this pull request? This PR fixes the behaviour of `format("csv").option("quote", null)` along with one of spark-c

spark git commit: Revert "[SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour"

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master b7e8d1cb3 -> 32f2f95db Revert "[SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour" This reverts commit b7e8d1cb3ce932ba4a784be59744af8a8ef027ce. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://

spark git commit: Revert "[SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour"

2016-06-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 9e7e2f916 -> 7d10e4bdd Revert "[SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour" This reverts commit 9e7e2f9164e0b3bd555e795b871626057b4fed31. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: htt

spark git commit: [MINOR][DOC] In Dataset docs, remove self link to Dataset and add link to Column

2016-06-08 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 48239b5f1 -> 96c011d5b [MINOR][DOC] In Dataset docs, remove self link to Dataset and add link to Column ## What changes were proposed in this pull request? Documentation Fix ## How was this patch tested? Author: Sandeep Singh Closes

spark git commit: [MINOR][DOC] In Dataset docs, remove self link to Dataset and add link to Column

2016-06-08 Thread rxin
Repository: spark Updated Branches: refs/heads/master afbe35cf5 -> d5807def1 [MINOR][DOC] In Dataset docs, remove self link to Dataset and add link to Column ## What changes were proposed in this pull request? Documentation Fix ## How was this patch tested? Author: Sandeep Singh Closes #13

spark git commit: [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 0408793aa -> b42e3d886 [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions ## What changes were proposed in this pull request? The current implementations of `UnixTime` and `FromUnixTime` do not cache t

spark git commit: [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6cb71f473 -> b0768538e [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions ## What changes were proposed in this pull request? The current implementations of `UnixTime` and `FromUnixTime` do not cache their

spark git commit: [SPARK-15791] Fix NPE in ScalarSubquery

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 d45aa50fc -> ebbbf2136 [SPARK-15791] Fix NPE in ScalarSubquery ## What changes were proposed in this pull request? The fix is pretty simple, just don't make the executedPlan transient in `ScalarSubquery` since it is referenced at exec

spark git commit: [SPARK-15791] Fix NPE in ScalarSubquery

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master 16df133d7 -> 6c5fd977f [SPARK-15791] Fix NPE in ScalarSubquery ## What changes were proposed in this pull request? The fix is pretty simple, just don't make the executedPlan transient in `ScalarSubquery` since it is referenced at executio

spark git commit: [SPARK-15696][SQL] Improve `crosstab` to have a consistent column order

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 ebbbf2136 -> 1371d5ece [SPARK-15696][SQL] Improve `crosstab` to have a consistent column order ## What changes were proposed in this pull request? Currently, `crosstab` returns a Dataframe having **random-order** columns obtained by j

spark git commit: [SPARK-15696][SQL] Improve `crosstab` to have a consistent column order

2016-06-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6c5fd977f -> 5a3533e77 [SPARK-15696][SQL] Improve `crosstab` to have a consistent column order ## What changes were proposed in this pull request? Currently, `crosstab` returns a Dataframe having **random-order** columns obtained by just

spark git commit: [DOCUMENTATION] fixed groupby aggregation example for pyspark

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 00c310133 -> 675a73715 [DOCUMENTATION] fixed groupby aggregation example for pyspark ## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How was this patch tested? the existin

spark git commit: [DOCUMENTATION] fixed groupby aggregation example for pyspark

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 02ed7b536 -> 84a8421e5 [DOCUMENTATION] fixed groupby aggregation example for pyspark ## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How was this patch tested? the exi

spark git commit: [DOCUMENTATION] fixed groupby aggregation example for pyspark

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 739d992f0 -> 393f4ba15 [DOCUMENTATION] fixed groupby aggregation example for pyspark ## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How was this patch tested? the exi

spark git commit: [SPARK-15866] Rename listAccumulator collectionAccumulator

2016-06-10 Thread rxin
erb and the method should return a list of accumulators. This patch renames the method and the class collection accumulator. ## How was this patch tested? Updated test case to reflect the names. Author: Reynold Xin Closes #13594 from rxin/SPARK-15866. Project: http://git-wip-us.apache.org/re

spark git commit: [SPARK-15866] Rename listAccumulator collectionAccumulator

2016-06-10 Thread rxin
is a verb and the method should return a list of accumulators. This patch renames the method and the class collection accumulator. ## How was this patch tested? Updated test case to reflect the names. Author: Reynold Xin Closes #13594 from rxin/SPARK-15866. (cherry

spark git commit: [MINOR][X][X] Replace all occurrences of None: Option with Option.empty

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 f15d641e2 -> 96bb1476c [MINOR][X][X] Replace all occurrences of None: Option with Option.empty ## What changes were proposed in this pull request? Replace all occurrences of `None: Option[X]` with `Option.empty[X]` ## How was this patc

spark git commit: [MINOR][X][X] Replace all occurrences of None: Option with Option.empty

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 667d4ea7b -> 865ec32dd [MINOR][X][X] Replace all occurrences of None: Option with Option.empty ## What changes were proposed in this pull request? Replace all occurrences of `None: Option[X]` with `Option.empty[X]` ## How was this patch te
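
The replacement described in the entries above, `None: Option[X]` becoming `Option.empty[X]`, can be illustrated with a short sketch. The object and method names below are hypothetical and are not taken from the Spark code base; the fold is only one plausible place where the typed empty value is needed.

```scala
// Hypothetical sketch; names are illustrative only.
object OptionEmptySketch {
  // Both expressions have static type Option[Int] and the value None.
  val ascribed: Option[Int] = (None: Option[Int])
  val viaEmpty: Option[Int] = Option.empty[Int]

  // A fold seed needs the wider Option[Int] type: a bare None would make
  // the accumulator type None.type, and Some(x) would not fit it.
  def lastEven(xs: Seq[Int]): Option[Int] =
    xs.foldLeft(Option.empty[Int])((acc, x) => if (x % 2 == 0) Some(x) else acc)
}
```

The two spellings are interchangeable in behavior; `Option.empty[X]` simply states the element type without a type ascription.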

spark git commit: [SPARK-15875] Try to use Seq.isEmpty and Seq.nonEmpty instead of Seq.length == 0 and Seq.length > 0

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 865ec32dd -> 026eb9064 [SPARK-15875] Try to use Seq.isEmpty and Seq.nonEmpty instead of Seq.length == 0 and Seq.length > 0 ## What changes were proposed in this pull request? In scala, immutable.List.length is an expensive operation so we

spark git commit: [SPARK-15875] Try to use Seq.isEmpty and Seq.nonEmpty instead of Seq.length == 0 and Seq.length > 0

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 96bb1476c -> 8b6742a37 [SPARK-15875] Try to use Seq.isEmpty and Seq.nonEmpty instead of Seq.length == 0 and Seq.length > 0 ## What changes were proposed in this pull request? In scala, immutable.List.length is an expensive operation s
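
The SPARK-15875 entries above prefer `Seq.isEmpty` and `Seq.nonEmpty` over comparing `Seq.length` with zero. A minimal sketch of the two styles follows; the object and method names are hypothetical. For an immutable `List`, `length` walks the whole list, while `isEmpty` only inspects the first cell.

```scala
// Hypothetical sketch; names are illustrative only.
object EmptinessSketch {
  // Length-based check: on a List this traverses every element first.
  def describeByLength(xs: Seq[Int]): String =
    if (xs.length == 0) "empty" else s"starts with ${xs.head}"

  // Preferred form: isEmpty is constant time on a List.
  def describe(xs: Seq[Int]): String =
    if (xs.isEmpty) "empty" else s"starts with ${xs.head}"
}
```

Both methods give the same answer for any input; the second avoids the traversal.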

spark git commit: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 e6ebb547b -> f41f433b1 [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible ## What changes were proposed in this pull request? Instead of using local variable `sc` like the following example, this P

spark git commit: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible

2016-06-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 127a6678d -> 2022afe57 [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible ## What changes were proposed in this pull request? Instead of using local variable `sc` like the following example, this PR us

spark git commit: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master ad102af16 -> cb5d933d8 [SPARK-15585][SQL] Add doc for turning off quotations ## What changes were proposed in this pull request? This pr is to add doc for turning off quotations because this behavior is different from `com.databricks.spark

spark git commit: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 8cf33fb8a -> 4c7b208ab [SPARK-15585][SQL] Add doc for turning off quotations ## What changes were proposed in this pull request? This pr is to add doc for turning off quotations because this behavior is different from `com.databricks.s

spark git commit: [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master cb5d933d8 -> 5bb4564cd [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark ## What changes were proposed in this pull request? These were not updated after performance improvements. To make updating them easier, I also mov

spark git commit: [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 4c7b208ab -> 304ec5de3 [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark ## What changes were proposed in this pull request? These were not updated after performance improvements. To make updating them easier, I also

spark git commit: [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5bb4564cd -> 75705e8db [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range ## What changes were proposed in this pull request? It's easy for users to call `range(...).as[Long]` to get typed Dataset, and don't worth an

spark git commit: [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 304ec5de3 -> 0cf31f0c8 [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range ## What changes were proposed in this pull request? It's easy for users to call `range(...).as[Long]` to get typed Dataset, and don't worth

spark git commit: [SPARK-14851][CORE] Support radix sort with nullable longs

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master 75705e8db -> c06c58bbb [SPARK-14851][CORE] Support radix sort with nullable longs ## What changes were proposed in this pull request? This adds support for radix sort of nullable long fields. When a sort field is null and radix sort is en

spark git commit: [SPARK-14851][CORE] Support radix sort with nullable longs

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 0cf31f0c8 -> beb753004 [SPARK-14851][CORE] Support radix sort with nullable longs ## What changes were proposed in this pull request? This adds support for radix sort of nullable long fields. When a sort field is null and radix sort i

spark git commit: [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master c06c58bbb -> 3fd2ff4dd [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame ## What changes were proposed in this pull request? This PR adds `varargs`-types `dropDuplicates` functions in `Dataset/DataFrame`. Currently

spark git commit: [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 beb753004 -> 7e2bfff20 [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame ## What changes were proposed in this pull request? This PR adds `varargs`-types `dropDuplicates` functions in `Dataset/DataFrame`. Curre

spark git commit: Revert "[SPARK-14851][CORE] Support radix sort with nullable longs"

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 7e2bfff20 -> 796dd1514 Revert "[SPARK-14851][CORE] Support radix sort with nullable longs" This reverts commit beb75300455a4f92000b69e740256102d9f2d472. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

spark git commit: [SPARK-15860] Metrics for codegen size and perf

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3fd2ff4dd -> e1f986c7a [SPARK-15860] Metrics for codegen size and perf ## What changes were proposed in this pull request? Adds codahale metrics for the codegen source text size and how long it takes to compile. The size is particularly i

spark git commit: [SPARK-15860] Metrics for codegen size and perf

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 796dd1514 -> ffbc6b796 [SPARK-15860] Metrics for codegen size and perf ## What changes were proposed in this pull request? Adds codahale metrics for the codegen source text size and how long it takes to compile. The size is particular

spark git commit: [SPARK-15840][SQL] Add two missing options in documentation and some option related changes

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master e1f986c7a -> 9e204c62c [SPARK-15840][SQL] Add two missing options in documentation and some option related changes ## What changes were proposed in this pull request? This PR 1. Adds the documentations for some missing options, `inferSch

spark git commit: [SPARK-15840][SQL] Add two missing options in documentation and some option related changes

2016-06-11 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 ffbc6b796 -> d494a483a [SPARK-15840][SQL] Add two missing options in documentation and some option related changes ## What changes were proposed in this pull request? This PR 1. Adds the documentations for some missing options, `infe

spark git commit: [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 50248dcff -> f51dfe616 [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API ## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples - Don'

spark git commit: [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 b75d1c201 -> f703dff0a [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API ## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples -

spark git commit: [SPARK-15876][CORE] Remove support for "zk://" master URL

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 f703dff0a -> 161d02db6 [SPARK-15876][CORE] Remove support for "zk://" master URL ## What changes were proposed in this pull request? Remove deprecated support for `zk://` master (`mesos://zk//` remains supported) ## How was this patch

spark git commit: [SPARK-15876][CORE] Remove support for "zk://" master URL

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master f51dfe616 -> 0a6f09083 [SPARK-15876][CORE] Remove support for "zk://" master URL ## What changes were proposed in this pull request? Remove deprecated support for `zk://` master (`mesos://zk//` remains supported) ## How was this patch tes

spark git commit: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master f5d38c392 -> 1f8f2b5c2 [SPARK-15370][SQL] Fix count bug # What changes were proposed in this pull request? This pull request fixes the COUNT bug in the `RewriteCorrelatedScalarSubquery` rule. After this change, the rule tests the expressi

spark git commit: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 8b6ec9b91 -> 94482b1e4 [SPARK-15370][SQL] Fix count bug # What changes were proposed in this pull request? This pull request fixes the COUNT bug in the `RewriteCorrelatedScalarSubquery` rule. After this change, the rule tests the expr

spark git commit: [SPARK-15898][SQL] DataFrameReader.text should return DataFrame

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 94482b1e4 -> b96e7f6aa [SPARK-15898][SQL] DataFrameReader.text should return DataFrame ## What changes were proposed in this pull request? We want to maintain API compatibility for DataFrameReader.text, and will introduce a new API ca

spark git commit: [SPARK-15898][SQL] DataFrameReader.text should return DataFrame

2016-06-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1f8f2b5c2 -> e2ab79d5e [SPARK-15898][SQL] DataFrameReader.text should return DataFrame ## What changes were proposed in this pull request? We want to maintain API compatibility for DataFrameReader.text, and will introduce a new API called

spark git commit: [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions

2016-06-13 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 2841bbac4 -> 974be6241 [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions ## What changes were proposed in this pull request? In our encoder framework, we imply that serializer expressions should use `Boun

spark git commit: [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions

2016-06-13 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1842cdd4e -> 688b6ef9d [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions ## What changes were proposed in this pull request? In our encoder framework, we imply that serializer expressions should use `BoundRef

spark git commit: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations' in hive StatisticsSuite

2016-06-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 214adb14b -> 0bd86c0fe [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations' in hive StatisticsSuite ## What changes were proposed in this pull request? This test re-enables the `analyze MetastoreRelations` in `org.apache.spark.sql.hi

spark git commit: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations' in hive StatisticsSuite

2016-06-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 96274d73e -> 1259a6fa8 [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations' in hive StatisticsSuite ## What changes were proposed in this pull request? This test re-enables the `analyze MetastoreRelations` in `org.apache.spark.sq

spark git commit: [SPARK-15952][SQL] fix "show databases" ordering issue

2016-06-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 1259a6fa8 -> b75542603 [SPARK-15952][SQL] fix "show databases" ordering issue ## What changes were proposed in this pull request? Two issues I've found for "show databases" command: 1. The returned database name list was not sorted, i

spark git commit: [SPARK-15952][SQL] fix "show databases" ordering issue

2016-06-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0bd86c0fe -> 42a28caf1 [SPARK-15952][SQL] fix "show databases" ordering issue ## What changes were proposed in this pull request? Two issues I've found for "show databases" command: 1. The returned database name list was not sorted, it on

spark git commit: [SPARK-15960][SQL] Rename `spark.sql.enableFallBackToHdfsForStats` config

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 7a0ed75ea -> 5c53442cc [SPARK-15960][SQL] Rename `spark.sql.enableFallBackToHdfsForStats` config ## What changes were proposed in this pull request? Since we are probably going to add more statistics related configurations in the futur

spark git commit: [SPARK-15960][SQL] Rename `spark.sql.enableFallBackToHdfsForStats` config

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 40eeef952 -> de99c3d08 [SPARK-15960][SQL] Rename `spark.sql.enableFallBackToHdfsForStats` config ## What changes were proposed in this pull request? Since we are probably going to add more statistics related configurations in the future, I

spark git commit: [SPARK-15959][SQL] Add the support of hive.metastore.warehouse.dir back

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9a5071996 -> e1585cc74 [SPARK-15959][SQL] Add the support of hive.metastore.warehouse.dir back ## What changes were proposed in this pull request? This PR adds the support of conf `hive.metastore.warehouse.dir` back. With this patch, the w

spark git commit: [SPARK-15959][SQL] Add the support of hive.metastore.warehouse.dir back

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 885e74a38 -> eb1d746c4 [SPARK-15959][SQL] Add the support of hive.metastore.warehouse.dir back ## What changes were proposed in this pull request? This PR adds the support of conf `hive.metastore.warehouse.dir` back. With this patch, t

spark git commit: [SPARK-15518][CORE][FOLLOW-UP] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 eb1d746c4 -> de56ea9bf [SPARK-15518][CORE][FOLLOW-UP] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend ## What changes were proposed in this pull request? This patch is a follow-up to https://github.com/apache/spark/pull/

spark git commit: [SPARK-15518][CORE][FOLLOW-UP] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master e1585cc74 -> 9b234b55d [SPARK-15518][CORE][FOLLOW-UP] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend ## What changes were proposed in this pull request? This patch is a follow-up to https://github.com/apache/spark/pull/1328

spark git commit: [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.]

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 2c1aae442 -> 73bf87f3c [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.] Updated the SparkStreaming Doc with some important points. Author: Nirman Narang Closes #4 from nirmannarang/SPARK-7848.

spark git commit: [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.]

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master cafc696d0 -> 04d7b3d2b [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.] Updated the SparkStreaming Doc with some important points. Author: Nirman Narang Closes #4 from nirmannarang/SPARK-7848. P

spark git commit: Closing stale pull requests.

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 04d7b3d2b -> 1a33f2e05 Closing stale pull requests. Closes #13103 Closes #8320 Closes #7871 Closes #7461 Closes #9159 Closes #9150 Closes #9200 Closes #9089 Closes #8022 Closes #6767 Closes #8505 Closes #9457 Closes #9397 Closes #8563 Close

spark git commit: [DOCS] Fix Gini and Entropy scaladocs in context of multiclass classification

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master a153e41c0 -> 6e0b3d795 [DOCS] Fix Gini and Entropy scaladocs in context of multiclass classification The PR changes outdated scaladocs for Gini and Entropy classes. Since PR #886 Spark supports multiclass classification, but the docs tell

spark git commit: [DOCS] Fix Gini and Entropy scaladocs in context of multiclass classification

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 9acf254ed -> 382735c41 [DOCS] Fix Gini and Entropy scaladocs in context of multiclass classification The PR changes outdated scaladocs for Gini and Entropy classes. Since PR #886 Spark supports multiclass classification, but the docs t

[1/2] spark git commit: [SPARK-15979][SQL] Rename various Parquet support classes.

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3e6d567a4 -> 865e7cc38 http://git-wip-us.apache.org/repos/asf/spark/blob/865e7cc3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---

[2/2] spark git commit: [SPARK-15979][SQL] Rename various Parquet support classes.

2016-06-15 Thread rxin
) classes. 2. We are in the Spark code base, and as a result it'd be more clear to call out these are Parquet support classes, rather than some Spark classes. ## How was this patch tested? Renamed test cases as well. Author: Reynold Xin Closes #13696 from rxin/parquet-rename. Project:

spark git commit: [SPARK-13498][SQL] Increment the recordsRead input metric for JDBC data source

2016-06-15 Thread rxin
and increments the record count for JDBC data source. Closes #11373. ## How was this patch tested? N/A Author: Reynold Xin Closes #13694 from rxin/SPARK-13498. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ebdd7512 Tree: http://

spark git commit: [SPARK-13498][SQL] Increment the recordsRead input metric for JDBC data source

2016-06-15 Thread rxin
and increments the record count for JDBC data source. Closes #11373. ## How was this patch tested? N/A Author: Reynold Xin Closes #13694 from rxin/SPARK-13498. (cherry picked from commit ebdd7512723851934241bd87fe7b25fd60cc58d8) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/re

spark git commit: [SPARK-15851][BUILD] Fix the call of the bash script to enable proper run in Windows

2016-06-15 Thread rxin
ses comments from the code review. Closes #13612 ## How was this patch tested? I built manually (on a Mac) to verify it didn't break Mac compilation. Author: Reynold Xin Author: avulanov Closes #13691 from rxin/SPARK-15851. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Comm

spark git commit: [SPARK-15851][BUILD] Fix the call of the bash script to enable proper run in Windows

2016-06-15 Thread rxin
ses comments from the code review. Closes #13612 ## How was this patch tested? I built manually (on a Mac) to verify it didn't break Mac compilation. Author: Reynold Xin Author: avulanov Closes #13691 from rxin/SPARK-15851. (cherry picked from commit 5a52ba0f952b21818ed73cb253381f6a

spark git commit: [SPARK-15547][SQL] nested case class in encoder can have different number of fields from the real schema

2016-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 cb3bb1901 -> 61738a38a [SPARK-15547][SQL] nested case class in encoder can have different number of fields from the real schema There are 2 kinds of `GetStructField`: 1. resolved from `UnresolvedExtractValue`, and it will have a `name

[3/3] spark git commit: [SPARK-15979][SQL] Rename various Parquet support classes (branch-2.0).

2016-06-16 Thread rxin
(i.e. Catalyst) classes. 2. We are in the Spark code base, and as a result it'd be more clear to call out these are Parquet support classes, rather than some Spark classes. ## How was this patch tested? Renamed test cases as well. Author: Reynold Xin Closes #13700 from rxin/parquet-rename-b
