spark git commit: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-21 gurwls223
Repository: spark Updated Branches: refs/heads/master 1da5822e6 -> a8d9ec8a6 [SPARK-21780][R] Simpler Dataset.sample API in R ## What changes were proposed in this pull request? This PR makes `sample(...)` able to omit `withReplacement`, which defaults to `FALSE`. In short, the following examples
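A minimal pure-Python sketch of the simplified call shape, in which `with_replacement` can be omitted and defaults to `False`. It assumes the Bernoulli (without replacement) versus Poisson (with replacement) split that Spark's samplers use; the helper name `sample_rows` and its parameters are hypothetical, and this is not SparkR's implementation.

```python
import math
import random


def _poisson(rng, lam):
    # Knuth's method: multiply uniform draws until the product falls below e^-lam.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1


def sample_rows(rows, fraction, with_replacement=False, seed=None):
    """Hypothetical helper: `with_replacement` may be omitted and defaults to False."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        if with_replacement:
            # Each row is emitted Poisson(fraction) times, so duplicates are possible.
            out.extend([row] * _poisson(rng, fraction))
        elif rng.random() < fraction:
            # Bernoulli trial: keep each row independently with probability `fraction`.
            out.append(row)
    return out


data = list(range(10))
print(sample_rows(data, 0.5, seed=42))        # with_replacement omitted, so False
print(sample_rows(data, 0.5, True, seed=42))  # explicit sampling with replacement
```

Omitting the flag keeps the common case (a plain fraction) short, which is the convenience the PR describes for `sample(...)`.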

[1/4] spark git commit: [SPARK-17997][SQL] Add an aggregation function for counting distinct values for multiple intervals

2017-09-21 wenchen
Repository: spark Updated Branches: refs/heads/master a8d9ec8a6 -> 1d1a09be9 http://git-wip-us.apache.org/repos/asf/spark/blob/1d1a09be/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervalsSuite.scala

[3/4] spark git commit: [SPARK-17997][SQL] Add an aggregation function for counting distinct values for multiple intervals

2017-09-21 wenchen
http://git-wip-us.apache.org/repos/asf/spark/blob/1d1a09be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/

[4/4] spark git commit: [SPARK-17997][SQL] Add an aggregation function for counting distinct values for multiple intervals

2017-09-21 wenchen
[SPARK-17997][SQL] Add an aggregation function for counting distinct values for multiple intervals ## What changes were proposed in this pull request? This work is a part of [SPARK-17074](https://issues.apache.org/jira/browse/SPARK-17074) to compute equi-height histograms. Equi-height histogra
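To illustrate what "counting distinct values for multiple intervals" computes, here is an exact pure-Python stand-in. The real aggregate is approximate and, judging by the HyperLogLogPlusPlus files touched in this commit, presumably keeps an HLL++ sketch per interval where this sketch keeps a Python `set`. The interval convention (first interval `[e0, e1]`, later ones `(e_{i-1}, e_i]`, out-of-range values ignored) and the function name are assumptions for illustration.

```python
from bisect import bisect_left


def count_distinct_for_intervals(values, endpoints):
    """Exact stand-in for an approximate, HLL++-backed aggregate (hypothetical helper).

    `endpoints` must be sorted. Assumed convention: the first interval is [e0, e1],
    each later interval is (e_{i-1}, e_i], and values outside [e0, e_last] are ignored.
    """
    if len(endpoints) < 2:
        raise ValueError("need at least two endpoints to form an interval")
    buckets = [set() for _ in range(len(endpoints) - 1)]
    for v in values:
        if v < endpoints[0] or v > endpoints[-1]:
            continue
        idx = bisect_left(endpoints, v)
        # bisect_left gives endpoints[idx-1] < v <= endpoints[idx]; v == e0 maps to bucket 0.
        buckets[max(idx - 1, 0)].add(v)
    return [len(b) for b in buckets]


print(count_distinct_for_intervals([0, 5, 5, 10, 11, 15, 15, 20, 25, -1], [0, 10, 20]))  # [3, 3]
```

For equi-height histograms, these per-interval distinct counts are what let each bin report an NDV alongside its bounds.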

[2/4] spark git commit: [SPARK-17997][SQL] Add an aggregation function for counting distinct values for multiple intervals

2017-09-21 wenchen
http://git-wip-us.apache.org/repos/asf/spark/blob/1d1a09be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ut

spark git commit: [SPARK-22086][DOCS] Add expression description for CASE WHEN

2017-09-21 gurwls223
Repository: spark Updated Branches: refs/heads/master 1d1a09be9 -> 1270e7175 [SPARK-22086][DOCS] Add expression description for CASE WHEN ## What changes were proposed in this pull request? Among SQL conditional expressions, only CASE WHEN lacks an expression description. This patch fills the
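As a reminder of the semantics such a description documents, here is a small pure-Python model of `CASE WHEN c1 THEN v1 [WHEN c2 THEN v2 ...] [ELSE e] END`: branches are checked in order, the first true condition wins, and a missing ELSE yields NULL (modeled as `None`). The helper name `case_when` is hypothetical, and unlike SQL it evaluates all branch values eagerly.

```python
def case_when(branches, else_value=None):
    """Model of SQL CASE WHEN over (condition, value) pairs (hypothetical helper)."""
    for condition, value in branches:
        # Only a TRUE condition matches; FALSE and NULL (None) both fall through.
        if condition is True:
            return value
    # No branch matched: the ELSE value, or NULL when ELSE is omitted.
    return else_value


a = 5
# CASE WHEN a > 0 THEN 1 WHEN a < 0 THEN -1 ELSE 0 END
print(case_when([(a > 0, 1), (a < 0, -1)], 0))  # 1
```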

spark git commit: [SPARK-21977][HOTFIX] Adjust EnsureStatefulOpPartitioningSuite to use scalatest lifecycle normally instead of constructor

2017-09-21 srowen
Repository: spark Updated Branches: refs/heads/master 1270e7175 -> f10cbf17d [SPARK-21977][HOTFIX] Adjust EnsureStatefulOpPartitioningSuite to use scalatest lifecycle normally instead of constructor ## What changes were proposed in this pull request? Adjust EnsureStatefulOpPartitioningSuite

spark git commit: [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo

2017-09-21 vanzin
Repository: spark Updated Branches: refs/heads/master f10cbf17d -> b75bd1777 [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo ## What changes were proposed in this pull request? We have to make sure that SerializerManager's private instance of kryo also uses the right c

spark git commit: [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo

2017-09-21 vanzin
Repository: spark Updated Branches: refs/heads/branch-2.2 401ac20d2 -> 765fd92e7 [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo ## What changes were proposed in this pull request? We have to make sure that SerializerManager's private instance of kryo also uses the rig

spark git commit: [INFRA] Close stale PRs.

2017-09-21 vanzin
Repository: spark Updated Branches: refs/heads/master b75bd1777 -> f7ad0dbd5 [INFRA] Close stale PRs. Closes #19296 Closes #19291 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f7ad0dbd Tree: http://git-wip-us.apache.org

spark git commit: [SPARK-22088][SQL] Incorrect scalastyle comment causes wrong styles in stringExpressions

2017-09-21 lixiao
Repository: spark Updated Branches: refs/heads/master f7ad0dbd5 -> 9cac249fd [SPARK-22088][SQL] Incorrect scalastyle comment causes wrong styles in stringExpressions ## What changes were proposed in this pull request? There is an incorrect `scalastyle:on` comment in `stringExpressions.scala`

spark git commit: [SPARK-22075][ML] GBTs unpersist datasets cached by Checkpointer

2017-09-21 srowen
Repository: spark Updated Branches: refs/heads/master 9cac249fd -> b21b806ec [SPARK-22075][ML] GBTs unpersist datasets cached by Checkpointer ## What changes were proposed in this pull request? `PeriodicRDDCheckpointer` will automatically persist the last 3 datasets called by `PeriodicRDDChec
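A toy model of the retention behaviour described here: a helper that keeps only the last 3 datasets persisted, dropping older ones automatically, but leaves the final ones cached until the caller explicitly releases them. The classes `Dataset` and `PeriodicPersister` are hypothetical stand-ins, not `PeriodicRDDCheckpointer`'s API; the point is that the caller (here, GBT training) must clean up at the end.

```python
from collections import deque


class Dataset:
    """Toy stand-in for a cacheable dataset."""

    def __init__(self, name):
        self.name = name
        self.persisted = False

    def persist(self):
        self.persisted = True

    def unpersist(self):
        self.persisted = False


class PeriodicPersister:
    """Hypothetical model: keep at most `keep` recent datasets persisted."""

    def __init__(self, keep=3):
        self.keep = keep
        self._queue = deque()

    def update(self, ds):
        ds.persist()
        self._queue.append(ds)
        # Older datasets beyond the retention window are released automatically.
        while len(self._queue) > self.keep:
            self._queue.popleft().unpersist()

    def unpersist_all(self):
        # The caller's job once training finishes; skipping it leaves cached data behind.
        while self._queue:
            self._queue.popleft().unpersist()


persister = PeriodicPersister()
iterations = [Dataset(f"iter{i}") for i in range(5)]
for ds in iterations:
    persister.update(ds)
print([ds.persisted for ds in iterations])  # [False, False, True, True, True]
persister.unpersist_all()
print(any(ds.persisted for ds in iterations))  # False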

spark git commit: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-21 srowen
Repository: spark Updated Branches: refs/heads/master b21b806ec -> a8a5cd24e [SPARK-22009][ML] Using treeAggregate improve some algs ## What changes were proposed in this pull request? I tested on a dataset of about 13M instances and found that using `treeAggregate` gives a speedup in followin
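A simplified, single-process sketch of the idea behind `treeAggregate`: fold each partition locally, then combine the partial results in intermediate rounds so the final combine (on the driver, in Spark) sees only a few values instead of one per partition. The fan-in formula `max(ceil(n ** (1/depth)), 2)` follows Spark's approach as far as I know, but the contiguous grouping and the function itself are illustrative assumptions, not Spark's code.

```python
import math
import operator
from functools import reduce


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Simplified multi-level aggregation (hypothetical helper)."""
    # Step 1: fold each partition locally, as executors would.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # Step 2: combine partials in rounds, shrinking the list by `scale` each round.
    scale = max(int(math.ceil(len(partials) ** (1.0 / depth))), 2)
    while len(partials) > scale:
        groups = [partials[i:i + scale] for i in range(0, len(partials), scale)]
        partials = [reduce(comb_op, g) for g in groups]
    # Step 3: the final small combine (done on the driver in Spark).
    return reduce(comb_op, partials, zero)


parts = [list(range(i * 10, (i + 1) * 10)) for i in range(16)]
print(tree_aggregate(parts, 0, operator.add, operator.add))  # 12720
```

The result matches a flat reduce as long as `comb_op` is associative and `zero` is its identity; the benefit in a cluster is less data funnelled to the driver in one step.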

[1/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 tdas
Repository: spark Updated Branches: refs/heads/master a8a5cd24e -> f32a84250 http://git-wip-us.apache.org/repos/asf/spark/blob/f32a8425/sql/core/src/test/scala/org/apache/spark/sql/streaming/StateStoreMetricsTest.scala -- diff

[2/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 tdas
[SPARK-22053][SS] Stream-stream inner join in Append Mode ## What changes were proposed in this pull request? Architecture This PR implements stream-stream inner join using a two-way symmetric hash join. At a high level, we want to do the following. 1. For each stream, we maintain the past
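The two-way symmetric hash join described above can be sketched in a few lines: each side buffers its past rows by key; a new row first probes the other side's buffer for matches, then is added to its own buffer so later rows from the other stream can find it. This toy class is hypothetical and omits what the real operator needs for unbounded streams, notably state stores and watermark-based state eviction.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy inner equi-join over two streams (no watermarks, no state eviction)."""

    def __init__(self):
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, key, value):
        other = "right" if side == "left" else "left"
        # 1. Probe the other stream's buffered rows for matches with the new row.
        matches = self.state[other][key]
        if side == "left":
            out = [(key, value, r) for r in matches]
        else:
            out = [(key, l, value) for l in matches]
        # 2. Buffer the new row so future rows from the other stream can match it.
        self.state[side][key].append(value)
        return out


j = SymmetricHashJoin()
print(j.process("left", 1, "a"))   # []
print(j.process("right", 1, "x"))  # [(1, 'a', 'x')]
print(j.process("left", 1, "b"))   # [(1, 'b', 'x')]
```

Because both sides probe and then buffer, every matching pair is emitted exactly once, regardless of which stream delivers its row first.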

spark git commit: [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo

2017-09-21 vanzin
Repository: spark Updated Branches: refs/heads/branch-2.1 56865a1e9 -> 1a4b6eea8 [SPARK-21928][CORE] Set classloader on SerializerManager's private kryo ## What changes were proposed in this pull request? We have to make sure that SerializerManager's private instance of kryo also uses the rig

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 zsxwing
Repository: spark Updated Branches: refs/heads/master f32a84250 -> fedf6961b [SPARK-22094][SS] processAllAvailable should check the query state ## What changes were proposed in this pull request? `processAllAvailable` should also check the query state and if the query is stopped, it should r
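A sketch of why the query-state check matters: a method that blocks until all arrived data is processed must also wake up and return when the query is stopped, or it can wait forever on data that will never be processed. `ToyStreamingQuery` and its methods are hypothetical; the sketch models only the "stopped" path, not how Spark surfaces a query that failed with an exception.

```python
import threading


class ToyStreamingQuery:
    """Hypothetical stand-in for a streaming query's progress tracking."""

    def __init__(self):
        self._cond = threading.Condition()
        self._available = 0  # offsets that have arrived
        self._processed = 0  # offsets that have been committed
        self._active = True

    @property
    def is_active(self):
        with self._cond:
            return self._active

    def add_data(self, n):
        with self._cond:
            self._available += n
            self._cond.notify_all()

    def run_batch(self):
        with self._cond:
            self._processed = self._available
            self._cond.notify_all()

    def stop(self):
        with self._cond:
            self._active = False
            self._cond.notify_all()

    def process_all_available(self):
        with self._cond:
            # Without the `self._active` check, a stopped query with pending
            # data would leave this loop waiting forever.
            while self._active and self._processed < self._available:
                self._cond.wait()


q = ToyStreamingQuery()
q.add_data(5)
threading.Timer(0.1, q.stop).start()
q.process_all_available()  # returns once stop() runs, even though data is pending
print(q.is_active)  # False
```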

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 765fd92e7 -> 090b987e6 [SPARK-22094][SS] processAllAvailable should check the query state `processAllAvailable` should also check the query state and if the query is stopped, it should return. The new unit test. Author: Shixiong Zhu

spark git commit: [SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator

2017-09-21 yliang
Repository: spark Updated Branches: refs/heads/master fedf6961b -> 5ac96854c [SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator ## What changes were proposed in this pull request? Added Python interface for ClusteringEvaluator ## How was this patch tested? Manual test,
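From Python, the evaluator is used as `ClusteringEvaluator().evaluate(predictions)` from `pyspark.ml.evaluation`; its metric is the silhouette coefficient, computed by default with squared Euclidean distance. The pure-Python sketch below uses the textbook quadratic definition of that metric for small in-memory data. It is not Spark's distributed computation, and scoring singleton clusters as 0 is an assumption.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def silhouette(points, labels):
    """Mean silhouette coefficient with squared Euclidean distance (O(n^2) sketch)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # Mean distance from p to each cluster, excluding p itself from its own cluster.
        mean_dist = {}
        for c in clusters:
            dists = [squared_euclidean(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        if own not in mean_dist:  # singleton cluster: scored as 0 (assumption)
            scores.append(0.0)
            continue
        a = mean_dist[own]                                         # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)      # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(pts, [0, 0, 1, 1]))  # close to 1: well-separated clusters
```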

spark git commit: [SPARK-21998][SQL] SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning

2017-09-21 lixiao
Repository: spark Updated Branches: refs/heads/master 5ac96854c -> 5960686e7 [SPARK-21998][SQL] SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning ## What changes were proposed in this pull request? Right now the calculation of SortMergeJoinExec's outpu
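As background for what `outputOrdering` describes, here is a pure-Python sort-merge inner join showing why its output comes out ordered by the join key: both inputs are sorted by key and merged in a single forward pass. This sketch does not reproduce the specific planning bug or fix in this commit, and the function name is hypothetical.

```python
def sort_merge_inner_join(left, right):
    """Inner equi-join of (key, value) pairs via sort-merge (hypothetical helper)."""
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


left = [(3, "c"), (1, "a"), (2, "b"), (1, "a2")]
right = [(1, "x"), (3, "z"), (4, "w")]
print(sort_merge_inner_join(left, right))  # [(1, 'a', 'x'), (1, 'a2', 'x'), (3, 'c', 'z')]
```

Since matches are emitted as the merge advances, an operator downstream that needs rows sorted by the join key could, in principle, skip a re-sort; reporting that ordering accurately is what physical planning relies on.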