spark git commit: [SPARK-11493] remove bitset from BytesToBytesMap

2015-11-04 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 701fb5052 -> 1b6a5d4af [SPARK-11493] remove bitset from BytesToBytesMap Since each page begins with 4 bytes holding its number of records, a record address can never be zero, so the bitset is not needed. For performance concerns, the …
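The reasoning above can be sketched in plain Python (this is an illustrative toy, not Spark's actual `BytesToBytesMap` code): because every page reserves a 4-byte record-count header at offset 0, no encoded record address is ever 0, so the hash table can use 0 itself as the "empty slot" sentinel instead of maintaining a separate bitset.

```python
# Toy sketch: a 4-byte page header means addresses start at >= 4,
# so 0 is free to mean "empty" and no bitset is required.

PAGE_HEADER_BYTES = 4  # the record count occupies offsets 0..3


class TinyAddressTable:
    def __init__(self, capacity=8):
        self.slots = [0] * capacity  # 0 == empty slot, no bitset needed

    def put(self, key_hash, address):
        assert address >= PAGE_HEADER_BYTES, "addresses start after the header"
        i = key_hash % len(self.slots)
        while self.slots[i] != 0:           # probe until an empty (zero) slot
            i = (i + 1) % len(self.slots)
        self.slots[i] = address

    def occupied(self):
        return sum(1 for s in self.slots if s != 0)


table = TinyAddressTable()
table.put(7, 4)    # first record sits right after the 4-byte header
table.put(7, 132)  # hash collision probes to the next slot
```

A standalone bitset would duplicate exactly the occupied/empty information that the zero-vs-nonzero address already carries.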

[2/5] spark git commit: [SPARK-11505][SQL] Break aggregate functions into multiple files

2015-11-04 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/d19f4fda/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala -- diff --git …

[3/5] spark git commit: [SPARK-11505][SQL] Break aggregate functions into multiple files

2015-11-04 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/d19f4fda/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Last.scala -- diff --git …

[1/5] spark git commit: [SPARK-11505][SQL] Break aggregate functions into multiple files

2015-11-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master abf5e4285 -> d19f4fda6 http://git-wip-us.apache.org/repos/asf/spark/blob/d19f4fda/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/utils.scala

spark git commit: [SPARK-11504][SQL] API audit for distributeBy and localSort

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master de289bf27 -> abf5e4285 [SPARK-11504][SQL] API audit for distributeBy and localSort 1. Renamed localSort -> sortWithinPartitions to avoid ambiguity in "local" 2. distributeBy -> repartition to match the existing repartition. Author: …
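The semantics behind the renamed methods can be sketched in plain Python (an assumed toy model, not the Spark DataFrame API itself): `repartition` redistributes rows across partitions by a hash of the row, and `sortWithinPartitions` orders rows inside each partition independently, without a global sort across partitions.

```python
# Toy model of the two renamed operations on a partitioned dataset.

def repartition(rows, num_partitions, key=hash):
    """Redistribute rows into num_partitions buckets by hash of the row."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[key(row) % num_partitions].append(row)
    return parts


def sort_within_partitions(parts):
    """Sort each partition independently; no ordering across partitions."""
    return [sorted(p) for p in parts]


parts = sort_within_partitions(repartition([5, 3, 8, 1, 6, 2], 2))
```

Each partition ends up sorted, but concatenating the partitions does not yield a globally sorted sequence, which is exactly why "local" in the old name `localSort` was ambiguous.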

spark git commit: [SPARK-10949] Update Snappy version to 1.1.2

2015-11-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master d19f4fda6 -> 701fb5052 [SPARK-10949] Update Snappy version to 1.1.2 This is an updated version of #8995 by a-roberts. Original description follows: Snappy now supports concatenation of serialized streams, this patch contains a version …

spark git commit: [SPARK-10028][MLLIB][PYTHON] Add Python API for PrefixSpan

2015-11-04 Thread meng
Repository: spark Updated Branches: refs/heads/master 1b6a5d4af -> 411ff6afb [SPARK-10028][MLLIB][PYTHON] Add Python API for PrefixSpan Author: Yu ISHIKAWA Closes #9469 from yu-iskw/SPARK-10028. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: …

spark git commit: [SPARK-11307] Reduce memory consumption of OutputCommitCoordinator

2015-11-04 Thread davies
Repository: spark Updated Branches: refs/heads/master a752ddad7 -> d0b563396 [SPARK-11307] Reduce memory consumption of OutputCommitCoordinator OutputCommitCoordinator uses a map in a place where an array would suffice, increasing its memory consumption for result stages with millions of …
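The map-versus-array point can be sketched as follows (an assumed toy, not the actual `OutputCommitCoordinator` class): when per-stage state is keyed by dense partition ids 0..n-1, a flat list indexed by partition id carries the same information as a hash map, without per-entry boxing and hash-table overhead.

```python
# Toy sketch: track the authorized committer per partition in a flat
# list indexed by partition id, rather than a dict keyed by id.

NO_AUTHORIZED_COMMITTER = -1


class StageState:
    def __init__(self, num_partitions):
        # one fixed slot per partition instead of a dict grown on demand
        self.authorized_committers = [NO_AUTHORIZED_COMMITTER] * num_partitions

    def can_commit(self, partition, attempt):
        """First attempt to ask wins; later attempts are denied."""
        if self.authorized_committers[partition] == NO_AUTHORIZED_COMMITTER:
            self.authorized_committers[partition] = attempt
            return True
        return self.authorized_committers[partition] == attempt


state = StageState(4)
first = state.can_commit(0, attempt=7)   # first attempt is authorized
second = state.can_commit(0, attempt=9)  # a different attempt is denied
```

For a stage with millions of partitions the dense-array representation is both smaller and faster to index than a map.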

spark git commit: [SPARK-11491] Update build to use Scala 2.10.5

2015-11-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master b6e0a5ae6 -> ce5e6a284 [SPARK-11491] Update build to use Scala 2.10.5 Spark should build against Scala 2.10.5, since that includes a fix for Scaladoc that will fix doc snapshot publishing: https://issues.scala-lang.org/browse/SI-8479

spark git commit: [SPARK-11398] [SQL] unnecessary def dialectClassName in HiveContext, and misleading dialect conf at the start of spark-sql

2015-11-04 Thread davies
Repository: spark Updated Branches: refs/heads/master ce5e6a284 -> a752ddad7 [SPARK-11398] [SQL] unnecessary def dialectClassName in HiveContext, and misleading dialect conf at the start of spark-sql 1. def dialectClassName in HiveContext is unnecessary. In HiveContext, if conf.dialect == …

spark git commit: [SPARK-11510][SQL] Remove SQL aggregation tests for higher order statistics

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 411ff6afb -> b6e0a5ae6 [SPARK-11510][SQL] Remove SQL aggregation tests for higher order statistics We have some aggregate function tests in both DataFrameAggregateSuite and SQLQuerySuite. The two have almost the same coverage and we …

spark git commit: [SPARK-11425] [SPARK-11486] Improve hybrid aggregation

2015-11-04 Thread davies
Repository: spark Updated Branches: refs/heads/master d0b563396 -> 81498dd5c [SPARK-11425] [SPARK-11486] Improve hybrid aggregation After aggregation, the dataset could be smaller than inputs, so it's better to do hash based aggregation for all inputs, then using sort based aggregation to …
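The hybrid strategy described above can be sketched in plain Python (an assumed toy, not Spark's implementation): aggregate into a hash table until a size limit is hit, spill the sorted partial results, and finish with a sort-merge pass that combines equal keys across spills.

```python
# Toy hybrid aggregation: hash phase with spilling, then a sort-merge phase.

from collections import defaultdict
from heapq import merge
from itertools import groupby


def hybrid_sum(pairs, max_keys=2):
    """Sum values per key; spill the hash table whenever it exceeds max_keys."""
    spills = []
    table = defaultdict(int)
    for k, v in pairs:
        table[k] += v
        if len(table) > max_keys:            # "memory" exhausted: spill sorted run
            spills.append(sorted(table.items()))
            table = defaultdict(int)
    spills.append(sorted(table.items()))
    # sort-based phase: merge the sorted runs, combining equal keys
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(merge(*spills), key=lambda kv: kv[0])}


result = hybrid_sum([("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)])
```

The hash phase shrinks the data first (many inputs collapse onto few keys), so the expensive sort only runs over the already-reduced spilled runs.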

spark git commit: [SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md using include_example

2015-11-04 Thread meng
Repository: spark Updated Branches: refs/heads/master e328b69c3 -> 820064e61 [SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md using include_example Author: Pravin Gadakh Closes #9340 from …

spark git commit: [SPARK-9492][ML][R] LogisticRegression in R should provide model statistics

2015-11-04 Thread meng
Repository: spark Updated Branches: refs/heads/master c09e51398 -> e328b69c3 [SPARK-9492][ML][R] LogisticRegression in R should provide model statistics Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients.

spark git commit: [SPARK-11490][SQL] variance should alias var_samp instead of var_pop.

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e0fc9c7e5 -> 3bd6f5d2a [SPARK-11490][SQL] variance should alias var_samp instead of var_pop. stddev is an alias for stddev_samp. variance should be consistent with stddev. Also took the chance to remove internal Stddev and Variance, and …
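The distinction behind the alias change, sketched in plain Python: `var_samp` divides by n - 1 (sample variance, unbiased for a sample), while `var_pop` divides by n (population variance). Since `stddev` already aliases the sample form, aliasing `variance` to `var_samp` makes the two consistent.

```python
# Sample vs population variance: same sum of squared deviations,
# different denominator.

def var_pop(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)        # divide by n


def var_samp(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # divide by n - 1


data = [1.0, 2.0, 3.0, 4.0]
# var_pop(data) == 1.25, var_samp(data) == 5/3; they coincide only as n grows.
```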

spark git commit: [SPARK-11235][NETWORK] Add ability to stream data using network lib.

2015-11-04 Thread vanzin
Repository: spark Updated Branches: refs/heads/master 8790ee6d6 -> 27feafccb [SPARK-11235][NETWORK] Add ability to stream data using network lib. The current interface used to fetch shuffle data is not very efficient for large buffers; it requires the receiver to buffer the entirety of the …
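The buffering problem it addresses can be sketched as follows (a minimal assumed model, not the network library's API): rather than materializing the whole payload before handing it over, the sender yields fixed-size chunks, so neither side ever holds more than one chunk of a large transfer in memory at once.

```python
# Toy model of chunked streaming: the consumer sees bounded chunks
# instead of one buffer holding the entire payload.

import io

CHUNK = 4  # deliberately tiny chunk size for the example


def stream(source):
    """Yield the contents of a readable binary source in CHUNK-sized pieces."""
    while True:
        chunk = source.read(CHUNK)
        if not chunk:
            return
        yield chunk


received = bytearray()
for chunk in stream(io.BytesIO(b"0123456789")):
    assert len(chunk) <= CHUNK   # memory use is bounded per chunk
    received.extend(chunk)
```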

spark git commit: Closes #9464

2015-11-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3bd6f5d2a -> 987df4bfc Closes #9464 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/987df4bf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/987df4bf Diff: …

spark git commit: [SPARK-10304][SQL] Following up checking valid dir structure for partition discovery

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 987df4bfc -> de289bf27 [SPARK-10304][SQL] Following up checking valid dir structure for partition discovery This patch follows up #8840. Author: Liang-Chi Hsieh Closes #9459 from …

spark git commit: [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen)

2015-11-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 2692bdb7d -> 8aff36e91 [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work, roji. Would you mind taking a look …

spark git commit: [SPARK-11442] Reduce numSlices for local metrics test of SparkListenerSuite

2015-11-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 8aff36e91 -> c09e51398 [SPARK-11442] Reduce numSlices for local metrics test of SparkListenerSuite In the thread, http://search-hadoop.com/m/q3RTtcQiFSlTxeP/test+failed+due+to+OOME=test+failed+due+to+OOME, it was discussed that memory …