git commit: [SPARK-1817] RDD.zip() should verify partition sizes for each partition

2014-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4ca062566 -> c402a4a68 [SPARK-1817] RDD.zip() should verify partition sizes for each partition RDD.zip() will throw an exception if it finds partition sizes are not the same. Author: Kan Zhang Closes #944 from kanzhang/SPARK-1817 and squ
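The check described above — zip failing when per-partition sizes differ — can be sketched in plain Python without Spark. The function name and error message below are illustrative, not Spark's actual internals:

```python
def zip_partitions(left, right):
    """Zip two partition iterators, failing if their sizes differ.

    Mirrors the SPARK-1817 idea: instead of silently truncating,
    raise once one side runs out while the other still has elements.
    """
    left, right = iter(left), iter(right)
    _done = object()
    while True:
        a = next(left, _done)
        b = next(right, _done)
        if a is _done and b is _done:
            return  # both partitions exhausted together: sizes matched
        if a is _done or b is _done:
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition")
        yield (a, b)
```

For equal-sized partitions this behaves like built-in `zip`; for unequal ones it raises partway through iteration rather than dropping the surplus elements.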

git commit: SPARK-1806 (addendum) Use non-deprecated methods in Mesos 0.18

2014-06-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/master ab7c62d57 -> 4ca062566 SPARK-1806 (addendum) Use non-deprecated methods in Mesos 0.18 The update to Mesos 0.18 caused some deprecation warnings in the build. The change to the non-deprecated version is straightforward as it emulates what t

git commit: Update spark-ec2 scripts for 1.0.0 on master

2014-06-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 5284ca78d -> ab7c62d57 Update spark-ec2 scripts for 1.0.0 on master The change was previously committed only to branch-1.0 as part of https://github.com/apache/spark/commit/a34e6fda1d6fb8e769c21db70845f1a6dde968d8 Author: Aaron Davidson

git commit: Enable repartitioning of graph over different number of partitions

2014-06-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master e8d93ee52 -> 5284ca78d Enable repartitioning of graph over different number of partitions It is currently very difficult to repartition a graph over a different number of partitions. This PR adds an additional `partitionBy` function that
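The point of a graph-aware `partitionBy` is that edges are routed with a 2D grid scheme so each vertex's edges touch only about 2*sqrt(numParts) partitions, and the PR above lets the number of partitions vary. A minimal plain-Python sketch of that grid idea (GraphX's real `EdgePartition2D` additionally mixes the vertex ids with a large prime; this simplified version omits that):

```python
import math

def edge_partition_2d(src, dst, num_parts):
    """Assign an edge (src, dst) to one of num_parts partitions via a 2D grid.

    Sketch of the EdgePartition2D idea with a configurable partition
    count; rows index by source vertex, columns by destination, so a
    single vertex's edges land in at most ~ceil(sqrt(num_parts)) rows
    or columns.
    """
    ceil_sqrt = int(math.ceil(math.sqrt(num_parts)))
    row = src % ceil_sqrt
    col = dst % ceil_sqrt
    return (row * ceil_sqrt + col) % num_parts
```

Because `row` is fixed for a given source vertex, all of its out-edges fall into at most `ceil_sqrt` distinct partitions, which bounds vertex replication during joins.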

git commit: use env default python in merge_spark_pr.py

2014-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1faef149f -> e8d93ee52 use env default python in merge_spark_pr.py A minor change to use env default python instead of fixed `/usr/bin/python`. Author: Xiangrui Meng Closes #965 from mengxr/merge-pr-python and squashes the following comm

git commit: SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.

2014-06-03 Thread meng
Repository: spark Updated Branches: refs/heads/master 21e40ed88 -> 1faef149f SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog. I also corrected some errors in the previous approximate HLL count API, including that relativeSD wasn't really a measure for err
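For context on treating `relativeSD` as a genuine accuracy target: HyperLogLog's expected relative standard deviation is roughly 1.04/sqrt(m) with m = 2^p registers, so a requested error bound can be converted to a register count. A sketch of that conversion (the constant and rounding are the textbook ones; Spark's exact code may differ):

```python
import math

def registers_for_relative_sd(relative_sd):
    """Pick the HyperLogLog precision p so that the expected relative
    standard deviation 1.04 / sqrt(2**p) is at most relative_sd.

    Illustrates relativeSD as a target accuracy rather than an
    arbitrary knob; returns the exponent p (register count is 2**p).
    """
    if not 0.0 < relative_sd < 1.0:
        raise ValueError("relative_sd must be in (0, 1)")
    # 1.04 / sqrt(2**p) <= relative_sd  =>  p >= log2((1.04 / relative_sd)**2)
    return math.ceil(math.log2((1.04 / relative_sd) ** 2))
```

For example, a 5% target yields p = 9 (512 registers), whose expected error 1.04/sqrt(512) is about 4.6%.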

git commit: [SPARK-1161] Add saveAsPickleFile and SparkContext.pickleFile in Python

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/master f4dd665c8 -> 21e40ed88 [SPARK-1161] Add saveAsPickleFile and SparkContext.pickleFile in Python Author: Kan Zhang Closes #755 from kanzhang/SPARK-1161 and squashes the following commits: 24ed8a2 [Kan Zhang] [SPARK-1161] Fixing doc tests 4
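The core idea behind `saveAsPickleFile`/`pickleFile` is storing RDD elements as pickled batches. A minimal plain-Python sketch of that batching round-trip (illustrative only: PySpark actually writes the batches into Hadoop SequenceFiles, not a flat file, and the helper names here are invented):

```python
import pickle

def save_as_pickle(records, path, batch_size=3):
    """Write records in pickled batches, one pickle frame per batch.

    Batching amortizes pickle overhead across elements, which is the
    idea behind saveAsPickleFile's batchSize parameter.
    """
    with open(path, "wb") as f:
        for i in range(0, len(records), batch_size):
            pickle.dump(records[i:i + batch_size], f)

def load_pickle(path):
    """Read back every batch and flatten them into one list."""
    out = []
    with open(path, "rb") as f:
        while True:
            try:
                out.extend(pickle.load(f))
            except EOFError:
                return out
```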

git commit: Fixed a typo

2014-06-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/master b1feb6020 -> f4dd665c8 Fixed a typo in RowMatrix.scala Author: DB Tsai Closes #959 from dbtsai/dbtsai-typo and squashes the following commits: fab0e0e [DB Tsai] Fixed typo Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

git commit: [SPARK-1991] Support custom storage levels for vertices and edges

2014-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 894ecde04 -> b1feb6020 [SPARK-1991] Support custom storage levels for vertices and edges This PR adds support for specifying custom storage levels for the vertices and edges of a graph. This enables GraphX to handle graphs larger than memo

git commit: Synthetic GraphX Benchmark

2014-06-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master aa41a522d -> 894ecde04 Synthetic GraphX Benchmark This PR accomplishes two things: 1. It introduces a Synthetic Benchmark application that generates an arbitrarily large log-normal graph and executes either PageRank or connected componen
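The heart of such a benchmark is sampling a heavy-tailed, log-normal out-degree per vertex and then materializing edges. A plain-Python sketch of that generation step (parameter values are illustrative; GraphX's actual generator lives in `org.apache.spark.graphx.util.GraphGenerators`):

```python
import random

def log_normal_degrees(num_vertices, mu=4.0, sigma=1.3, seed=42):
    """Sample an out-degree for each vertex from a log-normal
    distribution, the heavy-tailed shape typical of web-like graphs.
    Every vertex gets at least one edge.
    """
    rng = random.Random(seed)
    return [max(1, int(rng.lognormvariate(mu, sigma)))
            for _ in range(num_vertices)]

def edges_from_degrees(degrees, seed=42):
    """Materialize (src, dst) pairs by picking random targets for
    each vertex's sampled degree."""
    rng = random.Random(seed)
    n = len(degrees)
    return [(src, rng.randrange(n))
            for src, d in enumerate(degrees)
            for _ in range(d)]
```

Scaling `num_vertices` makes the graph arbitrarily large while keeping the degree distribution's shape fixed, which is what makes the benchmark useful for comparing PageRank or connected-components runs across cluster sizes.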

git commit: fix java.lang.ClassCastException

2014-06-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 8edc9d033 -> aa41a522d fix java.lang.ClassCastException An exception is thrown when running: bin/run-example org.apache.spark.examples.sql.RDDRelation The exception detail is: Exception in thread "main" java.lang.ClassCastException: java.lang.Long canno

git commit: fix java.lang.ClassCastException

2014-06-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.0 350cfd311 -> d96794132 fix java.lang.ClassCastException An exception is thrown when running: bin/run-example org.apache.spark.examples.sql.RDDRelation The exception detail is: Exception in thread "main" java.lang.ClassCastException: java.lang.Long c

git commit: [SPARK-1468] Modify the partition function used by partitionBy.

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/master b1f285359 -> 8edc9d033 [SPARK-1468] Modify the partition function used by partitionBy. Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same val
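The problem being fixed: Python's built-in `hash()` is not guaranteed to be stable across interpreter processes (string hashing can be randomized), so the same key could be routed to different partitions on different workers. A sketch of a deterministic replacement (illustrative only: PySpark's real `portable_hash` is implemented differently, by hashing value-by-value rather than via serialization):

```python
import pickle
import zlib

def portable_hash(key):
    """Deterministic hash usable as a default partition function.

    Hashing a canonical byte serialization with CRC32 gives the same
    value in every worker process, unlike built-in hash(). Assumes
    keys with a deterministic pickle form (ints, strings, tuples).
    """
    if key is None:
        return 0
    return zlib.crc32(pickle.dumps(key, protocol=2)) & 0xFFFFFFFF

def partition_for(key, num_partitions):
    """Route a key to a partition; same key, same partition, everywhere."""
    return portable_hash(key) % num_partitions
```

The essential property is cross-process stability: two workers computing `partition_for("spark", 8)` must agree, or a later `join` on the partitioned RDD silently drops matches.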

git commit: [SPARK-1468] Modify the partition function used by partitionBy.

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/branch-1.0 316d026f4 -> 350cfd311 [SPARK-1468] Modify the partition function used by partitionBy. Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same

git commit: [SPARK-1468] Modify the partition function used by partitionBy.

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/branch-0.9 e03af416c -> 41e7853fc [SPARK-1468] Modify the partition function used by partitionBy. Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same

git commit: Add support for Pivotal HD in the Maven build: SPARK-1992

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/branch-1.0 af0a89c46 -> 316d026f4 Add support for Pivotal HD in the Maven build: SPARK-1992 Allow Spark to build against particular Pivotal HD distributions. For example to build Spark against Pivotal HD 2.0.1 one can run: ``` mvn -Pyarn -Phadoop

git commit: Add support for Pivotal HD in the Maven build: SPARK-1992

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/master 45e9bc85d -> b1f285359 Add support for Pivotal HD in the Maven build: SPARK-1992 Allow Spark to build against particular Pivotal HD distributions. For example to build Spark against Pivotal HD 2.0.1 one can run: ``` mvn -Pyarn -Phadoop-2.2

git commit: [SPARK-1912] fix compress memory issue during reduce

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/master 6c044ed10 -> 45e9bc85d [SPARK-1912] fix compress memory issue during reduce When we need to read a compressed block, we will first create a compress stream instance(LZF or Snappy) and use it to wrap that block. Let's say a reducer task nee
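The memory issue sketched above: wrapping every fetched block in a decompression stream up front means one decompressor buffer per block held simultaneously. Deferring stream creation until each block is actually read keeps only one buffer live at a time. A plain-Python sketch with `zlib` standing in for LZF/Snappy (generator-based laziness is the illustrative device here, not Spark's actual code):

```python
import zlib

def lazy_decompressed_blocks(compressed_blocks):
    """Yield decompressed blocks one at a time, creating each
    decompression object only when its block is consumed.

    Sketches the SPARK-1912 fix: a reducer fetching many compressed
    blocks should not hold one decompressor (and its internal buffer)
    per block at once.
    """
    for block in compressed_blocks:
        # The decompressor, and its buffer, exists only for this block.
        d = zlib.decompressobj()
        yield d.decompress(block) + d.flush()
```

Because the body runs lazily, a reducer that iterates block-by-block never materializes more than one decompressor, regardless of how many blocks it fetches.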

git commit: SPARK-2001 : Remove docs/spark-debugger.md from master

2014-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 6db0d5cfe -> af0a89c46 SPARK-2001 : Remove docs/spark-debugger.md from master Per discussion in dev list: " Seemed like the spark-debugger.md is no longer accurate (see http://spark.apache.org/docs/latest/spark-debugger.html) and since

git commit: SPARK-2001 : Remove docs/spark-debugger.md from master

2014-06-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7782a304a -> 6c044ed10 SPARK-2001 : Remove docs/spark-debugger.md from master Per discussion in dev list: " Seemed like the spark-debugger.md is no longer accurate (see http://spark.apache.org/docs/latest/spark-debugger.html) and since it w

git commit: [SPARK-1942] Stop clearing spark.driver.port in unit tests

2014-06-03 Thread matei
Repository: spark Updated Branches: refs/heads/master 862283e9c -> 7782a304a [SPARK-1942] Stop clearing spark.driver.port in unit tests stop resetting spark.driver.port in unit tests (scala, java and python). Author: Syed Hashmi Author: CodingCat Closes #943 from syedhashmi/master and squa