spark git commit: [SPARK-5569][STREAMING] fix ObjectInputStreamWithLoader for supporting load array classes.

2015-10-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8f888eea1 -> 17f499920 [SPARK-5569][STREAMING] fix ObjectInputStreamWithLoader for supporting load array classes. When use Kafka DirectStream API to create checkpoint and restore saved checkpoint when restart, ClassNotFound exception

spark git commit: [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API

2015-10-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master feb8d6a44 -> 8f888eea1 [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API jerryshao tdas I know this is kind of minor, and I know you all are busy, but this brings this class in

spark git commit: [SPARK-11297] Add new code tags

2015-10-27 Thread meng
Repository: spark Updated Branches: refs/heads/master 8b292b19c -> d77d198fc [SPARK-11297] Add new code tags mengxr https://issues.apache.org/jira/browse/SPARK-11297 Add new code tags to hold the same look and feel with previous documents. Author: Xusen Yin Closes

spark git commit: [SPARK-11276][CORE] SizeEstimator prevents class unloading

2015-10-27 Thread srowen
Repository: spark Updated Branches: refs/heads/master d77d198fc -> feb8d6a44 [SPARK-11276][CORE] SizeEstimator prevents class unloading The SizeEstimator keeps a cache of ClassInfos but this cache uses Class objects as keys. Which results in strong references to the Class objects. If these
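
The snippet above is cut off before it describes the fix, so the following is only a minimal sketch of the general problem it names: a cache keyed by class objects keeps those classes alive unless the keys are held weakly. It is written in Python as an analogy and is not Spark's JVM `SizeEstimator` code. `ClassInfoCache`, `demo`, and the cached fields are hypothetical names introduced for illustration.

```python
import gc
import weakref


class ClassInfoCache:
    """Hypothetical per-class cache that does not pin its keys in memory."""

    def __init__(self):
        # Weak keys: an entry disappears once its class is otherwise unreachable.
        self._infos = weakref.WeakKeyDictionary()

    def get(self, cls):
        info = self._infos.get(cls)
        if info is None:
            # The cached value must not reference cls, or the key would stay alive.
            info = {"name": cls.__name__, "field_count": len(vars(cls))}
            self._infos[cls] = info
        return info

    def __len__(self):
        return len(self._infos)


def demo():
    cache = ClassInfoCache()
    temp_cls = type("Temp", (), {"x": 1})  # class created at runtime
    cache.get(temp_cls)
    before = len(cache)
    del temp_cls  # drop the only strong reference to the class
    gc.collect()  # classes sit in reference cycles, so force a collection
    return before, len(cache)
```

With a plain `dict` in place of the `WeakKeyDictionary`, the entry and the class it refers to would remain reachable for as long as the cache itself lives.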

spark git commit: [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API

2015-10-27 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 8a6e63c78 -> abb0ca7a9 [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API jerryshao tdas I know this is kind of minor, and I know you all are busy, but this brings this class in

spark git commit: [SPARK-11277][SQL] sort_array throws exception scala.MatchError

2015-10-27 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 17f499920 -> 958a0ec8f [SPARK-11277][SQL] sort_array throws exception scala.MatchError I'm new to spark. I was trying out the sort_array function then hit this exception. I looked into the spark source code. I found the root cause is that

spark git commit: [SPARK-11303][SQL] filter should not be pushed down into sample

2015-10-27 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 958a0ec8f -> 360ed832f [SPARK-11303][SQL] filter should not be pushed down into sample When sampling and then filtering DataFrame, the SQL Optimizer will push down filter into sample and produce wrong result. This is due to the sampler is
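
The snippet above is truncated before it finishes explaining the cause, so the sketch below only illustrates, under a stated assumption, why sampling and filtering need not commute. The assumption is that a seeded sampler decides whether to keep a row from the sequence of random draws over its input, so changing the input changes which rows are kept. The `sample` helper is hypothetical and is not Spark's sampler.

```python
import random


def sample(rows, fraction, seed):
    """Hypothetical seeded sampler: row i is kept based on the i-th random draw."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def is_even(x):
    return x % 2 == 0


rows = list(range(1, 21))

# What the query asks for: sample first, then filter the sampled rows.
sample_then_filter = [r for r in sample(rows, 0.5, seed=42) if is_even(r)]

# What a pushed-down filter computes: the sampler now sees a different input,
# so the same seed can select a different set of rows.
filter_then_sample = sample([r for r in rows if is_even(r)], 0.5, seed=42)

print("sample then filter:", sample_then_filter)
print("filter then sample:", filter_then_sample)
```

Both results contain only even numbers, but nothing guarantees that they contain the same ones, which is why the two plans are not interchangeable.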

spark git commit: [SPARK-6488][MLLIB][PYTHON] Support addition/multiplication in PySpark's BlockMatrix

2015-10-27 Thread meng
Repository: spark Updated Branches: refs/heads/master 9fc16a82a -> 3bdbbc6c9 [SPARK-6488][MLLIB][PYTHON] Support addition/multiplication in PySpark's BlockMatrix This PR adds addition and multiplication to PySpark's `BlockMatrix` class via `add` and `multiply` functions. Author: Mike

spark git commit: [SPARK-11306] Fix hang when JVM exits.

2015-10-27 Thread kayousterhout
Repository: spark Updated Branches: refs/heads/master 360ed832f -> 9fc16a82a [SPARK-11306] Fix hang when JVM exits. This commit fixes a bug where, in Standalone mode, if a task fails and crashes the JVM, the failure is considered a "normal failure" (meaning it's considered unrelated to the

[2/2] spark git commit: [SPARK-11347] [SQL] Support for joinWith in Datasets

2015-10-27 Thread yhuai
[SPARK-11347] [SQL] Support for joinWith in Datasets This PR adds a new operation `joinWith` to a `Dataset`, which returns a `Tuple` for each pair where a given `condition` evaluates to true. ```scala case class ClassData(a: String, b: Int) val ds1 = Seq(ClassData("a", 1), ClassData("b",
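
The Scala example in the snippet above is cut off, so here is a minimal plain-Python sketch of the semantics the message describes: one tuple per pair of rows for which the condition holds. `join_with` is a hypothetical helper rather than the Dataset API, and the row values are made up for illustration.

```python
from collections import namedtuple

ClassData = namedtuple("ClassData", ["a", "b"])


def join_with(left, right, condition):
    """Hypothetical helper: pair each left row with every right row that matches."""
    return [(l, r) for l in left for r in right if condition(l, r)]


# Illustrative rows only; they are not the values from the original commit message.
ds1 = [ClassData("a", 1), ClassData("b", 2)]
ds2 = [ClassData("a", 10), ClassData("c", 30)]

# Each result element keeps both rows intact instead of flattening their columns.
pairs = join_with(ds1, ds2, lambda l, r: l.a == r.a)
print(pairs)
```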

[1/2] spark git commit: [SPARK-11347] [SQL] Support for joinWith in Datasets

2015-10-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 3bdbbc6c9 -> 5a5f65905 http://git-wip-us.apache.org/repos/asf/spark/blob/5a5f6590/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala -- diff --git

spark git commit: [SPARK-10024][PYSPARK] Python API RF and GBT related params clear up

2015-10-27 Thread meng
Repository: spark Updated Branches: refs/heads/master 5a5f65905 -> 9dba5fb2b [SPARK-10024][PYSPARK] Python API RF and GBT related params clear up implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for Python API in pyspark/ml/{classification, regression}.py

spark git commit: [SPARK-11324][STREAMING] Flag for closing Write Ahead Logs after a write

2015-10-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9dba5fb2b -> 4f030b9e8 [SPARK-11324][STREAMING] Flag for closing Write Ahead Logs after a write Currently the Write Ahead Log in Spark Streaming flushes data as writes need to be made. S3 does not support flushing of data, data is written

spark git commit: [SPARK-11178] Improving naming around task failures.

2015-10-27 Thread kayousterhout
Repository: spark Updated Branches: refs/heads/master 9fbd75ab5 -> b960a8905 [SPARK-11178] Improving naming around task failures. Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason that's not caused by one of the tasks running

spark git commit: [SPARK-10484] [SQL] Optimize the cartesian join with broadcast join for some cases

2015-10-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b960a8905 -> d9c603989 [SPARK-10484] [SQL] Optimize the cartesian join with broadcast join for some cases In some cases, we can broadcast the smaller relation in cartesian join, which improve the performance significantly. Author: Cheng

svn commit: r1710944 - in /spark: releases/_posts/2015-09-09-spark-release-1-5-0.md site/releases/spark-release-1-5-0.html

2015-10-27 Thread meng
Author: meng Date: Wed Oct 28 05:01:01 2015 New Revision: 1710944 URL: http://svn.apache.org/viewvc?rev=1710944&view=rev Log: add Bert Greevenbosch to 1.5.0 contributors Modified: spark/releases/_posts/2015-09-09-spark-release-1-5-0.md spark/site/releases/spark-release-1-5-0.html Modified: