git commit: [SPARK-3141] [PySpark] fix sortByKey() with take()

2014-08-19 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.1 f8c908ebf -> 5b22ebf68 [SPARK-3141] [PySpark] fix sortByKey() with take() Fix sortByKey() with take() The function `f` used in mapPartitions should always return an iterator. Author: Davies Liu Closes #2045 from davies/fix_sortbykey

git commit: [SPARK-3141] [PySpark] fix sortByKey() with take()

2014-08-19 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 8a74e4b2a -> 0a7ef6339 [SPARK-3141] [PySpark] fix sortByKey() with take() Fix sortByKey() with take() The function `f` used in mapPartitions should always return an iterator. Author: Davies Liu Closes #2045 from davies/fix_sortbykey and
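
The contract behind the fix is easiest to see in the Scala API, where the type signature already enforces it: the function passed to mapPartitions maps an Iterator to an Iterator. A minimal sketch, assuming an existing SparkContext `sc`; the PySpark change makes the Python closure follow the same rule.

```scala
// sortByKey() followed by take() only evaluates the first partitions, so the partition-wise
// function must hand back an Iterator rather than a fully materialized collection.
val sorted = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b")).sortByKey()
val firstTwo = sorted.mapPartitions(iter => iter.take(2)) // Iterator in, Iterator out
println(firstTwo.take(2).toSeq)
```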

git commit: [DOCS] Fixed wrong links

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 5d1a87866 -> f8c908ebf [DOCS] Fixed wrong links Author: Ken Takagiwa Closes #2042 from giwa/patch-1 and squashes the following commits: 216fe0e [Ken Takagiwa] Fixed wrong links (cherry picked from commit 8a74e4b2a8c7dab154b40653948

git commit: [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs

2014-08-19 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.1 a5bc9c601 -> 5d1a87866 [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs This PR fixes two bugs related to `spark.local.dirs` and `SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid directory (S

git commit: [DOCS] Fixed wrong links

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master ebcb94f70 -> 8a74e4b2a [DOCS] Fixed wrong links Author: Ken Takagiwa Closes #2042 from giwa/patch-1 and squashes the following commits: 216fe0e [Ken Takagiwa] Fixed wrong links Project: http://git-wip-us.apache.org/repos/asf/spark/rep

git commit: [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs

2014-08-19 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 0a984aa15 -> ebcb94f70 [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs This PR fixes two bugs related to `spark.local.dirs` and `SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid directory (SPARK
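
For context, these are the scratch-space settings the fix touches. A minimal sketch of configuring them explicitly, with hypothetical mount points; on YARN and standalone deployments, the cluster-provided directories (SPARK_LOCAL_DIRS / LOCAL_DIRS) take precedence over this property.

```scala
import org.apache.spark.SparkConf

// spark.local.dir takes a comma-separated list of directories used for shuffle and spill files.
// When SPARK_LOCAL_DIRS is set in the environment, it overrides this property.
val conf = new SparkConf()
  .setAppName("local-dirs-demo")
  .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark")
```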

git commit: [SPARK-3142][MLLIB] output shuffle data directly in Word2Vec

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 08c9973da -> a5bc9c601 [SPARK-3142][MLLIB] output shuffle data directly in Word2Vec Sorry I didn't realize this in #2043. Ishiihara Author: Xiangrui Meng Closes #2049 from mengxr/more-w2v and squashes the following commits: 050b1c5

git commit: [SPARK-3142][MLLIB] output shuffle data directly in Word2Vec

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 8adfbc2b6 -> 0a984aa15 [SPARK-3142][MLLIB] output shuffle data directly in Word2Vec Sorry I didn't realize this in #2043. Ishiihara Author: Xiangrui Meng Closes #2049 from mengxr/more-w2v and squashes the following commits: 050b1c5 [Xia

git commit: [SPARK-3119] Re-implementation of TorrentBroadcast.

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 d5db95bae -> 08c9973da [SPARK-3119] Re-implementation of TorrentBroadcast. This is a re-implementation of TorrentBroadcast, with the following changes: 1. Removes most of the mutable, transient state from TorrentBroadcast (e.g. totalB

git commit: [SPARK-3119] Re-implementation of TorrentBroadcast.

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master fce5c0fb6 -> 8adfbc2b6 [SPARK-3119] Re-implementation of TorrentBroadcast. This is a re-implementation of TorrentBroadcast, with the following changes: 1. Removes most of the mutable, transient state from TorrentBroadcast (e.g. totalBytes
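
The re-implementation sits behind the ordinary broadcast-variable API. A minimal sketch, assuming spark.broadcast.factory is how this release selects the broadcast implementation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Broadcast a small lookup table to all executors; TorrentBroadcast distributes it in blocks
// between executors instead of serving every copy from the driver.
val conf = new SparkConf()
  .setAppName("broadcast-demo")
  .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
val sc = new SparkContext(conf)
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
val total = sc.parallelize(Seq("a", "b", "a")).map(k => lookup.value.getOrElse(k, 0)).reduce(_ + _)
```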

git commit: [HOTFIX][Streaming][MLlib] use temp folder for checkpoint

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 068b6fe6a -> fce5c0fb6 [HOTFIX][Streaming][MLlib] use temp folder for checkpoint or Jenkins will complain about no Apache header in checkpoint files. tdas rxin Author: Xiangrui Meng Closes #2046 from mengxr/tmp-checkpoint and squashes th

git commit: [HOTFIX][Streaming][MLlib] use temp folder for checkpoint

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 148e45b91 -> d5db95bae [HOTFIX][Streaming][MLlib] use temp folder for checkpoint or Jenkins will complain about no Apache header in checkpoint files. tdas rxin Author: Xiangrui Meng Closes #2046 from mengxr/tmp-checkpoint and squashe
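
A minimal sketch of the pattern, assuming a local-mode streaming test; writing checkpoints under a temporary directory keeps them out of the source tree, where the Apache license header check runs.

```scala
import java.nio.file.Files
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Create the streaming context and point its checkpoint directory at a throwaway temp folder.
val conf = new SparkConf().setMaster("local[2]").setAppName("checkpoint-test")
val ssc = new StreamingContext(conf, Seconds(1))
val checkpointDir = Files.createTempDirectory("streaming-checkpoint").toString
ssc.checkpoint(checkpointDir)
```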

git commit: [SPARK-3130][MLLIB] detect negative values in naive Bayes

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 607735c16 -> 148e45b91 [SPARK-3130][MLLIB] detect negative values in naive Bayes because NB treats feature values as term frequencies. jkbradley Author: Xiangrui Meng Closes #2038 from mengxr/nb-neg and squashes the following commits

git commit: [SPARK-3130][MLLIB] detect negative values in naive Bayes

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 0e3ab94d4 -> 068b6fe6a [SPARK-3130][MLLIB] detect negative values in naive Bayes because NB treats feature values as term frequencies. jkbradley Author: Xiangrui Meng Closes #2038 from mengxr/nb-neg and squashes the following commits: 5
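
The reason for the check: multinomial naive Bayes interprets each feature value as a non-negative term frequency. A minimal sketch, assuming an existing SparkContext `sc`:

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Feature values act as term counts, so they must all be >= 0.
val data = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0, 2.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 3.0, 1.0))))
val model = NaiveBayes.train(data, 1.0) // 1.0 = additive (Laplace) smoothing
// With this change, a point such as LabeledPoint(0.0, Vectors.dense(-1.0, 0.0, 0.0))
// is detected and rejected instead of silently producing a meaningless model.
```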

git commit: [SQL] add note of use synchronizedMap in SQLConf

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master c7252b009 -> 0e3ab94d4 [SQL] add note of use synchronizedMap in SQLConf Refer to: http://stackoverflow.com/questions/510632/whats-the-difference-between-concurrenthashmap-and-collections-synchronizedmap Collections.synchronizedMap(map) crea

git commit: [SQL] add note of use synchronizedMap in SQLConf

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 d75464de5 -> 607735c16 [SQL] add note of use synchronizedMap in SQLConf Refer to: http://stackoverflow.com/questions/510632/whats-the-difference-between-concurrenthashmap-and-collections-synchronizedmap Collections.synchronizedMap(map)
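
The note being added boils down to this: Collections.synchronizedMap makes individual operations thread-safe, but iteration over the wrapper must still be synchronized by the caller, unlike ConcurrentHashMap. A minimal sketch of the distinction (the config key is illustrative):

```scala
import java.util.Collections
import scala.collection.JavaConverters._

// Each put/get on the wrapper is synchronized, but a traversal sees a consistent view
// only if the caller holds the wrapper's lock for the whole iteration.
val settings = Collections.synchronizedMap(new java.util.HashMap[String, String]())
settings.put("spark.sql.shuffle.partitions", "200")
settings.synchronized {
  settings.entrySet().asScala.foreach(e => println(s"${e.getKey}=${e.getValue}"))
}
```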

git commit: [SPARK-3112][MLLIB] Add documentation and example for StreamingLR

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 023ed7c0f -> d75464de5 [SPARK-3112][MLLIB] Add documentation and example for StreamingLR Added a documentation section on StreamingLR to the ``MLlib - Linear Methods``, including a worked example. mengxr tdas Author: freeman Closes

git commit: [SPARK-3112][MLLIB] Add documentation and example for StreamingLR

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 1870dbaa5 -> c7252b009 [SPARK-3112][MLLIB] Add documentation and example for StreamingLR Added a documentation section on StreamingLR to the ``MLlib - Linear Methods``, including a worked example. mengxr tdas Author: freeman Closes #20
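
For reference, the API the new documentation section walks through looks roughly like this; trainingStream and testStream are assumed DStream[LabeledPoint] inputs (for example, parsed from text streams), and the feature count is illustrative.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

// Continuously update the model on the training stream and score the test stream.
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.dense(0.0, 0.0, 0.0)) // three features in this sketch
model.trainOn(trainingStream)
model.predictOnValues(testStream.map(lp => (lp.label, lp.features))).print()
```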

git commit: [MLLIB] minor update to word2vec

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 66b4c81db -> 023ed7c0f [MLLIB] minor update to word2vec very minor update Ishiihara Author: Xiangrui Meng Closes #2043 from mengxr/minor-w2v and squashes the following commits: be649fd [Xiangrui Meng] remove map because we only need

git commit: [MLLIB] minor update to word2vec

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 8b9dc9910 -> 1870dbaa5 [MLLIB] minor update to word2vec very minor update Ishiihara Author: Xiangrui Meng Closes #2043 from mengxr/minor-w2v and squashes the following commits: be649fd [Xiangrui Meng] remove map because we only need app

[2/2] git commit: [SPARK-2468] Netty based block server / client module

2014-08-19 Thread rxin
[SPARK-2468] Netty based block server / client module Previous pull request (#1907) was reverted. This brings it back. Still looking into the hang. Author: Reynold Xin Closes #1971 from rxin/netty1 and squashes the following commits: b0be96f [Reynold Xin] Added test to make sure outstandingRe

[2/2] git commit: [SPARK-2468] Netty based block server / client module

2014-08-19 Thread rxin
[SPARK-2468] Netty based block server / client module Previous pull request (#1907) was reverted. This brings it back. Still looking into the hang. Author: Reynold Xin Closes #1971 from rxin/netty1 and squashes the following commits: b0be96f [Reynold Xin] Added test to make sure outstandingRe

[1/2] [SPARK-2468] Netty based block server / client module

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 d371c71cb -> 66b4c81db http://git-wip-us.apache.org/repos/asf/spark/blob/66b4c81d/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala -- diff --git

[1/2] [SPARK-2468] Netty based block server / client module

2014-08-19 Thread rxin
Repository: spark Updated Branches: refs/heads/master 825d4fe47 -> 8b9dc9910 http://git-wip-us.apache.org/repos/asf/spark/blob/8b9dc991/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala -- diff --git a/

git commit: [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 3540d4b38 -> d371c71cb [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs Though we don't use default argument for methods in RandomRDDs, it is still not easy for Java users to use because the output type is either `RDD[Dou

git commit: [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs

2014-08-19 Thread meng
Repository: spark Updated Branches: refs/heads/master d7e80c259 -> 825d4fe47 [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs Though we don't use default argument for methods in RandomRDDs, it is still not easy for Java users to use because the output type is either `RDD[Double]
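
The Scala entry points return RDD[Double] (or RDD[Vector]) directly, which is exactly what the Java-friendly wrappers added here smooth over for Java callers. A minimal Scala sketch, assuming an existing SparkContext `sc`; the names of the Java counterparts are as defined in the PR and not shown here.

```scala
import org.apache.spark.mllib.random.RandomRDDs

// 1000 i.i.d. samples from the standard normal distribution, spread over 4 partitions.
val normals = RandomRDDs.normalRDD(sc, 1000L, numPartitions = 4, seed = 11L)
println(normals.stats()) // count, mean, stdev, min, max
```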

git commit: [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes.

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 f6b4ab83c -> 3540d4b38 [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes. If two RDDs have different batch size in serializers, then it will try to re-serialize the one with smaller batch size, then call

git commit: [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes.

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 76eaeb452 -> d7e80c259 [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes. If two RDDs have different batch size in serializers, then it will try to re-serialize the one with smaller batch size, then call RDD

git commit: Move a bracket in validateSettings of SparkConf

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 94053a7b7 -> 76eaeb452 Move a bracket in validateSettings of SparkConf Move a bracket in validateSettings of SparkConf Author: hzw19900416 Closes #2012 from hzw19900416/codereading and squashes the following commits: e717fb6 [hzw1990041

git commit: Move a bracket in validateSettings of SparkConf

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 c3952b092 -> f6b4ab83c Move a bracket in validateSettings of SparkConf Move a bracket in validateSettings of SparkConf Author: hzw19900416 Closes #2012 from hzw19900416/codereading and squashes the following commits: e717fb6 [hzw199

git commit: SPARK-2333 - spark_ec2 script should allow option for existing security group

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 04a320862 -> c3952b092 SPARK-2333 - spark_ec2 script should allow option for existing security group - Uses the name tag to identify machines in a cluster. - Allows overriding the security group name so it doesn't need to coinci

git commit: SPARK-2333 - spark_ec2 script should allow option for existing security group

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 31f0b071e -> 94053a7b7 SPARK-2333 - spark_ec2 script should allow option for existing security group - Uses the name tag to identify machines in a cluster. - Allows overriding the security group name so it doesn't need to coincide

git commit: [SPARK-3128][MLLIB] Use streaming test suite for StreamingLR

2014-08-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 5d895ad56 -> 04a320862 [SPARK-3128][MLLIB] Use streaming test suite for StreamingLR Refactored tests for streaming linear regression to use existing streaming test utilities. Summary of changes: - Made ``mllib`` depend on tests from `

git commit: [SPARK-3128][MLLIB] Use streaming test suite for StreamingLR

2014-08-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master cbfc26ba4 -> 31f0b071e [SPARK-3128][MLLIB] Use streaming test suite for StreamingLR Refactored tests for streaming linear regression to use existing streaming test utilities. Summary of changes: - Made ``mllib`` depend on tests from ``str

git commit: [SPARK-3089] Fix meaningless error message in ConnectionManager

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 1418893da -> 5d895ad56 [SPARK-3089] Fix meaningless error message in ConnectionManager Author: Kousuke Saruta Closes #2000 from sarutak/SPARK-3089 and squashes the following commits: 02dfdea [Kousuke Saruta] Merge branch 'master' of

git commit: [SPARK-3089] Fix meaningless error message in ConnectionManager

2014-08-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 7eb9cbc27 -> cbfc26ba4 [SPARK-3089] Fix meaningless error message in ConnectionManager Author: Kousuke Saruta Closes #2000 from sarutak/SPARK-3089 and squashes the following commits: 02dfdea [Kousuke Saruta] Merge branch 'master' of git:

git commit: [SPARK-3072] YARN - Exit when reach max number failed executors

2014-08-19 Thread tgraves
Repository: spark Updated Branches: refs/heads/branch-1.1 f3b0f34b4 -> 1418893da [SPARK-3072] YARN - Exit when reach max number failed executors In some cases on hadoop 2.x the spark application master doesn't properly exit and hangs around for 10 minutes after it's really done. We should mak

git commit: [SPARK-3072] YARN - Exit when reach max number failed executors

2014-08-19 Thread tgraves
Repository: spark Updated Branches: refs/heads/master cd0720ca7 -> 7eb9cbc27 [SPARK-3072] YARN - Exit when reach max number failed executors In some cases on hadoop 2.x the spark application master doesn't properly exit and hangs around for 10 minutes after it's really done. We should make su
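
A minimal sketch of the setting involved, assuming spark.yarn.max.executor.failures is the relevant property in this release; the threshold value here is illustrative.

```scala
import org.apache.spark.SparkConf

// Once this many executors have failed, the YARN application master gives up and exits
// instead of idling until the application eventually times out.
val conf = new SparkConf()
  .setAppName("yarn-failure-cap")
  .set("spark.yarn.max.executor.failures", "10")
```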