git commit: [SPARK-2842][MLlib]Word2Vec documentation

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/master 3c8fa5059 - eef779b8d [SPARK-2842][MLlib]Word2Vec documentation mengxr Documentation for Word2Vec Author: Liquan Pei liquan...@gmail.com Closes #2003 from Ishiihara/Word2Vec-doc and squashes the following commits: 4ff11d4 [Liquan Pei]

git commit: [SPARK-2842][MLlib]Word2Vec documentation

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 708cde99a - 518258f1b [SPARK-2842][MLlib]Word2Vec documentation mengxr Documentation for Word2Vec Author: Liquan Pei liquan...@gmail.com Closes #2003 from Ishiihara/Word2Vec-doc and squashes the following commits: 4ff11d4 [Liquan

git commit: [SPARK-2862] histogram method fails on some choices of bucketCount

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/master c0cbbdeaf - f45efbb8a [SPARK-2862] histogram method fails on some choices of bucketCount Author: Chandan Kumar chandan.ku...@imaginea.com Closes #1787 from nrchandan/spark-2862 and squashes the following commits: a76bbf6 [Chandan Kumar]

git commit: SPARK-3096: Include parquet hive serde by default in build

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 12f16ba3f - ec0b91edd SPARK-3096: Include parquet hive serde by default in build A small change - we should just add this dependency. It doesn't have any recursive deps and it's needed for reading Hive parquet tables. Author: Patrick

git commit: [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7ae28d124 - 6a13dca12 [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because it's

git commit: [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 ec0b91edd - 55e9dd637 [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because

git commit: SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 4da76fc81 - 496f62d9a SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool This definitely needs review as I am not familiar with this part of Spark. I tested this locally and it did seem to work. Author: Patrick Wendell

git commit: SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4bf3de710 - 6bca8898a SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool This definitely needs review as I am not familiar with this part of Spark. I tested this locally and it did seem to work. Author: Patrick Wendell

git commit: [SPARK-3091] [SQL] Add support for caching metadata on Parquet files

2014-08-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6bca8898a - 9eb74c7d2 [SPARK-3091] [SQL] Add support for caching metadata on Parquet files For larger Parquet files, reading the file footers (which is done in parallel on up to 5 threads) and HDFS block locations (which is serial) can

svn commit: r1618711 [2/2] - in /spark: ./ site/ site/news/ site/releases/

2014-08-18 Thread pwendell
Modified: spark/site/releases/spark-release-0-8-0.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-8-0.html?rev=1618711&r1=1618710&r2=1618711&view=diff

git commit: Removed .travis.yml file since we are not using Travis.

2014-08-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 66ade00f9 - 3a5962f0f Removed .travis.yml file since we are not using Travis. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3a5962f0 Tree:

git commit: [SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8

2014-08-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3a5962f0f - d1d0ee41c [SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8 bugfix: It will raise an exception when it tries to encode non-ASCII strings into unicode. It should only encode unicode as utf-8. Author: Davies Liu

git commit: [SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8

2014-08-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 cc4015d2f - e08333463 [SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8 bugfix: It will raise an exception when it tries to encode non-ASCII strings into unicode. It should only encode unicode as utf-8. Author: Davies Liu
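
The rule stated in the SPARK-3103 message (encode only unicode text as UTF-8, and leave values that are already byte strings alone) can be illustrated with a minimal sketch. This is not the PySpark patch itself: the helper name `to_utf8_bytes` is hypothetical, and the sketch uses Python 3 types, where `str` plays the role of Python 2 `unicode` and `bytes` the role of Python 2 `str`.

```python
def to_utf8_bytes(value):
    """Return value as bytes, encoding only text values as UTF-8.

    Hypothetical helper for illustration; not taken from the Spark source.
    """
    if isinstance(value, bytes):
        # Already encoded: pass through untouched instead of re-encoding.
        return value
    if not isinstance(value, str):
        # Non-text objects are converted to text first.
        value = str(value)
    return value.encode("utf-8")


if __name__ == "__main__":
    for item in ["caf\u00e9", b"caf\xc3\xa9", 42]:
        print(repr(to_utf8_bytes(item)))
```

Checking the type before encoding is what avoids the failure described above: only text is ever passed to `encode`, so bytes that already contain non-ASCII data are never decoded or encoded a second time.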

git commit: [SPARK-2718] [yarn] Handle quotes and other characters in user args.

2014-08-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.1 e08333463 - 25cabd7ee [SPARK-2718] [yarn] Handle quotes and other characters in user args. Due to the way Yarn runs things through bash, normal quoting doesn't work as expected. This change applies the necessary voodoo to the user args
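
The general problem behind SPARK-2718, user arguments being reinterpreted when a command line passes through a shell, can be sketched as follows. The preview above is truncated, so the actual escaping Spark applies is not shown here. This sketch substitutes Python's standard `shlex.quote`, and the function name `build_shell_command` is hypothetical.

```python
import shlex


def build_shell_command(program, user_args):
    """Join a program and its arguments into one shell-safe command string.

    Hypothetical helper for illustration; each argument is quoted so that a
    POSIX shell passes it through literally (spaces, quotes, $ and so on).
    """
    return " ".join([shlex.quote(program)] + [shlex.quote(a) for a in user_args])


if __name__ == "__main__":
    args = ["two words", "it's", "$HOME", 'say "hi"']
    command = build_shell_command("my-app", args)
    print(command)
    # Splitting with POSIX shell rules recovers the original arguments.
    print(shlex.split(command)[1:] == args)
```

Quoting each argument separately, instead of quoting the joined string, keeps argument boundaries intact when the shell splits the command again.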

git commit: [mllib] DecisionTree: treeAggregate + Python example bug fix

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 25cabd7ee - 98778fffd [mllib] DecisionTree: treeAggregate + Python example bug fix Small DecisionTree updates: * Changed main DecisionTree aggregate to treeAggregate. * Fixed bug in python example decision_tree_runner.py with missing

git commit: [mllib] DecisionTree: treeAggregate + Python example bug fix

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/master 6201b2764 - 115eeb30d [mllib] DecisionTree: treeAggregate + Python example bug fix Small DecisionTree updates: * Changed main DecisionTree aggregate to treeAggregate. * Fixed bug in python example decision_tree_runner.py with missing

git commit: [SPARK-2850] [SPARK-2626] [mllib] MLlib stats examples + small fixes

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 98778fffd - e3f89e971 [SPARK-2850] [SPARK-2626] [mllib] MLlib stats examples + small fixes Added examples for statistical summarization: * Scala: StatisticalSummary.scala ** Tests: correlation, MultivariateOnlineSummarizer * python:

git commit: [SPARK-2850] [SPARK-2626] [mllib] MLlib stats examples + small fixes

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/master 115eeb30d - c8b16ca0d [SPARK-2850] [SPARK-2626] [mllib] MLlib stats examples + small fixes Added examples for statistical summarization: * Scala: StatisticalSummary.scala ** Tests: correlation, MultivariateOnlineSummarizer * python:

git commit: [SPARK-3108][MLLIB] add predictOnValues to StreamingLR and fix predictOn

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/master c8b16ca0d - 217b5e915 [SPARK-3108][MLLIB] add predictOnValues to StreamingLR and fix predictOn It is useful in streaming to allow users to carry extra data with the prediction, for monitoring the prediction error for example. freeman-lab

git commit: [SPARK-3108][MLLIB] add predictOnValues to StreamingLR and fix predictOn

2014-08-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 e3f89e971 - 7d069bf0c [SPARK-3108][MLLIB] add predictOnValues to StreamingLR and fix predictOn It is useful in streaming to allow users to carry extra data with the prediction, for monitoring the prediction error for example.

git commit: [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL.

2014-08-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 7d069bf0c - 3a03259a0 [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL. This fixes SPARK-3114, an issue where we inadvertently broke Python UDFs in Spark SQL. This PR modifies the test runner script to always run the PySpark SQL

git commit: [SPARK-3116] Remove the excessive lockings in TorrentBroadcast

2014-08-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1f1819b20 - 82577339d [SPARK-3116] Remove the excessive lockings in TorrentBroadcast Author: Reynold Xin r...@apache.org Closes #2028 from rxin/torrentBroadcast and squashes the following commits: 92c62a5 [Reynold Xin] Revert the