spark git commit: [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8136810df -> ab9128fb7 [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression This PR enables auto_convert in JavaGateway, so we can register a converter for given types, for example, date and datetime. There are tw
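The commit enables Py4J's auto-convert mechanism so that Python `date`/`datetime` values can appear directly in Column expressions. A minimal stdlib sketch of the converter-registry idea (registry and function names are hypothetical, not the actual Py4J API; the ISO-string targets are an assumption about what a JVM-side date type could consume):

```python
import datetime

# Hypothetical converter registry sketching the idea: when a Python
# value crosses into the JVM, the first registered converter matching
# its type rewrites it (here into an ISO-style string).
_converters = []

def register_converter(py_type, fn):
    _converters.append((py_type, fn))

def convert(value):
    for py_type, fn in _converters:
        if isinstance(value, py_type):
            return fn(value)
    return value  # unregistered types pass through unchanged

# datetime must be registered before date: datetime is a subclass of
# date, so the more specific converter has to match first.
register_converter(datetime.datetime, lambda d: d.strftime("%Y-%m-%d %H:%M:%S"))
register_converter(datetime.date, lambda d: d.isoformat())
```

Registration order matters here because `isinstance` checks run in insertion order and `datetime.datetime` is a subclass of `datetime.date`.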

spark git commit: [SPARK-5990] [MLLIB] Model import/export for IsotonicRegression

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master ab9128fb7 -> 1f2f723b0 [SPARK-5990] [MLLIB] Model import/export for IsotonicRegression Model import/export for IsotonicRegression Author: Yanbo Liang Closes #5270 from yanboliang/spark-5990 and squashes the following commits: 872028d [Y

spark git commit: [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 1f2f723b0 -> 5fea3e5c3 [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0 millisec

spark git commit: [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 8549ff4f0 -> 948f2f635 [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0 mill

Git Push Summary

2015-04-21 Thread pwendell
Repository: spark Updated Tags: refs/tags/v1.2.2-rc1 [created] 7531b50e4

spark git commit: [SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized CoGroupedRDD

2015-04-21 Thread kayousterhout
Repository: spark Updated Branches: refs/heads/master 5fea3e5c3 -> c035c0f2d [SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized CoGroupedRDD CoGroupPartition, part of CoGroupedRDD, includes references to each RDD that the CoGroupedRDD narrowly depends on, and a reference to t

svn commit: r8670 - in /dev/spark/spark-1.2.2-rc1: ./ spark-1.2.2-bin-hadoop2.4.tgz spark-1.2.2-bin-hadoop2.4.tgz.asc spark-1.2.2-bin-hadoop2.4.tgz.md5 spark-1.2.2-bin-hadoop2.4.tgz.sha

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:49:30 2015 New Revision: 8670 Log: Adding missing Hadoop 2.4 binary for Spark 1.2.2 Added: dev/spark/spark-1.2.2-rc1/ dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz (with props) dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.asc

svn commit: r8671 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:50:59 2015 New Revision: 8671 Log: Spark 1.2.2 Hadoop 2.4 TGZ Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz - copied unchanged from r8670, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz Removed: dev/spark/spark-1.2.2-r

svn commit: r8672 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.asc /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.asc

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:51:16 2015 New Revision: 8672 Log: Spark 1.2.2 Hadoop 2.4 ASC Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.asc - copied unchanged from r8671, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.asc Removed: dev/spark/spark

svn commit: r8673 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.md5 /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.md5

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:51:32 2015 New Revision: 8673 Log: Spark 1.2.2 Hadoop 2.4 MD5 Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.md5 - copied unchanged from r8672, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.md5 Removed: dev/spark/spark

svn commit: r8674 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.sha /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.sha

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:52:08 2015 New Revision: 8674 Log: Spark 1.2.2 Hadoop 2.4 Sha Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.sha - copied unchanged from r8673, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.sha Removed: dev/spark/spark

spark git commit: SPARK-3276 Added a new configuration spark.streaming.minRememberDuration

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master c035c0f2d -> c25ca7c5a SPARK-3276 Added a new configuration spark.streaming.minRememberDuration SPARK-3276 Added a new configuration parameter ``spark.streaming.minRememberDuration``, with a default value of 1 minute. So that when a Spark
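The new parameter would be set like any other Spark property, for example in `spark-defaults.conf`. A sketch with the stated 1-minute default (the exact accepted value format, plain seconds vs. a duration string, is an assumption):

```
spark.streaming.minRememberDuration 60
```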

spark git commit: [SPARK-6845] [MLlib] [PySpark] Add isTransposed flag to DenseMatrix

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master c25ca7c5a -> 45c47fa41 [SPARK-6845] [MLlib] [PySpark] Add isTransposed flag to DenseMatrix Since sparse matrices now support an isTransposed flag for row-major data, DenseMatrices should do the same. Author: MechCoder Closes #5455 from Me
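The flag lets a dense matrix store its values row-major instead of the column-major default, with indexing adjusted accordingly. A minimal pure-Python sketch of the idea (this is not the PySpark `DenseMatrix` API, just an illustration of the index math):

```python
class DenseMatrix:
    """Dense matrix backed by a flat list, with an is_transposed flag.

    is_transposed=False: values are column-major (MLlib's default).
    is_transposed=True:  values are row-major; index math is swapped.
    """
    def __init__(self, num_rows, num_cols, values, is_transposed=False):
        assert len(values) == num_rows * num_cols
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.values = values
        self.is_transposed = is_transposed

    def __getitem__(self, idx):
        i, j = idx
        if self.is_transposed:
            return self.values[i * self.num_cols + j]  # row-major
        return self.values[j * self.num_rows + i]      # column-major
```

The same flat buffer `[1, 2, 3, 4]` reads as `[[1, 3], [2, 4]]` column-major but `[[1, 2], [3, 4]]` row-major.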

spark git commit: [SPARK-7011] Build(compilation) fails with scala 2.11 option, because a protected[sql] type is accessed in ml package.

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 45c47fa41 -> 04bf34e34 [SPARK-7011] Build(compilation) fails with scala 2.11 option, because a protected[sql] type is accessed in ml package. [This](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/featu

spark git commit: [SPARK-6994] Allow to fetch field values by name in sql.Row

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 04bf34e34 -> 2e8c6ca47 [SPARK-6994] Allow to fetch field values by name in sql.Row It looked weird that up to now there was no way in Spark's Scala API to access fields of `DataFrame/sql.Row` by name, only by their index. This tries to so
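The change adds name-based field lookup alongside the existing positional access (in Scala, something like `row.getAs[T]("name")`). A small Python sketch of the two access paths (class and method names hypothetical, not the actual `sql.Row` API):

```python
class Row:
    """Sketch: a row offering both index- and name-based field access."""
    def __init__(self, schema, values):
        # Precompute name -> position once per schema.
        self._index = {name: i for i, name in enumerate(schema)}
        self._values = list(values)

    def get(self, i):
        return self._values[i]                   # positional access

    def get_as(self, name):
        return self._values[self._index[name]]   # name-based access
```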

spark git commit: [SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 03fd92167 -> 6265cba00 [SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used https://issues.apache.org/jira/browse/SPARK-6969 Author: Yin Huai Closes #5583 from yhuai/refreshTableRefreshDataCache and squashes the followin

spark git commit: [SQL][minor] make it more clear that we only need to re-throw GetField exception for UnresolvedAttribute

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2e8c6ca47 -> 03fd92167 [SQL][minor] make it more clear that we only need to re-throw GetField exception for UnresolvedAttribute For `GetField` outside `UnresolvedAttribute`, we will throw exception in `Analyzer`. Author: Wenchen Fan Cl

spark git commit: [SPARK-6996][SQL] Support map types in java beans

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6265cba00 -> 2a24bf92e [SPARK-6996][SQL] Support map types in java beans liancheng mengxr this is similar to #5146. Author: Punya Biswal Closes #5578 from punya/feature/SPARK-6996 and squashes the following commits: d56c3e0 [Punya Biswa

spark git commit: [SPARK-5817] [SQL] Fix bug of udtf with column names

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2a24bf92e -> 7662ec23b [SPARK-5817] [SQL] Fix bug of udtf with column names It's a bug when running a query like: ```sql select d from (select explode(array(1,1)) d from src limit 1) t ``` And it will throw an exception like: ``` org.apache.spark.s

spark git commit: [SPARK-3386] Share and reuse SerializerInstances in shuffle paths

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7662ec23b -> f83c0f112 [SPARK-3386] Share and reuse SerializerInstances in shuffle paths This patch modifies several shuffle-related code paths to share and re-use SerializerInstances instead of creating new ones. Some serializers, such a
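The patch's theme is to hold one serializer instance and reuse it across writes rather than constructing a fresh one per record or partition. A small sketch of that pattern (class and method names hypothetical; a counter is added purely to make the reuse observable):

```python
import io
import pickle

class SerializerInstance:
    created = 0  # counts instantiations, to demonstrate reuse

    def __init__(self):
        SerializerInstance.created += 1

    def serialize(self, obj):
        buf = io.BytesIO()
        pickle.dump(obj, buf)
        return buf.getvalue()

class ShuffleWriter:
    """Sketch: one SerializerInstance created up front and reused for
    every record, instead of a new instance per write_record call."""
    def __init__(self):
        self._ser = SerializerInstance()  # created once, reused below

    def write_record(self, obj):
        return self._ser.serialize(obj)
```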

spark git commit: [minor] [build] Set java options when generating mima ignores.

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master f83c0f112 -> a70e849c7 [minor] [build] Set java options when generating mima ignores. The default java options make the call to GenerateMIMAIgnore take forever to run since it's gc'ing all the time. Improve things by setting the perm gen si

spark git commit: [SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls

2015-04-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master a70e849c7 -> 7fe6142cd [SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls 1. Use blas calls to find the dot product between two vectors. 2. Prevent re-computing the L2 norm of the given vector for each word in model. Auth
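The second optimization listed, computing the query vector's L2 norm once instead of once per vocabulary word, can be sketched in plain Python (BLAS calls replaced by loops; function name hypothetical):

```python
import math

def find_synonyms(query, vectors, num):
    """Rank words by cosine similarity to the query vector.

    The query's L2 norm is computed a single time, mirroring the
    commit's point about not re-deriving it for each word.
    """
    q_norm = math.sqrt(sum(x * x for x in query))  # computed once
    scores = {}
    for word, vec in vectors.items():
        dot = sum(a * b for a, b in zip(query, vec))
        v_norm = math.sqrt(sum(x * x for x in vec))
        scores[word] = dot / (q_norm * v_norm) if q_norm and v_norm else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:num]
```

In the real implementation the dot products would be batched into a single BLAS matrix-vector call over the word-vector matrix.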

spark git commit: [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 7fe6142cd -> 686dd742e [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark SchemaRDD works with ALS.train in 1.2, so we should continue to support DataFrames for compatibility. coderxiang Author: Xiangrui Meng Closes #5619 f

spark git commit: [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 948f2f635 -> fd61820d3 [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark SchemaRDD works with ALS.train in 1.2, so we should continue to support DataFrames for compatibility. coderxiang Author: Xiangrui Meng Closes #56

spark git commit: [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 fd61820d3 -> 4508f0189 [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala add missing comma and space Author: Alain Closes #5621 from AiHe/tree-node-issue and squashes the following commits: 159a7bb [Alain] [

spark git commit: [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 686dd742e -> ae036d081 [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala add missing comma and space Author: Alain Closes #5621 from AiHe/tree-node-issue and squashes the following commits: 159a7bb [Alain] [Mino

spark git commit: Avoid warning message about invalid refuse_seconds value in Mesos >=0.21...

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master ae036d081 -> b063a61b9 Avoid warning message about invalid refuse_seconds value in Mesos >=0.21... Starting with version 0.21.0, Apache Mesos is very noisy if the filter parameter refuse_seconds is set to an invalid value like `-1`. I have

spark git commit: [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master b063a61b9 -> e72c16e30 [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races. This change adds some new utility code to handle shutdown hooks in Spark. The main goal is to take advantage of Hadoop 2.x's API for shutdown hooks,

spark git commit: [SPARK-6953] [PySpark] speed up python tests

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master e72c16e30 -> 3134c3fe4 [SPARK-6953] [PySpark] speed up python tests This PR tries to speed up some python tests: ``` tests.py 144s -> 103s (-41s) mllib/classification.py 24s -> 17s (-7s) mllib/regression

spark git commit: Closes #5427

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3134c3fe4 -> 41ef78a94 Closes #5427 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41ef78a9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41ef78a9 Diff: http

spark git commit: [SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 41ef78a94 -> a0761ec70 [SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code. https://cwiki.apa

spark git commit: [SPARK-6490][Docs] Add docs for rpc configurations

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master a0761ec70 -> 3a3f7100f [SPARK-6490][Docs] Add docs for rpc configurations Added docs for rpc configurations and also fixed two places that should have been fixed in #5595. Author: zsxwing Closes #5607 from zsxwing/SPARK-6490-docs and sq

spark git commit: [MINOR] Comment improvements in ExternalSorter.

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 3a3f7100f -> 70f9f8ff3 [MINOR] Comment improvements in ExternalSorter. 1. Clearly specifies the contract/interactions for users of this class. 2. Minor fix in one doc to avoid ambiguity. Author: Patrick Wendell Closes #5620 from pwendell

spark git commit: [SPARK-6113] [ML] Small cleanups after original tree API PR

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 70f9f8ff3 -> 607eff0ed [SPARK-6113] [ML] Small cleanups after original tree API PR This does a few clean-ups. With this PR, all spark.ml tree components have ```private[ml]``` constructors. CC: mengxr Author: Joseph K. Bradley Closes

spark git commit: [SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution doc updates

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 607eff0ed -> bdc5c16e7 [SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution doc updates Part of the SPARK-6889 doc updates, to accompany wiki updates at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Sp