git commit: [SPARK-2580] [PySpark] keep silent in worker if JVM close the socket

2014-07-29 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.0 1a0a2f81a -> 2693035ba [SPARK-2580] [PySpark] keep silent in worker if JVM close the socket During rdd.take(n), the JVM will close the socket once it has received enough data, and the Python worker should keep silent in this case. At the same time,
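The behavior described above can be sketched as follows. This is a hypothetical illustration, not the actual PySpark worker code: the worker streams results over a socket, and treats a broken pipe or connection reset as the JVM having deliberately hung up early (as it does during `rdd.take(n)`), rather than as an error.

```python
import errno


def write_results(sock, items):
    """Stream serialized items to the JVM; stay silent if it hangs up early.

    Hypothetical sketch. During rdd.take(n) the JVM side may close the
    socket as soon as it has enough rows, so a broken-pipe / reset error
    here is expected and should not produce a noisy traceback.
    """
    try:
        for item in items:
            sock.sendall(item)
    except (BrokenPipeError, ConnectionResetError):
        pass  # JVM closed the socket on purpose; exit quietly
    except OSError as e:
        if e.errno not in (errno.EPIPE, errno.ECONNRESET):
            raise  # any other socket error is still a real problem
```

Any other `OSError` is re-raised, so genuine failures still surface.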

git commit: [SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle

2014-07-29 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.0 2693035ba -> e0bc72eb7 [SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle Fix the problem of pickling operator.itemgetter with multiple indices. Author: Davies Liu davies@gmail.com Closes #1627 from davies/itemgetter and
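The bug is easy to picture: a naive reducer for `itemgetter` captured only the first index, so `itemgetter(0, 2)` came back as `itemgetter(0)` after a pickle round-trip. A sketch of the fix, assuming the probing trick cloudpickle uses (`itemgetter` does not expose its indices directly, so they are recovered by calling it on a recording object); the `_Probe` class and reducer names are illustrative:

```python
import copyreg
import pickle
from operator import itemgetter


class _Probe:
    """Records every index an itemgetter accesses."""
    def __init__(self):
        self.indices = []

    def __getitem__(self, i):
        self.indices.append(i)
        return i


def _reduce_itemgetter(obj):
    probe = _Probe()
    obj(probe)  # each subscript access is recorded, not just the first
    return itemgetter, tuple(probe.indices)


# Register the reducer so pickle rebuilds itemgetter with ALL its indices.
copyreg.pickle(itemgetter, _reduce_itemgetter)

get02 = itemgetter(0, 2)
restored = pickle.loads(pickle.dumps(get02))
assert restored(("a", "b", "c")) == ("a", "c")
```

`itemgetter(0, 2)` returns a tuple of fields, so losing the second index silently changes results rather than crashing, which is what made the original bug nasty.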

git commit: [SPARK-2174][MLLIB] treeReduce and treeAggregate

2014-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 96ba04bbf -> 20424dad3 [SPARK-2174][MLLIB] treeReduce and treeAggregate In `reduce` and `aggregate`, the driver node spends time linear in the number of partitions. It becomes a bottleneck when there are many partitions and the data from
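The idea behind tree aggregation can be shown in miniature. This is an illustrative sketch, not Spark's implementation: per-partition results are combined in rounds of `fanout` instead of folding everything on the driver at once. In Spark the intermediate rounds run on the executors, so the driver only merges the final handful of values.

```python
from functools import reduce


def tree_reduce(partition_results, combine, fanout=2):
    """Combine per-partition results in rounds of `fanout` elements,
    turning an O(num_partitions) fold on one node into O(log) rounds.
    Toy model of the treeReduce idea, not Spark code."""
    results = list(partition_results)
    while len(results) > 1:
        results = [
            reduce(combine, results[i:i + fanout])
            for i in range(0, len(results), fanout)
        ]
    return results[0]
```

With 8 partition sums and `fanout=2`, the rounds shrink 8 -> 4 -> 2 -> 1, so no single step ever combines more than two values.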

git commit: [SPARK-2730][SQL] When retrieving a value from a Map, GetItem evaluates key twice

2014-07-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.0 e0bc72eb7 -> 3143e51d7 [SPARK-2730][SQL] When retrieving a value from a Map, GetItem evaluates key twice JIRA: https://issues.apache.org/jira/browse/SPARK-2730 Author: Yin Huai h...@cse.ohio-state.edu Closes #1637 from
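A minimal model of the fix, with hypothetical names rather than Catalyst's actual code: evaluate the key expression once and reuse the value, instead of once for the check and again for the lookup, which double-runs any cost or side effects in the key expression.

```python
def get_item(map_expr, key_expr, row):
    """Look a key up in a map column. Toy model of GetItem after the fix:
    map_expr and key_expr stand in for child expressions evaluated
    against a row."""
    m = map_expr(row)
    if m is None:
        return None
    key = key_expr(row)  # evaluated exactly once after the fix;
    if key is None:      # the buggy version re-evaluated key_expr(row)
        return None      # a second time for the actual lookup
    return m.get(key)
```

A counter on `key_expr` makes the property easy to check: it must fire exactly once per lookup.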

git commit: [SPARK-2674] [SQL] [PySpark] support datetime type for SchemaRDD

2014-07-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e3643485d -> f0d880e28 [SPARK-2674] [SQL] [PySpark] support datetime type for SchemaRDD Datetime and time in Python will be converted into java.util.Calendar after serialization, then converted into java.sql.Timestamp during
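The round-trip hinges on a common wire representation. A sketch under the assumption that timestamps travel as epoch milliseconds (the unit both `java.util.Calendar` and `java.sql.Timestamp` ultimately store); these helpers are hypothetical, not PySpark's serializer:

```python
from datetime import datetime, timezone


def datetime_to_timestamp_millis(dt):
    """Encode a Python datetime as epoch milliseconds for the JVM side.
    Naive datetimes are assumed to be UTC here -- an illustrative choice."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def timestamp_millis_to_datetime(ms):
    """Decode epoch milliseconds back into a timezone-aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

Millisecond precision matches `java.sql.Timestamp`'s constructor, though sub-millisecond microseconds on the Python side would be truncated by this encoding.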

git commit: [SPARK-2716][SQL] Don't check resolved for having filters.

2014-07-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2c356665c -> 39b819310 [SPARK-2716][SQL] Don't check resolved for having filters. For queries like `... HAVING COUNT(*) > 9`, the expression is always resolved since it contains no attributes. This was causing us to avoid doing the Having
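Why an attribute-free predicate counts as "already resolved" can be shown with a toy model of the `resolved` flag; this is not Spark code, just the recursive definition it uses, which explains why gating the HAVING rewrite on `resolved` skipped literal-only filters:

```python
class Expr:
    """Toy expression node: resolved means no unresolved attribute
    references anywhere in the subtree."""
    def __init__(self, children=(), is_unresolved_attribute=False):
        self.children = list(children)
        self.is_unresolved_attribute = is_unresolved_attribute

    @property
    def resolved(self):
        return (not self.is_unresolved_attribute
                and all(c.resolved for c in self.children))


# COUNT(*) > 9 references no column attributes, so it reports resolved=True
# from the start -- an analyzer rule that only fires on unresolved filters
# would therefore never rewrite this HAVING clause.
count_star_gt_9 = Expr(children=[Expr(), Expr()])
assert count_star_gt_9.resolved
```

A filter like `HAVING max(value) > x` with a real column reference would start unresolved and get picked up normally, which is why the bug only bit attribute-free predicates.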

git commit: [SPARK-2568] RangePartitioner should run only one job if data is balanced

2014-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 84467468d -> 2e6efcace [SPARK-2568] RangePartitioner should run only one job if data is balanced As of Spark 1.0, RangePartitioner goes through the data twice: once to compute the count and once to do sampling. As a result, to do sortByKey,
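Merging the two passes is possible because a reservoir sample can be drawn while counting. An illustrative sketch of that single-pass idea, not Spark's implementation: one traversal of a partition yields both its size and a bounded sample for estimating range bounds.

```python
import random


def count_and_sample(iterator, k, seed=0):
    """One pass that returns (count, reservoir sample of up to k items).
    Standard reservoir sampling: item n replaces a reservoir slot with
    probability k/n, so each item is kept with equal probability."""
    rng = random.Random(seed)
    reservoir = []
    n = 0
    for item in iterator:
        n += 1
        if len(reservoir) < k:
            reservoir.append(item)
        else:
            j = rng.randrange(n)  # uniform in [0, n)
            if j < k:
                reservoir[j] = item
    return n, reservoir
```

With the count and sample obtained together, a second job is only needed when some partitions turn out to be badly imbalanced and must be resampled, which matches the "only one job if data is balanced" claim in the title.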