Repository: spark
Updated Branches:
refs/heads/branch-1.0 1a0a2f81a -> 2693035ba
[SPARK-2580] [PySpark] keep silent in worker if JVM close the socket
During rdd.take(n), the JVM will close the socket once it has received enough data, and the Python worker should keep silent in this case.
At the same time,
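The behavior described above can be illustrated with a small hypothetical sketch in plain Python (not Spark's actual worker code): the write loop swallows the broken-pipe error that results when the JVM side closes the connection early.

```python
import os

def send_results(fd, items):
    # Hypothetical worker write loop: if the JVM has already closed its end
    # (e.g. take(n) received enough rows), exit quietly instead of raising.
    try:
        for item in items:
            os.write(fd, item)
    except BrokenPipeError:
        # the JVM closed the socket on purpose; stay silent
        return False
    return True

# Simulate the JVM closing its end of the connection early.
r, w = os.pipe()
os.close(r)                              # reader gone, as after take(n)
ok = send_results(w, [b"row1", b"row2"])
os.close(w)
```

On POSIX systems Python ignores SIGPIPE, so the closed pipe surfaces as a catchable `BrokenPipeError` rather than killing the process, which is what makes the quiet exit possible.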
Repository: spark
Updated Branches:
refs/heads/branch-1.0 2693035ba -> e0bc72eb7
[SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle
Fix the problem with pickling `operator.itemgetter` constructed with multiple indices.
Author: Davies Liu davies@gmail.com
Closes #1627 from davies/itemgetter and
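The behavior at stake can be shown in plain Python. The `reduce_itemgetter` helper below is a hypothetical sketch in the spirit of the cloudpickle fix: it probes the getter with a recording dummy object so that every index it was built with is recovered, not just the first.

```python
from operator import itemgetter

# itemgetter built with multiple indices returns a tuple of selected fields
get = itemgetter(0, 2)
row = ("a", "b", "c", "d")
picked = get(row)

def reduce_itemgetter(obj):
    # Hypothetical reduce helper: call the getter on a probe that records
    # every index it is asked for, so the getter can be rebuilt faithfully.
    class Probe:
        def __init__(self):
            self.items = []
        def __getitem__(self, item):
            self.items.append(item)
            return item
    probe = Probe()
    obj(probe)
    return itemgetter, tuple(probe.items)

func, args = reduce_itemgetter(get)
rebuilt = func(*args)   # behaves like the original getter
```

A serializer that only kept the first index would rebuild `itemgetter(0)` and silently drop the other fields, which is the class of bug the fix addresses.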
Repository: spark
Updated Branches:
refs/heads/master 96ba04bbf -> 20424dad3
[SPARK-2174][MLLIB] treeReduce and treeAggregate
In `reduce` and `aggregate`, the driver node spends time linear in the number
of partitions. It becomes a bottleneck when there are many partitions and the
data from
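The idea can be sketched in plain Python (names are illustrative, not Spark's API): reduce each partition locally, then merge the partial results in rounds with bounded fan-in instead of letting one node merge everything at once.

```python
import math
import operator
from functools import reduce as seq_reduce

def tree_reduce(partitions, op, depth=2):
    # Illustrative sketch of multi-level reduction: partial results are
    # merged in rounds of `scale`-way fan-in, so the final single-node
    # merge only sees a handful of values regardless of partition count.
    results = [seq_reduce(op, part) for part in partitions]
    scale = max(2, int(math.ceil(len(results) ** (1.0 / depth))))
    while len(results) > scale:
        results = [seq_reduce(op, results[i:i + scale])
                   for i in range(0, len(results), scale)]
    return seq_reduce(op, results)

total = tree_reduce([[1, 2], [3, 4], [5, 6], [7, 8]], operator.add)
```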
Repository: spark
Updated Branches:
refs/heads/branch-1.0 e0bc72eb7 -> 3143e51d7
[SPARK-2730][SQL] When retrieving a value from a Map, GetItem evaluates key
twice
JIRA: https://issues.apache.org/jira/browse/SPARK-2730
Author: Yin Huai h...@cse.ohio-state.edu
Closes #1637 from
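Double evaluation matters when the key expression is expensive or side-effecting. A toy Python sketch (not Catalyst code) of the pre-fix versus post-fix shape of the lookup:

```python
calls = []

def key():
    # stand-in for an arbitrary key expression; imagine it is expensive
    calls.append(1)
    return "k"

m = {"k": 42}

def get_item_buggy(mapping, key_expr):
    # pre-fix shape: the key expression is evaluated twice,
    # once for the containment check and once for the lookup
    if key_expr() not in mapping:
        return None
    return mapping[key_expr()]

def get_item_fixed(mapping, key_expr):
    k = key_expr()          # evaluate the key exactly once
    return mapping.get(k)   # None when the key is absent

get_item_buggy(m, key)
n_buggy = len(calls)
calls.clear()
v = get_item_fixed(m, key)
n_fixed = len(calls)
```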
Repository: spark
Updated Branches:
refs/heads/master e3643485d -> f0d880e28
[SPARK-2674] [SQL] [PySpark] support datetime type for SchemaRDD
Python datetime and time values are converted into java.util.Calendar after
serialization, and then converted into java.sql.Timestamp during
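A hedged sketch of the kind of conversion involved, in plain Python (epoch milliseconds as an assumed wire format; this is not PySpark's actual serializer):

```python
from datetime import datetime, timezone

def datetime_to_millis(dt):
    # Serialize a Python datetime as epoch milliseconds, the sort of value
    # a java.sql.Timestamp could be rebuilt from on the JVM side.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for naive values
    return int(dt.timestamp() * 1000)

def millis_to_datetime(ms):
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)

dt = datetime(2014, 7, 29, 12, 30, 45, tzinfo=timezone.utc)
roundtrip = millis_to_datetime(datetime_to_millis(dt))
```

The round trip is lossy below millisecond precision, which is exactly the kind of edge a datetime-support patch has to pin down.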
Repository: spark
Updated Branches:
refs/heads/master 2c356665c -> 39b819310
[SPARK-2716][SQL] Don't check resolved for having filters.
For queries like `... HAVING COUNT(*) > 9` the expression is always resolved
since it contains no attributes. This was causing us to avoid doing the Having
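A toy expression tree (illustrative, not Catalyst's classes) shows why such a predicate counts as "resolved": resolution only checks for unresolved attribute references, and `COUNT(*) > 9` contains none.

```python
class Expr:
    def __init__(self, children=()):
        self.children = list(children)
    @property
    def resolved(self):
        # an expression is resolved when all of its children are
        return all(c.resolved for c in self.children)

class Literal(Expr):
    pass  # no children, trivially resolved

class UnresolvedAttribute(Expr):
    resolved = False  # a column reference not yet bound to a schema

class GreaterThan(Expr):
    def __init__(self, left, right):
        super().__init__([left, right])

count_star = Literal()  # COUNT(*) carries no attribute references
having = GreaterThan(count_star, Literal())               # COUNT(*) > 9
unresolved_pred = GreaterThan(UnresolvedAttribute(), Literal())  # col > 9
```

Because `having.resolved` is already true before analysis rewrites the aggregate, a guard keyed on "is this resolved yet?" never fires for it.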
Repository: spark
Updated Branches:
refs/heads/master 84467468d -> 2e6efcace
[SPARK-2568] RangePartitioner should run only one job if data is balanced
As of Spark 1.0, RangePartitioner goes through the data twice: once to compute
the count and once to do sampling. As a result, to do sortByKey,
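A single-pass alternative can be sketched with reservoir sampling in plain Python (illustrative, not Spark's implementation): sample while streaming through the data once, then cut the sorted sample into partition boundaries.

```python
import random

def range_bounds_single_pass(data, num_partitions, sample_size=20, seed=0):
    # One streaming pass: maintain a fixed-size reservoir sample, so no
    # separate counting pass over the data is needed.
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(data):
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < sample_size:
                reservoir[j] = x
    reservoir.sort()
    # cut the sorted sample into num_partitions equal-frequency ranges
    step = len(reservoir) / num_partitions
    return [reservoir[int(step * k)] for k in range(1, num_partitions)]

bounds = range_bounds_single_pass(range(1000), 4)
```

For skewed data a real implementation would oversample heavy partitions and weight the sample, but the one-pass structure is the point here.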