Hi Patrick: Do you know what the status of this issue is? Is there a JIRA
that is tracking this issue?
Thanks.
Asim
Patrick Wendell writes: Within a partition things will spill - so the
current documentation is correct. This spilling can only occur *across keys*
at the moment. Spilling cannot occur within a key at present.
Hi Patrick,
For the spilling-within-one-key work you mentioned might land in Spark 1.2,
is that being tracked in https://issues.apache.org/jira/browse/SPARK-1823 or
is there another ticket I should be following?
Thanks!
Andrew
On Tue, Aug 5, 2014 at 3:39 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey Andrew,
We might create a new JIRA for it, but it doesn't exist yet. We'll create
JIRAs for the major 1.2 issues at the beginning of September.
- Patrick
On Mon, Aug 25, 2014 at 8:53 AM, Andrew Ash and...@andrewash.com wrote:
Hi Patrick,
For the spilling-within-one-key work you mention ...
I'm doing a simple groupBy on a fairly small dataset (80 files in HDFS, a few
gigs in total, line-based, 500-2000 chars per line). I'm running Spark on 8
low-memory machines in a YARN cluster, i.e. something along the lines of:
spark-submit ... --master yarn-client --num-executors 8
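With low-memory executors it can help to size the executor memory explicitly rather than rely on the defaults. A sketch along the same lines as the command above (the flag values and the jar name here are illustrative, not a recommendation for your cluster):

```shell
# Illustrative sizing only -- tune for your machines.
# spark.shuffle.spill defaults to true in Spark 1.x; shown here for clarity.
spark-submit \
  --master yarn-client \
  --num-executors 8 \
  --executor-memory 2g \
  --executor-cores 2 \
  --conf spark.shuffle.spill=true \
  my-job.jar   # hypothetical application jar
```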
Patrick Wendell wrote:
In the latest version of Spark we've added documentation to make this
distinction more clear to users:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L390
That is a very good addition to the documentation. Nice.
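When the grouping is only done in order to aggregate, a reduce-per-key avoids the single-key memory issue entirely, since only one accumulator per key is ever held. A tiny pure-Python sketch of that idea (not Spark code; the function name is mine):

```python
import operator

def reduce_by_key(pairs, func):
    """Combine values per key as they arrive: memory holds one
    accumulator per key, never a key's full value list."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

print(reduce_by_key(
    [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)],
    operator.add))
# -> {'a': 10, 'b': 7, 'c': 4}
```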
Hi Jens,
Within a partition things will spill - so the current documentation is
correct. This spilling can only occur *across keys* at the moment. Spilling
cannot occur within a key at present.
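A minimal pure-Python sketch (not Spark's actual implementation) of the behavior described above: sorted runs can be "spilled" and merged back, so memory is bounded by the largest single key's values rather than the whole dataset - but each key's value list is still materialized in full, which is exactly the no-spilling-within-a-key limitation.

```python
import heapq
import itertools

def external_group_by(pairs, run_size=2):
    """Group values by key, spilling sorted runs when 'memory' fills.
    Plain lists stand in for on-disk spill files in this sketch."""
    runs, buf = [], []
    for pair in pairs:
        buf.append(pair)
        if len(buf) >= run_size:      # memory full: spill a sorted run
            runs.append(sorted(buf))
            buf = []
    if buf:
        runs.append(sorted(buf))
    # Merge the runs lazily; groupby yields one key at a time, but the
    # full value list for that key is built in memory -- there is no
    # spilling *within* a key.
    merged = heapq.merge(*runs)
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

print(dict(external_group_by(
    [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)])))
# -> {'a': [1, 3, 6], 'b': [2, 5], 'c': [4]}
```

A single very large key therefore still produces one very large list at merge time, which is why a heavily skewed key can exhaust memory even though the overall grouping spills fine.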
This is discussed in the video here: