and otherData use the
same partitioner, Spark knows it doesn't need to re-shuffle d3 each time. It
can use the existing shuffle output it already has sitting on disk. So
you'll see the stage marked as skipped in the UI (except for the first job).
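As a sketch of the pattern being described (the RDD names and partition count here are assumptions, not taken from the thread):

```scala
import org.apache.spark.HashPartitioner

// Partition d3 once and cache it; any later join that uses the same
// partitioner can reuse d3's existing shuffle output instead of
// re-shuffling it, and the UI shows that stage as "skipped".
val d3 = pairs.partitionBy(new HashPartitioner(100)).persist()

val j1 = d3.join(otherData) // first job: shuffle happens
val j2 = d3.join(otherData) // later jobs: d3's shuffle stage is skipped
```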
On Fri, May 22, 2015 at 11:59 AM, Shay Seng s
Hi.
I have a job that takes
~50min with Spark 0.9.3 and
~1.8hrs on Spark 1.3.1 on the same cluster.
The only code difference between the two code bases is to fix the Seq ->
Iter changes that happened in the Spark 1.x series.
Are there any other changes in the defaults from Spark 0.9.3 to 1.3.1?
Hi.
I have an RDD that I use repeatedly through many iterations of an
algorithm. To prevent recomputation, I persist the RDD (and incidentally I
also persist and checkpoint its parents):
val consCostConstraintMap = consCost.join(constraintMap).map {
  case (cid, (costs, (mid1, _, mid2, _, _))) => {
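For reference, the persist-and-checkpoint pattern described above might look like this (the storage level and call order are assumptions, not from the original mail):

```scala
import org.apache.spark.storage.StorageLevel

// Cache the RDDs reused across iterations so each pass reads cached
// blocks instead of recomputing the lineage, and checkpoint the parent
// to truncate that lineage (requires sc.setCheckpointDir(...) first).
consCost.persist(StorageLevel.MEMORY_AND_DISK)
constraintMap.persist(StorageLevel.MEMORY_AND_DISK)
constraintMap.checkpoint()
```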
Hi,
I've been trying to move up from spark 0.9.2 to 1.1.0.
I'm getting a little confused with the setup for a few different use cases,
grateful for any pointers...
(1) spark-shell with jars that are only required by the driver
(1a)
I added spark.driver.extraClassPath /mypath/to.jar to my
.
Unfortunately, you do have to specify each JAR separately; you can maybe
use a shell script to list a directory and get a big list, or set up a
project that builds all of the dependencies into one assembly JAR.
Matei
On Oct 30, 2014, at 5:24 PM, Shay Seng s...@urbanengines.com wrote:
Hi
Hi,
Why does RDD.saveAsObjectFile() to S3 leave a bunch of *_$folder$ empty
files around? Is it possible for the save to clean these up?
tks
Hey,
I actually have 2 questions
(1) I want to generate unique IDs for each RDD element and I want to
assign them in parallel so I do
rdd.mapPartitionsWithIndex((index, s) => {
  var count = 0L
  s.zipWithIndex.map {
    case (t, i) => {
      count += 1
      (index *
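The snippet is cut off by the archive, but one common scheme (a sketch, not necessarily what the original computed) is to combine the partition index with a per-partition counter so IDs are unique without any coordination; Spark also ships zipWithUniqueId, which does this for you:

```scala
// Hypothetical sketch: build a globally unique Long per element without
// a shuffle. High bits hold the partition index, low bits a per-partition
// counter (assumes fewer than 2^33 elements per partition).
val withIds = rdd.mapPartitionsWithIndex { (index, iter) =>
  iter.zipWithIndex.map { case (t, i) =>
    ((index.toLong << 33) | i.toLong, t)
  }
}

// Alternatively, RDD.zipWithUniqueId assigns k, k+n, k+2n, ... within
// partition k (n = number of partitions) and never triggers a Spark job.
val withIds2 = rdd.zipWithUniqueId().map(_.swap)
```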
Hey Sparkies...
I have an odd bug.
I am running Spark 0.9.2 on Amazon EC2 machines as a job (i.e. not in REPL)
After a bunch of processing, I tell Spark to save my RDD to S3 using:
rdd.saveAsSequenceFile(uri,codec)
That line of code hangs. By hang I mean
(a) Spark stages UI shows no update on
Hi,
I am running Spark 0.9.2 on an EC2 cluster with about 16 r3.4xlarge machines
The cluster is running Spark standalone and is launched with the ec2
scripts.
In my Spark job, I am using ephemeral HDFS to checkpoint some of my RDDs.
I'm also reading and writing to S3. My jobs also involve a large
it should be a pretty similar setup.
On Thu, Aug 21, 2014 at 1:23 PM, Shay Seng s...@urbanengines.com wrote:
Hi,
I am running Spark 0.9.2 on an EC2 cluster with about 16 r3.4xlarge
machines
The cluster is running Spark standalone and is launched with the ec2
scripts.
In my Spark job, I am