Cool. Using Ambari to monitor and scale up/down the cluster sounds
promising. Thanks for the pointer!
Mingyu
From: Deepak Sharma <deepakmc...@gmail.com>
Date: Monday, December 14, 2015 at 1:53 AM
To: cs user <acldstk...@gmail.com>
Cc: Mingyu Kim <m...@palantir.com>
review", and I didn't find much
else from my search.
This might be a general YARN question, but wanted to check if there's a
solution popular in the Spark community. Any sharing of experience around
autoscaling will be helpful!
Thanks,
Mingyu
/SPARK-3996. Would
this be reasonable?
Mingyu
On 10/7/15, 11:26 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>Seems like you might be running into
>https://issues.apache.org/jira/brows
Cool, we will start from there. Thanks Aaron and Josh!
Darin, it's likely because the DirectOutputCommitter is compiled with
Hadoop 1 classes and you're running it with Hadoop 2.
org.apache.hadoop.mapred.JobContext used to be a class in Hadoop 1, and it
became an interface in Hadoop 2.
Mingyu
I didn’t get any response. It’d be really appreciated if anyone using a special
OutputCommitter for S3 can comment on this!
Thanks,
Mingyu
From: Mingyu Kim <m...@palantir.com>
Date: Monday, February 16, 2015 at 1:15 AM
To: user@spark.apache.org
with Spark.
Thanks,
Mingyu
I found a workaround.
I can make my auxiliary data an RDD, partition it, and cache it.
Later, I can cogroup it with other RDDs, and Spark will try to keep the
cached RDD partitions where they are rather than shuffle them.
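A minimal pure-Python sketch of the cogroup semantics this workaround relies on (illustrative only — in Spark this is RDD.cogroup, and avoiding a shuffle of the cached side depends on both RDDs sharing the same partitioner; the keys and values below are hypothetical):

```python
from collections import defaultdict

def cogroup(a, b):
    """Group two key-value lists by key, like RDD.cogroup: each key maps
    to a pair (values-from-a, values-from-b). Illustrative stand-in only."""
    out = defaultdict(lambda: ([], []))
    for k, v in a:
        out[k][0].append(v)
    for k, v in b:
        out[k][1].append(v)
    return dict(out)

# aux: the large cached auxiliary data, keyed by partition key (hypothetical)
aux = [("p1", "big-model-1"), ("p2", "big-model-2")]
stream = [("p1", "event-a"), ("p1", "event-b"), ("p2", "event-c")]

grouped = cogroup(aux, stream)
print(grouped["p1"])  # -> (['big-model-1'], ['event-a', 'event-b'])
```

In real PySpark the equivalent would be something like `aux_rdd.partitionBy(n).cache()` followed by `aux_rdd.cogroup(stream_rdd)`; when both sides are co-partitioned, the cogroup becomes a narrow dependency and the cached partitions stay put.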
Also, Setting spark.locality.wait=100 did not work for me.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-spark-partition-sticky-i-e-stay-with-node-tp21322p21325.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
partition-specific auxiliary
data for processing the stream. I noticed that the partitions move among the
nodes. I cannot afford to move the large auxiliary data around.
Thanks,
Mingyu
That makes sense. Thanks everyone for the explanations!
Mingyu
From: Matei Zaharia <matei.zaha...@gmail.com>
Reply-To: user@spark.apache.org
Date: Tuesday, July 15, 2014 at 3:00 PM
To: user@spark.apache.org
Subject: Re: How does Spark speculation
actions are
not idempotent. For example, it may be counting a partition twice in case of
RDD.count or may be writing a partition to HDFS twice in case of
RDD.save*(). How does it prevent this kind of duplicated work?
Mingyu
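An illustrative sketch of the general mechanism (not Spark's actual code): with speculation, two attempts of the same task may both finish, but only the first result accepted by a commit-coordinator step is used, so the partition is not counted or written twice. All class and variable names below are hypothetical.

```python
# Sketch: a coordinator that accepts exactly one committed attempt per task,
# so a speculative duplicate cannot double-count or double-write a partition.
class CommitCoordinator:
    def __init__(self):
        self.committed = {}  # task_id -> attempt_id that won the commit

    def try_commit(self, task_id, attempt_id):
        if task_id in self.committed:
            return False  # another attempt already committed; discard this one
        self.committed[task_id] = attempt_id
        return True

coord = CommitCoordinator()
partition_counts = {}

# Two attempts of task 0 (original + speculative copy) both report count 5.
for attempt_id, count in [(0, 5), (1, 5)]:
    if coord.try_commit(task_id=0, attempt_id=attempt_id):
        partition_counts[0] = count  # only the first attempt's result is used

print(sum(partition_counts.values()))  # -> 5, not 10
```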
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Mingyu
Hi all,
Is there any plan for 1.0.1 release?
Mingyu
union two RDDs, for example, rdd1 = [“a, b,
c”], rdd2 = [“1, 2, 3”, “4, 5, 6”], then
rdd1.union(rdd2).saveAsTextFile(…) should’ve resulted in a file with three
lines “a, b, c”, “1, 2, 3”, and “4, 5, 6” because the partitions from the
two RDDs are concatenated.
Mingyu
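A pure-Python sketch of why that line order is expected: an RDD's data can be thought of as a list of partitions, and union simply concatenates the two partition lists in order (illustrative model only, not Spark internals):

```python
# Each RDD is modeled as a list of partitions; each partition is a list of lines.
rdd1_partitions = [["a, b, c"]]               # one partition
rdd2_partitions = [["1, 2, 3", "4, 5, 6"]]    # one partition

# union concatenates the partition lists, rdd1's partitions first.
union_partitions = rdd1_partitions + rdd2_partitions

# saveAsTextFile writes partitions in order, one element per line.
lines = [line for part in union_partitions for line in part]
print(lines)  # -> ['a, b, c', '1, 2, 3', '4, 5, 6']
```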
On 4/29/14, 10:55 PM
Okay, that makes sense. It’d be great if this can be better documented at
some point, because the only way to find out about the resulting RDD row
order is by looking at the code.
Thanks for the discussion!
Mingyu
On 4/29/14, 11:59 PM, Patrick Wendell pwend...@gmail.com wrote:
I don't think
(and, sort is really expensive.) On the other hand, if I can assume, say,
“filter” or “map” doesn’t shuffle the rows around, I can do the sort once
and assume that the order is retained throughout such operations saving a
lot of time from doing unnecessary sorts.
Mingyu
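The assumption above can be sketched in plain Python: map and filter are per-element (narrow) operations, so they keep whatever element order an earlier sort produced, and top-k is just take(k) on the sorted data. (In Spark this holds for narrow transformations; a repartition or shuffle gives no such guarantee. Illustrative simulation only.)

```python
data = [5, 3, 1, 4, 2]
sorted_data = sorted(data)                 # analog of rdd.sortBy(...)
mapped = [x * 10 for x in sorted_data]     # map: order preserved
filtered = [x for x in mapped if x != 30]  # filter: order preserved
top2 = sorted_data[:2]                     # analog of rdd.sort().take(2)
print(filtered, top2)  # -> [10, 20, 40, 50] [1, 2]
```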
From: Mark Hamstra m
() because map
preserves the partition order. RDD order is also what allows me to get the
top k out of RDD by doing RDD.sort().take().
Am I misunderstanding it? Or, is it just when RDD is written to disk that
the order is not well preserved? Thanks in advance!
Mingyu
On 1/22/14, 4:46 PM, Patrick