Hi Cheng,
I think your scenario is fine for Spark's shuffle mechanism and will not
cause shuffle file name conflicts.
From my understanding, the code snippet you mentioned is the same RDD graph,
just run twice; these two jobs will generate 3 stages, a map stage and
collect stage
Cool, great job☺.
Thanks
Jerry
From: Ryan Williams [mailto:ryan.blake.willi...@gmail.com]
Sent: Thursday, February 26, 2015 6:11 PM
To: user; dev@spark.apache.org
Subject: Monitoring Spark with Graphite and Grafana
If anyone is curious to try exporting Spark metrics to Graphite, I just
publishe
Hi Mark,
For input streams like the text input stream, only RDDs can be recovered from the
checkpoint, not the missed files; if a file is missing, an exception will actually
be raised. If you use HDFS, HDFS will guarantee no data loss since it keeps 3
replicas. Otherwise, user logic has to guarantee that no file is deleted
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Monday, February 2, 2015 4:49 PM
To: Shao, Saisai
Cc: dev@spark.apache.org; u...@spark.apache.org
Subject: Re: Questions about Spark standalone resource scheduler
Hey Jerry,
I think standalone mode will still add more fea
Hi all,
I have some questions about the future development of Spark's standalone
resource scheduler. We've heard that some users require multi-tenant support in
standalone mode, such as multi-user management, resource management and
isolation, and user whitelists. It seems the current Spa
to failure.
Thanks
Jerry
From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Tuesday, December 30, 2014 6:50 AM
To: Tathagata Das
Cc: Hari Shreedharan; Shao, Saisai; Sean McNamara; Patrick Wendell; Luis Ángel
Vicente Sánchez; Dibyendu Bhattacharya; dev@spark.apache.org; Koert Kuipers
Subject
Thanks Patrick for your detailed explanation.
BR
Jerry
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:43 PM
To: Cheng, Hao
Cc: Shao, Saisai; u...@spark.apache.org; dev@spark.apache.org
Subject: Re: Question on saveAsTextFile with
Hi,
We have a requirement to save RDD output to HDFS with a saveAsTextFile-like
API, but we need to overwrite the data if it already exists. I'm not sure whether
current Spark supports such an operation, or whether I need to handle this
manually. There's a thread on the mailing list that discussed this
(http://apach
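For context, the commonly discussed workaround is to delete the existing output
directory before calling saveAsTextFile. Below is a minimal local-filesystem
sketch of that delete-then-write pattern; the `overwrite_text_output` helper and
the `part-00000` layout are illustrative only, not Spark's API (on a real cluster
you would delete the path through the Hadoop FileSystem API instead):

```python
import shutil
import tempfile
from pathlib import Path

def overwrite_text_output(path, lines):
    """Delete an existing output directory, then write fresh output.
    Local-FS sketch of the workaround for saveAsTextFile refusing to
    overwrite an existing path."""
    out = Path(path)
    if out.exists():
        shutil.rmtree(out)  # remove stale output so the second run doesn't fail
    out.mkdir(parents=True)
    # Mimic a single output partition file, as saveAsTextFile would produce.
    (out / "part-00000").write_text("\n".join(lines) + "\n")

base = tempfile.mkdtemp()
target = f"{base}/result"
overwrite_text_output(target, ["a", "b"])
overwrite_text_output(target, ["c"])  # second run overwrites instead of failing
print((Path(target) / "part-00000").read_text())  # prints "c"
```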
Hi all,
I agree with Hari that strong exactly-once semantics are very hard to guarantee,
especially in failure situations. From my understanding, even the current
implementation of ReliableKafkaReceiver cannot fully guarantee exactly-once
semantics after a failure; the first issue is the ordering of data replay
Hi,
spark.local.dir is the directory used to write map output data and persisted RDD
blocks, but the file paths are hashed, so you cannot directly find the
persisted RDD block files; they will definitely be in these folders on your
worker node.
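To illustrate what "the path has been hashed" means, here is a rough Python
sketch of how a block file name can be mapped into spark.local.dir
subdirectories. This is modeled loosely on Spark's DiskBlockManager; the exact
hash function and subdirectory count here are assumptions for illustration, not
Spark's actual implementation:

```python
import zlib

# Illustrative default: Spark spreads block files across a fixed number of
# hashed subdirectories under each local dir.
SUB_DIRS_PER_LOCAL_DIR = 64

def local_path(local_dirs, block_file_name):
    """Pick a local dir and hashed subdirectory for a block file.
    crc32 is a stand-in for Spark's non-negative hash."""
    h = zlib.crc32(block_file_name.encode())
    dir_id = h % len(local_dirs)
    sub_dir_id = (h // len(local_dirs)) % SUB_DIRS_PER_LOCAL_DIR
    return f"{local_dirs[dir_id]}/{sub_dir_id:02x}/{block_file_name}"

print(local_path(["/tmp/spark-local-a", "/tmp/spark-local-b"], "rdd_3_7"))
```

So the block file keeps its logical name, but sits under a hashed subdirectory,
which is why you cannot guess the full path by hand.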
Thanks
Jerry
From: Priya Ch [mailto:learnin
Hi folks,
I hit several Spark SQL unit test failures when sort-based shuffle is enabled. It
seems Spark SQL uses GenericMutableRow, which makes all the entries in
ExternalSorter's internal buffer refer to the same object; I guess
GenericMutableRow uses only one mutable object to represent different rows, t
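The aliasing problem described above can be shown in a few lines. This is a
minimal Python analogy (the `MutableRow` class is hypothetical, not Spark's
GenericMutableRow): when a buffer stores references to one reused mutable row,
every buffered entry ends up equal to the last row written, and the fix is to
copy the row's values before buffering:

```python
class MutableRow:
    """Stand-in for a reused mutable row object."""
    def __init__(self):
        self.values = []
    def update(self, values):
        self.values = list(values)

row = MutableRow()
buggy_buffer = []
safe_buffer = []
for vals in ([1, "a"], [2, "b"], [3, "c"]):
    row.update(vals)
    buggy_buffer.append(row)              # all entries alias the same object
    safe_buffer.append(list(row.values))  # copy before buffering

print([r.values for r in buggy_buffer])  # [[3, 'c'], [3, 'c'], [3, 'c']]
print(safe_buffer)                       # [[1, 'a'], [2, 'b'], [3, 'c']]
```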
Hi,
I think this is an awesome feature for the Spark Streaming Kafka interface,
offering users control over partition offsets so they can build more kinds of
applications on top of it.
What concerns me is that if we want to do offset management,
fault-tolerance-related control, and so on, we have to t