I am building an analytics app with Spark. I plan to use long-lived
SparkContexts to avoid the overhead of creating a new context for every
query, which in turn reduces the analytics query response time.
The number of queries run in the system each day is relatively small.
Would long-lived contexts be a reasonable approach here?
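A minimal sketch of the long-lived-context idea, assuming the driver stays
up as a service and each incoming query is just a job on the shared context
(QueryServer and the count-based "query" are placeholders, not a real API):

import org.apache.spark.{SparkConf, SparkContext}

object QueryServer {
  def main(args: Array[String]): Unit = {
    // One context for the lifetime of the process; every query reuses it,
    // so the context-creation cost is paid once, not per query.
    val sc = new SparkContext(new SparkConf().setAppName("analytics-query-server"))
    try {
      // Stand-in "query": count the non-empty lines of each path given on
      // the command line.
      args.foreach { path =>
        val n = sc.textFile(path).filter(_.nonEmpty).count()
        println(s"$path -> $n non-empty lines")
      }
    } finally {
      sc.stop() // released only when the whole service shuts down
    }
  }
}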
I am running Spark jobs on a YARN cluster. It takes ~30 seconds to create a
Spark context there, while it takes only 1-2 seconds when running Spark in
local mode. The master is set to yarn-client, and both the machine that
submits the Spark job and the YARN cluster are in the same domain.
Originally I suspected ...

... created the Spark StreamingContext, and responds to shutdown requests.
Does Spark Streaming already provide similar capabilities?
Stanley
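For reference, a minimal sketch of one way to wire this up by hand,
assuming a marker file (/tmp/stop-streaming) stands in for the real
shutdown request and a local socket stands in for the real input source:

import java.nio.file.{Files, Paths}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopViaMarkerFile {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stoppable"), Seconds(10))
    ssc.socketTextStream("localhost", 9999).count().print() // assumed source
    ssc.start()

    // Control thread: poll for an external shutdown request, then stop the
    // context gracefully so in-flight batches are drained first.
    val watcher = new Thread(new Runnable {
      def run(): Unit = {
        while (!Files.exists(Paths.get("/tmp/stop-streaming"))) Thread.sleep(5000)
        ssc.stop(stopSparkContext = true, stopGracefully = true)
      }
    })
    watcher.setDaemon(true)
    watcher.start()

    ssc.awaitTermination()
  }
}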
... so that the call StreamingContext.stop(...) is made?
Thanks,
Stanley
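One way the stop(...) call can be made is from a plain JVM shutdown hook,
sketched here under the assumption that a SIGTERM to the driver is an
acceptable trigger (the socket source is again a stand-in):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopViaShutdownHook {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("hooked"), Seconds(10))
    ssc.socketTextStream("localhost", 9999).print() // assumed test source

    // A SIGTERM or normal JVM exit runs the hook, which initiates the
    // graceful StreamingContext.stop(...).
    sys.addShutdownHook {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}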
This seems like a bug, right? It's not the user's responsibility to manage
the workers.
On Wed, Aug 13, 2014 at 11:28 AM, S. Zhou wrote:
> Sometimes workers are dead but the Spark context does not know it and
> still sends jobs.
RDDs don't *need* replication, but it doesn't hurt if the underlying
storage has replication.
On Mon, Aug 4, 2014 at 5:51 PM, Deep Pradhan wrote:
> Hi,
> Spark can run on top of HDFS.
> While Spark talks about RDDs, which do not need replication because the
> partitions can be rebuilt with the help of lineage ...
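A small sketch of that point, assuming a hypothetical hdfs:///data/input.txt:
HDFS replicates the file's blocks, while the RDDs derived from it are not
replicated at all; a lost partition is rebuilt by replaying the lineage
that toDebugString prints:

import org.apache.spark.{SparkConf, SparkContext}

object LineageOverReplication {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage-demo"))

    // The input path is an assumption; HDFS replicates its blocks, but the
    // derived RDDs below exist nowhere except as lineage metadata.
    val words = sc.textFile("hdfs:///data/input.txt")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)

    // The lineage Spark would replay to rebuild any lost partition:
    println(words.toDebugString)
    println(words.count())

    sc.stop()
  }
}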