Does long-lived SparkContext hold on to executor resources?

2015-05-11 Thread stanley
I am building an analytics app with Spark. I plan to use long-lived SparkContexts to minimize the overhead of creating Spark contexts, which in turn reduces query response time. The number of queries run in the system each day is relatively small. Would long-lived contexts…
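A minimal sketch of the long-lived-context pattern described above (the object name, app name, and paths are illustrative, not from the thread). Note that by default a live SparkContext keeps its executors, and their cluster CPU and memory, allocated for the application's lifetime; releasing idle executors requires dynamic allocation (`spark.dynamicAllocation.enabled`, available on YARN):

```scala
// Hedged sketch: a singleton holder that reuses one SparkContext
// across queries, avoiding the per-query startup cost.
import org.apache.spark.{SparkConf, SparkContext}

object SharedSparkContext {
  // Lazily create the context once; every query reuses it.
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("long-lived-analytics")  // illustrative name
    new SparkContext(conf)
  }
}

// Each incoming query reuses the same context, e.g.:
// val count = SharedSparkContext.sc.textFile("hdfs:///data/events").count()
```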

It takes too long (30 seconds) to create Spark Context with SPARK/YARN

2015-05-11 Thread stanley
I am running Spark jobs on a YARN cluster. It takes ~30 seconds to create a SparkContext, while it takes only 1-2 seconds running Spark in local mode. The master is set to yarn-client, and both the machine that submits the Spark job and the YARN cluster are in the same domain. Originally I suspected…
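One commonly cited cause of slow YARN startup in Spark 1.x is that spark-submit re-uploads the Spark assembly jar to HDFS on every submission. A hedged sketch of the usual mitigation (paths and version numbers are illustrative):

```
# spark-defaults.conf (illustrative paths/version)
# Stage the Spark assembly on HDFS once, so each application start
# skips the upload and YARN localizes it from HDFS instead:
spark.yarn.jar  hdfs:///apps/spark/spark-assembly-1.3.1-hadoop2.6.0.jar
```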

Re: How to initiate a shutdown of Spark Streaming context?

2014-09-15 Thread stanley
…created the Spark StreamingContext, and responds to shutdown requests. Does Spark Streaming already provide similar capabilities? Stanley -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initiate-a-shutdown-of-Spark-Streaming-context-tp14092p14252.html

How to initiate a shutdown of Spark Streaming context?

2014-09-12 Thread stanley
…so that the call StreamingContext.stop(...) is made? Thanks, Stanley -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initiate-a-shutdown-of-Spark-Streaming-context-tp14092.html
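Spark Streaming at the time had no built-in external shutdown trigger; a common workaround was for the driver to poll a marker file and stop gracefully when it appears. A hedged sketch, assuming a Spark version that provides `StreamingContext.awaitTerminationOrTimeout` (the marker path and poll interval are illustrative):

```scala
// Hedged sketch: poll an external marker file and stop the
// StreamingContext gracefully when it appears.
import java.nio.file.{Files, Paths}
import org.apache.spark.streaming.StreamingContext

def awaitShutdown(ssc: StreamingContext, markerPath: String): Unit = {
  var stopped = false
  while (!stopped) {
    // Wait up to 10s for termination, then check for the marker file.
    stopped = ssc.awaitTerminationOrTimeout(10000)
    if (!stopped && Files.exists(Paths.get(markerPath))) {
      // Finish in-flight batches, then stop streaming and the context.
      ssc.stop(stopSparkContext = true, stopGracefully = true)
      stopped = true
    }
  }
}
```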

Re: how to access workers from spark context

2014-08-12 Thread Stanley Shi
This seems like a bug, right? It's not the user's responsibility to manage the workers. On Wed, Aug 13, 2014 at 11:28 AM, S. Zhou wrote: > Sometimes workers are dead but the spark context does not know it and still > sends jobs. > > > On Tuesday, August 12, 2014 7:14…
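In later Spark releases (1.4 and up) an application can at least observe executor loss through the listener API, rather than discovering it only through failed tasks. A hedged sketch (the class name is illustrative):

```scala
// Hedged sketch: a SparkListener that logs executor loss, one way an
// application can react to dead workers instead of relying solely on
// the scheduler's own detection.
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

class ExecutorLossLogger extends SparkListener {
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit = {
    // Log the lost executor and the reason reported by the cluster manager.
    println(s"Executor ${removed.executorId} removed: ${removed.reason}")
  }
}

// Registered once on the context, e.g.:
// sc.addSparkListener(new ExecutorLossLogger)
```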

Re: Spark on HDFS with replication

2014-08-04 Thread Stanley Shi
RDDs don't *need* replication, but it doesn't hurt if the underlying storage has replication. On Mon, Aug 4, 2014 at 5:51 PM, Deep Pradhan wrote: > Hi, > Spark can run on top of HDFS. > While Spark talks about RDDs, which do not need replication because the > partitions can be built with the h…
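The point above can be illustrated with a small sketch (the path is illustrative, and `sc` is assumed to be an existing SparkContext): HDFS replicates the on-disk blocks, while Spark's fault tolerance comes from recomputing lost partitions through the RDD's lineage, so the two mechanisms are orthogonal rather than redundant.

```scala
// Hedged sketch: replication lives in HDFS; recovery lives in lineage.
val lines = sc.textFile("hdfs:///data/input.txt") // HDFS replicates the blocks
val words = lines.flatMap(_.split("\\s+"))        // lineage records this step
words.persist()
// If an executor holding a cached partition dies, Spark re-runs the
// lineage (re-read the HDFS block, re-apply flatMap) rather than
// depending on a replicated copy of the RDD partition itself.
```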