Horia, thanks for the detailed explanation. The concept of a development
workflow in Spark is still a bit blurry to me.

My Spark application is a Scala class with a main function, nothing special
there: I edit the code in my IDE, compile it, run it, go back to the IDE to
make more changes, compile, run again, and so on. The UI on 4040 never gets a
chance to stay up. The spark-shell application you mentioned is a special case
of a long-running application. How do you develop while keeping the UI on 4040
up all the time?
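
For concreteness, my app is roughly this (the names and master URL are made
up, but the shape is exactly what I run):

    import org.apache.spark.SparkContext

    object MyApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "MyApp")
        val result = sc.parallelize(1 to 1000).map(_ * 2).count()
        println(result)
        sc.stop() // main returns right after this, and the UI on 4040 dies with it
      }
    }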


On Fri, Dec 27, 2013 at 7:50 PM, Horia <ho...@alum.berkeley.edu> wrote:

> Hi Aureliano,
>
> The Spark Application is defined by all things executed within a given
> Spark Context. This application's web server runs on port 4040 of the
> machine where the driver of the application is being executed. An example
> driver of a Spark Application is a single instance of the Spark Shell. This
> web UI, on port 4040, displays statistics about the Application such as the
> stages being executed, the number of tasks per stage and the progress of
> the tasks within a stage. Other Application statistics include the caching
> locations and percentages of RDDs being used within an Application (and
> across the stages of that Application) and the garbage collection times of
> the tasks that have been completed.
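>
> For example, everything below would run as a single Application, and while
> it runs you could watch its stages, tasks and cached RDDs on port 4040 (a
> hypothetical sketch, not code from your app):
>
>     val sc = new SparkContext("local[2]", "Demo") // one context = one Application
>     val nums = sc.parallelize(1 to 1000000)
>     val doubled = nums.map(_ * 2).cache()  // shows up under Storage on :4040
>     println(doubled.count())               // each action runs stages/tasks on :4040
>     println(doubled.filter(_ % 3 == 0).count())
>     sc.stop()                              // the Application ends; the 4040 UI goes away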
>
> The Spark Cluster is defined by all Applications executing on top of the
> resources provisioned to your particular deployment of Spark. These
> resources are managed by a Spark Master, which acts as the cluster manager
> (unless you're using YARN or Mesos, in which case they provide the cluster
> manager; the task scheduler itself lives in each application's driver). The
> UI on port 8080 is the UI of
> the Spark Master, and it is accessible on whichever node is currently
> executing the Spark Master. This UI displays cluster statistics such as the
> number of available worker nodes, the number of JVM executor processes per
> worker node, the number of running Applications utilizing this Cluster, et
> cetera.
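>
> The master URL is what ties an Application to this Cluster: when a driver
> creates its SparkContext against the standalone Master, that Application
> appears in the 8080 UI's list of running applications. Roughly (the
> hostname is a placeholder):
>
>     val sc = new SparkContext("spark://master-host:7077", "MyApp")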
>
> In short, shutting down a Spark Application will kill the UI on port 4040
> because your application is terminated and therefore there are no running
> statistics to collect about that application. However, the UI on port 8080
> stays up and keeps reporting cluster-wide statistics until you take down the
> cluster by killing the Spark Master.
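>
> (For what it's worth, one crude way to look around the 4040 UI after a
> batch job has done its work is to block the driver before stopping the
> context; a hypothetical sketch, where doWork stands in for your job:
>
>     doWork(sc)         // your actual job
>     Console.readLine() // block; the Application and its 4040 UI stay up
>     sc.stop()          // once the context stops, the 4040 UI disappears
>
> Not a real workflow, but handy for a quick look.)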
>
> Hope that long-winded explanation made sense!
> Happy Holidays!
>
>
>
> On Fri, Dec 27, 2013 at 9:23 AM, Aureliano Buendia 
> <buendia...@gmail.com> wrote:
>
>> Hi,
>>
>>
>> I'm a bit confused about web UI access for a standalone Spark app.
>>
>> - When running a Spark app, a web server is launched at localhost:4040.
>> When the standalone app finishes executing, the web server is shut down.
>> What's the use of this web server? There is no way of reviewing the data
>> once the standalone app exits.
>>
>> - Creating a SparkContext at spark://localhost:7077 creates another web UI.
>> Is this web UI supposed to be used with localhost:4040, or is it a
>> replacement?
>>
>> - After creating a context with spark://localhost:7077 and running
>> ./bin/start-all.sh, I get this warning:
>>
>> WARN ClusterScheduler: Initial job has not accepted any resources; check
>> your cluster UI to ensure that workers are registered and have sufficient
>> memory
>>
>
>
