Horia, thanks for the detailed explanation. The development workflow in Spark is still a blurry subject to me.
My Spark application is a Scala class with a main function, nothing special here: I edit the code in my IDE, compile it, run it, go back to the IDE to make more changes, compile, run, and so on. The UI on port 4040 has no chance of staying up all the time. The spark-shell application you mentioned is a special case of a long-running application. How do you develop while keeping the UI on 4040 up all the time?

On Fri, Dec 27, 2013 at 7:50 PM, Horia <ho...@alum.berkeley.edu> wrote:

> Hi Aureliano,
>
> The Spark Application is defined by all things executed within a given Spark Context. This application's web server runs on port 4040 of the machine where the driver of the application is being executed. An example driver of a Spark Application is a single instance of the Spark Shell. This web UI, on port 4040, displays statistics about the Application, such as the stages being executed, the number of tasks per stage, and the progress of the tasks within a stage. Other Application statistics include the caching locations and percentages of RDDs used within an Application (and across the stages of that Application) and the garbage collection times of completed tasks.
>
> The Spark Cluster is defined by all Applications executing on top of the resources provisioned to your particular deployment of Spark. These resources are managed by a Spark Master, which contains the task scheduler and the cluster manager (unless you're using YARN or Mesos, in which case they provide the cluster manager). The UI on port 8080 is the UI of the Spark Master, and it is accessible on whichever node is currently executing the Spark Master. This UI displays cluster statistics such as the number of available worker nodes, the number of JVM executor processes per worker node, the number of running Applications utilizing this Cluster, et cetera.
> In short, shutting down a Spark Application will kill the UI on port 4040, because the application is terminated and there are therefore no running statistics to collect about it. However, the UI on port 8080 stays up and reports cluster-wide statistics until you kill the cluster by killing the Spark Master.
>
> Hope that long-winded explanation made sense!
> Happy Holidays!
>
>
> On Fri, Dec 27, 2013 at 9:23 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm a bit confused about the web UI access of a standalone Spark app.
>>
>> - When running a Spark app, a web server is launched at localhost:4040. When the standalone app's execution finishes, the web server is shut down. What's the use of this web server? There is no way of reviewing the data after the standalone app exits.
>>
>> - Creating a SparkContext at spark://localhost:7077 creates another web UI. Is this web UI supposed to be used together with localhost:4040, or is it a replacement?
>>
>> - Creating a context with spark://localhost:7077, after running ./bin/start-all.sh, I get this warning:
>>
>> WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
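For the compile-run-compile workflow above, one common trick is to simply block the driver at the end of main: the application (and its UI on port 4040) stays alive until you let the driver exit. This is only a minimal sketch, not a prescribed Spark pattern; the app and object names are made up, and it assumes a Spark version with `SparkConf` and a Scala version with `scala.io.StdIn`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical dev-time driver: do the work, then hold the JVM open
// so the application UI on port 4040 can be inspected.
object DevApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dev-app").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val result = sc.parallelize(1 to 1000).map(_ * 2).sum()
    println(s"result = $result")

    // The UI at http://localhost:4040 stays up until Enter is pressed,
    // because the SparkContext is still alive.
    println("Inspect http://localhost:4040, then press Enter to exit")
    scala.io.StdIn.readLine()

    sc.stop()
  }
}
```

The spark-shell keeps its UI up for the same reason: its SparkContext lives until you quit the shell. Blocking at the end of main just reproduces that behavior in a standalone driver.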
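To make the two UIs in the thread concrete: an application submitted to the standalone master started by ./bin/start-all.sh shows up on the master's UI (port 8080) while also serving its own UI on port 4040. A hedged sketch, assuming a Spark version with `SparkConf`; the app name and the executor memory value are illustrative, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver that connects to the standalone master.
// While it runs: master UI on http://localhost:8080 lists this app,
// and the app's own UI is on http://localhost:4040.
object ClusterApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cluster-app")
      .setMaster("spark://localhost:7077")
      // Keep requested resources within what the workers on :8080 report;
      // asking for more than any worker offers is one common cause of the
      // "Initial job has not accepted any resources" warning.
      .set("spark.executor.memory", "512m")

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}
```

If the warning still appears, the cluster UI on port 8080 is the place to check that workers are actually registered with the master and have enough free memory and cores for the request.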