I have run into the next issue. I ran a very simple Python command that prints the current date and time, and got the following error: "org.apache.spark.SparkException: Yarn application has already ended!"
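This error usually means the YARN ApplicationMaster never launched, so the real cause tends to be in the YARN application logs rather than in Zeppelin's own output. A sketch of the diagnostic commands I would try on the EMR master node (the commands are echoed here rather than run, and the application ID is a made-up placeholder; on a real cluster you would run them directly and substitute the ID reported by the first command):

```shell
# Sketch: find the failed/killed application Zeppelin launched, then pull its logs.
# The ID below is a placeholder; use the real one from the application list.
APP_ID="application_0000000000000_0001"
echo "yarn application -list -appStates FAILED,KILLED"
echo "yarn logs -applicationId ${APP_ID}"
```

The reason the ApplicationMaster failed to start (missing memory, bad spark.yarn settings, etc.) usually appears near the top of those logs.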
Has anyone seen this error before? I have not done any additional Zeppelin configuration; am I missing something in the configs?

Francis

*Command*

%pyspark
import datetime
print "Start Time: " + str(datetime.datetime.now())

*Error*

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
	at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
	at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:353)
	at org.apache.zeppelin.spark.PySparkInterpreter.getJavaSparkContext(PySparkInterpreter.java:374)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:140)
	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

On Wed, Jul 29, 2015 at 11:54 AM, Francis Lau <francis....@smartsheet.com> wrote:
> Thanks Ranjit and Alexander,
>
> I added 8081 to my tunnel script and now it is connected. I will try to
> execute pyspark commands next.
>
> Just to offer a little value back to future newbies like me, here is the
> bash script that tunnels all the UI ports for EMR, Spark, iPython
> Notebook, and Zeppelin. I assume that this email thread will get archived
> in a Google-searchable location. These ports work for EMR release 4.0
> with Spark and others installed; Zeppelin and iPython Notebook were
> installed via custom bootstrap scripts.
>
> Francis
>
> # -------------------------------
> # TunnelSpark.sh
> # -------------------------------
>
> # This script is called with a single argument: the IP address of the
> # EMR master node (which also runs the Spark driver, Zeppelin, iPython
> # Notebook, and Hue)
>
> # For the list of AWS EMR ports, see:
> # https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html#d0e708
>
> echo 'Tunneling to iPython Notebook (port 8192)...'
> echo
> echo 'Tunneling to Spark UI (port 18080)...'
> echo 'Tunneling to Spark UI (ports 4040, 4041, 4042)...'
> echo
> echo 'Tunneling to Hadoop Resource Manager (port 8088)...'
> echo 'Tunneling to Hadoop Node Manager (port 8042)...'
> echo
> echo 'Tunneling to Hue (port 8888)...'
> echo
> echo 'Tunneling to Zeppelin (ports 8080, 8081)...'
>
> ssh -o ServerAliveInterval=10 -i ~/.ssh/POCMasterKey.pem -N \
>   -L 8192:ec2-$1.compute-1.amazonaws.com:8192 \
>   -L 18080:ec2-$1.compute-1.amazonaws.com:18080 \
>   -L 4040:ec2-$1.compute-1.amazonaws.com:4040 \
>   -L 4041:ec2-$1.compute-1.amazonaws.com:4041 \
>   -L 4042:ec2-$1.compute-1.amazonaws.com:4042 \
>   -L 8088:ec2-$1.compute-1.amazonaws.com:8088 \
>   -L 8042:ec2-$1.compute-1.amazonaws.com:8042 \
>   -L 8888:ec2-$1.compute-1.amazonaws.com:8888 \
>   -L 8080:ec2-$1.compute-1.amazonaws.com:8080 \
>   -L 8081:ec2-$1.compute-1.amazonaws.com:8081 \
>   hadoop@ec2-$1.compute-1.amazonaws.com
>
>
> On Tue, Jul 28, 2015 at 9:34 PM, Ranjit Manuel <ranjit.f.man...@gmail.com> wrote:
>> A couple of things to check:
>>
>> 1. The websocket port is available.
>> 2. Check the logs for any errors.
>> 3. The web browser you are using; this happened to me as well, and I
>> found that it works only with Mozilla Firefox.
>> On Jul 29, 2015 4:31 AM, "Francis Lau" <francis....@smartsheet.com> wrote:
>>
>>> Does anyone have Zeppelin working against AWS EMR 4.0 with Spark?
>>>
>>> The 4.0 release of EMR came out just last week:
>>> http://aws.amazon.com/about-aws/whats-new/2015/07/amazon-emr-release-4-0-0-with-new-versions-of-apache-hadoop-hive-and-spark-now-available/
>>>
>>> I found this bootstrap script and got a new cluster up and running
>>> without errors:
>>> https://gist.github.com/andershammar/224e1077021d0ea376dd#comments
>>>
>>> But the Zeppelin UI shows the "disconnected" red label, and I also
>>> cannot create a new notebook.
>>>
>>> I am very new to Zeppelin, so it may be a rookie issue :) i.e. configs
>>> or connections.
>>>
>>> Help?
>>>
>>> --
>>> *Francis*
>>
>
>
> --
> *Francis Lau* | *Smartsheet*
> Senior Director of Product Intelligence
> *c* 425-830-3889 (call/text)
> francis....@smartsheet.com <jason.terav...@smartsheet.com>

--
*Francis Lau* | *Smartsheet*
Senior Director of Product Intelligence
*c* 425-830-3889 (call/text)
francis....@smartsheet.com <jason.terav...@smartsheet.com>
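A note for future readers of the archive: the TunnelSpark.sh script above takes a single argument, the master node's public IP with the dots replaced by dashes, and splices it into the EC2 public DNS name via `$1`. A minimal sketch of that expansion (the IP address here is a made-up example, not a real host):

```shell
# Invocation would look like:  ./TunnelSpark.sh 54-210-12-34
# Inside the script, $1 expands into the EC2 public DNS name like so:
IP_DASHED="54-210-12-34"   # hypothetical master-node public IP, dots -> dashes
HOST="ec2-${IP_DASHED}.compute-1.amazonaws.com"
echo "$HOST"   # prints ec2-54-210-12-34.compute-1.amazonaws.com
```

With the tunnels up, each service is then reachable on localhost at the forwarded port (e.g. Zeppelin at http://localhost:8080).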