Hi,

You should check your firewalls: in yarn-client mode the Spark driver runs on your client machine, and the YARN ApplicationMaster and the executors have to connect back to it. That is the "Failed to connect to driver!" error in your trace.
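If you cannot open the firewall wide between the client box and the YARN nodes, one common workaround is to pin the driver's address and port so that only known ports have to be reachable. A minimal sketch, assuming your Zeppelin build passes spark.* interpreter properties through to the SparkConf; the hostname and port 40000 below are placeholders, not values from this thread:

  # Spark interpreter properties (Interpreter menu in the Zeppelin UI)
  master             yarn-client
  spark.driver.host  <hostname of the Zeppelin/client machine, resolvable from the YARN nodes>
  spark.driver.port  40000

Then allow inbound TCP on that port on the client machine, for example:

  iptables -A INPUT -p tcp --dport 40000 -j ACCEPT

Depending on your Spark version you may need to fix and open a few more ports the same way (e.g. spark.blockManager.port).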
Regards
JL

On Tue, Aug 4, 2015 at 6:49 PM, manya cancerian <manyacancer...@gmail.com> wrote:

> hi Guys,
>
> I am trying to run Zeppelin using Yarn as resource manager. I have made
> the following changes:
> 1. I have specified master as 'yarn-client' in the interpreter settings
> using the UI
> 2. I have specified HADOOP_CONF_DIR as the conf directory containing the
> hadoop configuration files
>
> In my scenario I have three machines:
> a. Client machine where zeppelin is installed
> b. Machine where the YARN cluster manager along with nodemanager, namenode,
> datanode and secondary namenode are running
> c. Machine where only a nodemanager and datanode are running
>
> When I submit a job from my client machine, it gets submitted to yarn but
> fails with the following exception -
>
> 15/08/04 15:08:05 ERROR yarn.ApplicationMaster: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!
> at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:424)
> at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:284)
> at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:146)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:575)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
> at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:573)
> at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
> at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 15/08/04 15:08:05 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 10, (reason: Uncaught exception: Failed to connect to driver!)
>
> Any help is much appreciated!
>
> Regards
> Monica
>
> On Tue, Aug 4, 2015 at 10:57 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> That worked. Why?
>> Can you share a comprehensive list of examples.
>>
>> On Mon, Aug 3, 2015 at 4:59 PM, Alex <abezzu...@nflabs.com> wrote:
>>
>>> Hi,
>>>
>>> inside %spark you do not need to create a SQLContext manually:
>>> as with "sc" for the SparkContext, the interpreter has already injected a
>>> "sqlc" val.
>>>
>>> Also, AFAIK the println statement should be in a separate paragraph.
>>>
>>> Can you try using that and see if it helps?
>>>
>>> --
>>> Kind regards,
>>> Alexander
>>>
>>> On 04 Aug 2015, at 05:58, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>> I am unable to see the visualization with Zeppelin from the blog:
>>> http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
>>>
>>> Notebook:
>>> %spark
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> import sqlContext.implicits._
>>> import java.sql.Date
>>> import org.apache.spark.sql.Row
>>>
>>> case class Log(level: String, date: Date, fileName: String)
>>>
>>> import java.text.SimpleDateFormat
>>>
>>> val df = new SimpleDateFormat("yyyy-mm-dd HH:mm:ss,SSS")
>>>
>>> val ambari = ambariLogs.map { line =>
>>>   val s = line.split(" ")
>>>   val logLevel = s(0)
>>>   val dateTime = df.parse(s(1) + " " + s(2))
>>>   val fileName = s(3).split(":")(0)
>>>   Log(logLevel, new Date(dateTime.getTime()), fileName)
>>> }.toDF()
>>> ambari.registerTempTable("ambari")
>>>
>>> //ambari.groupBy("level").count()
>>> sqlContext.sql("SELECT COUNT(*) from ambari")
>>>
>>> Output:
>>>
>>> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5ca68ee6
>>> import sqlContext.implicits._
>>> import java.sql.Date
>>> import org.apache.spark.sql.Row
>>> defined class Log
>>> import java.text.SimpleDateFormat
>>> df: java.text.SimpleDateFormat = java.text.SimpleDateFormat@98f267e7
>>> ambari: org.apache.spark.sql.DataFrame = [level: string, date: date, fileName: string]
>>> res74: org.apache.spark.sql.DataFrame = [c0: bigint]
>>>
>>> Hence the table ambari is created successfully.
>>>
>>> In a new note, i wrote this:
>>>
>>> %spark
>>> import org.apache.spark.sql.Row
>>>
>>> val result = sqlContext.sql("SELECT level, COUNT(1) from ambari group by level").map {
>>>   case Row(level: String, count: Long) => {
>>>     level + "\t" + count
>>>   }
>>> }.collect()
>>>
>>> println("%table Log Level\tCount\n" + result.mkString("\n"))
>>>
>>> Output:
>>> import org.apache.spark.sql.Row
>>> result: Array[String] = Array(INFO 2444, WARNING 3)
>>> %table Log Level Count INFO 2444 WARNING 3
>>>
>>> I did not get the graph rendering even though I am outputting %table from println.
>>>
>>> Any suggestions?
>>>
>>> On Mon, Aug 3, 2015 at 1:47 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> Fixed it
>>>>
>>>> mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
>>>>
>>>> Earlier i had
>>>>
>>>> mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.7 -Pyarn
>>>>
>>>> On Mon, Aug 3, 2015 at 1:31 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>
>>>>> I have a hadoop cluster up using Ambari. It also allowed me to install
>>>>> Spark 1.3.1 and i can run sample spark and Yarn applications, so the
>>>>> cluster is up and running fine.
>>>>>
>>>>> I got Zeppelin set up on a new box and was able to launch the UI.
>>>>>
>>>>> I modified spark interpreter to set
>>>>>
>>>>> master                          yarn-client
>>>>> spark.app.name                  Zeppelin
>>>>> spark.cores.max
>>>>> spark.driver.extraJavaOptions   -Dhdp.version=2.3.1.0-2574
>>>>> spark.executor.memory           512m
>>>>> spark.home                      /usr/hdp/2.3.1.0-2574/spark
>>>>> spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.1.0-2574
>>>>> spark.yarn.jar                  /home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar
>>>>> zeppelin.dep.localrepo          local-repo
>>>>>
>>>>> When i run a spark notebook
>>>>> %spark
>>>>> val ambariLogs = sc.textFile("file:///var/log/ambari-agent/ambari-agent.log")
>>>>> ambariLogs.take(10).mkString("\n")
>>>>>
>>>>> (The location exists)
>>>>>
>>>>> I see two exceptions in Zeppelin spark interpreter logs
>>>>>
>>>>> ERROR [2015-08-03 13:30:50,262] ({pool-1-thread-2}
>>>>> ProcessFunction.java[process]:41) - Internal error processing getProgress
>>>>>
>>>>> java.lang.NoClassDefFoundError: Could not initialize class
>>>>> org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$
>>>>> at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:38)
>>>>> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:55)
>>>>> at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
>>>>> at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
>>>>> at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
>>>>> at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
>>>>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
>>>>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:298)
>>>>> at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1068)
>>>>> at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1053)
>>>>> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>>>> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>>>> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> AND
>>>>>
>>>>> WARN [2015-08-03 13:30:50,085] ({pool-1-thread-2}
>>>>> Logging.scala[logWarning]:71) - Service 'SparkUI' could not bind on port
>>>>> 4041. Attempting port 4042.
>>>>>
>>>>> INFO [2015-08-03 13:30:50,112] ({pool-1-thread-2}
>>>>> Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
>>>>>
>>>>> WARN [2015-08-03 13:30:50,123] ({pool-1-thread-2}
>>>>> AbstractLifeCycle.java[setFailed]:204) - FAILED
>>>>> SelectChannelConnector@0.0.0.0:4042: java.net.BindException: Address
>>>>> already in use
>>>>>
>>>>> java.net.BindException: Address already in use
>>>>> at sun.nio.ch.Net.bind0(Native Method)
>>>>> at sun.nio.ch.Net.bind(Net.java:444)
>>>>> at sun.nio.ch.Net.bind(Net.java:436)
>>>>> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> On Mon, Aug 3, 2015 at 11:00 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> Thanks a lot for all these documents. Appreciate your effort & time.
>>>>>>
>>>>>> On Mon, Aug 3, 2015 at 10:15 AM, Christian Tzolov <ctzo...@pivotal.io> wrote:
>>>>>>
>>>>>>> ÐΞ€ρ@Ҝ (๏̯͡๏),
>>>>>>>
>>>>>>> I've successfully run Zeppelin with Spark on YARN. I'm using Ambari
>>>>>>> and PivotalHD30. PHD30 is ODP compliant so you should be able to repeat the
>>>>>>> configuration for HDP (e.g. hortonworks).
>>>>>>>
>>>>>>> 1. Before you start with Zeppelin, make sure that your Spark/YARN
>>>>>>> env works from the command line (e.g. run the Pi test). If it doesn't work,
>>>>>>> make sure that hdp.version is set correctly, or you can hardcode the
>>>>>>> stack.name and stack.version properties as Ambari custom yarn-site
>>>>>>> properties (that is what i did).
>>>>>>>
>>>>>>> 2. Your Zeppelin should be built with the proper Spark and Hadoop
>>>>>>> versions and YARN support enabled. In my case I used this build command:
>>>>>>>
>>>>>>> mvn clean package -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr
>>>>>>>
>>>>>>> 3. Open the Spark interpreter configuration and set the 'master'
>>>>>>> property to 'yarn-client' (e.g. master=yarn-client), then press Save.
>>>>>>>
>>>>>>> 4. In conf/zeppelin-env.sh set HADOOP_CONF_DIR; for PHD and HDP it
>>>>>>> will look like this:
>>>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>>>
>>>>>>> 5. (optional) i've restarted the zeppelin daemon, but i don't think
>>>>>>> this is required.
>>>>>>>
>>>>>>> 6. Make sure that the /user/<zeppelin user> folder exists in HDFS and
>>>>>>> that the zeppelin user has write permissions on it. Otherwise you can
>>>>>>> create it like this:
>>>>>>> sudo -u hdfs hdfs dfs -mkdir /user/<zeppelin user>
>>>>>>> sudo -u hdfs hdfs dfs -chown -R <zeppelin user>:hdfs /user/<zeppelin user>
>>>>>>>
>>>>>>> Good to go!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Christian
>>>>>>>
>>>>>>> On 3 August 2015 at 17:50, Vadla, Karthik <karthik.va...@intel.com> wrote:
>>>>>>>
>>>>>>>> Hi Deepak,
>>>>>>>>
>>>>>>>> I have documented everything here.
>>>>>>>>
>>>>>>>> Please check the published document:
>>>>>>>> https://software.intel.com/sites/default/files/managed/bb/bf/Apache-Zeppelin.pdf
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Karthik Vadla
>>>>>>>>
>>>>>>>> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>>>>>>>> *Sent:* Sunday, August 2, 2015 9:25 PM
>>>>>>>> *To:* users@zeppelin.incubator.apache.org
>>>>>>>> *Subject:* Yarn + Spark + Zepplin ?
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would like to try out Zepplin and hence i got a 7 node Hadoop
>>>>>>>> cluster with spark history server setup.
>>>>>>>> I am able to run sample spark
>>>>>>>> applications on my YARN cluster.
>>>>>>>>
>>>>>>>> I have no clue how to get zepplin to connect to this YARN cluster.
>>>>>>>> Under
>>>>>>>> https://zeppelin.incubator.apache.org/docs/install/install.html i
>>>>>>>> see MASTER to point to spark master. I do not have a spark master running.
>>>>>>>>
>>>>>>>> How do i get Zepplin to be able to read data from YARN cluster ?
>>>>>>>> Please share documentation.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Deepak
>>>>>>>
>>>>>>> --
>>>>>>> Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution
>>>>>>> Architect, EMEA Practice Team | Pivotal <http://pivotal.io/>
>>>>>>> ctzo...@pivotal.io|+31610285517
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>>
>>>>> --
>>>>> Deepak
>>>>
>>>> --
>>>> Deepak
>>>
>>> --
>>> Deepak
>>
>> --
>> Deepak
>

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net