JL, I tried after disabling the firewalls as well, but no luck :(

Regards,
Manya
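One way to narrow down a "Failed to connect to driver" error like the one below is to pin the driver endpoint so that a single firewall rule can be opened and tested. The following is only a sketch: the hostname and port are hypothetical placeholders, and the same spark.driver.* properties could equally be set in Zeppelin's Spark interpreter settings rather than in code.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a standalone yarn-client app with a pinned driver endpoint.
// "client-machine.example.com" and 51000 are placeholders; the client
// machine must be resolvable from, and the port reachable by, the YARN
// node managers.
object DriverConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("yarn-client")
      .setAppName("driver-connectivity-check")
      .set("spark.driver.host", "client-machine.example.com")
      .set("spark.driver.port", "51000")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count()) // prints 100 when executors can reach the driver
    sc.stop()
  }
}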
On Wed, Aug 5, 2015 at 12:07 PM, Jongyoul Lee <jongy...@gmail.com> wrote:

Hi,

You should check your firewalls, because in yarn-client mode the Spark executors try to connect back to the Spark driver, which runs on your client machine.

Regards,
JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

On Tue, Aug 4, 2015 at 6:49 PM, manya cancerian <manyacancer...@gmail.com> wrote:

Hi guys,

I am trying to run Zeppelin using YARN as the resource manager. I have made the following changes:

1. I have specified master as 'yarn-client' in the interpreter settings using the UI.
2. I have specified HADOOP_CONF_DIR as the conf directory containing the Hadoop configuration files.

In my scenario I have three machines:

a. Client machine where Zeppelin is installed
b. Machine where the YARN cluster manager along with a NodeManager, NameNode, DataNode, and secondary NameNode are running
c. Machine where only a NodeManager and DataNode are running

When I submit a job from my client machine, it gets submitted to YARN but fails with the following exception:

15/08/04 15:08:05 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
        at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:424)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:284)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:146)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:575)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:573)
        at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
        at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
15/08/04 15:08:05 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: Failed to connect to driver!)

Any help is much appreciated!

Regards,
Monica

On Tue, Aug 4, 2015 at 10:57 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

That worked. Why?
Can you share a comprehensive list of examples?

On Mon, Aug 3, 2015 at 4:59 PM, Alex <abezzu...@nflabs.com> wrote:

Hi,

Inside %spark you do not need to create an SQLContext manually: as with "sc" for the SparkContext, the interpreter has already injected an "sqlc" val.

Also, AFAIK the println statement should be in a separate paragraph.

Can you try using that and see if it helps?

--
Kind regards,
Alexander
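To illustrate Alex's two suggestions, a minimal sketch (assuming the "ambari" temp table registered later in the thread) splits the work across two paragraphs, with the %table println in its own paragraph so that the directive leads the paragraph's output:

%spark
// Paragraph 1: use the interpreter-injected sqlc instead of building a new SQLContext.
import org.apache.spark.sql.Row

val result = sqlc.sql("SELECT level, COUNT(1) FROM ambari GROUP BY level")
  .map { case Row(level: String, count: Long) => level + "\t" + count }
  .collect()

%spark
// Paragraph 2: output starting with %table is rendered as a table/chart,
// so nothing else should be printed before it in this paragraph.
println("%table Log Level\tCount\n" + result.mkString("\n"))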
On 04 Aug 2015, at 05:58, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

I am unable to see the visualization with Zeppelin from this blog post:
http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/

Notebook:

%spark
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import java.sql.Date
import org.apache.spark.sql.Row
import java.text.SimpleDateFormat

case class Log(level: String, date: Date, fileName: String)

// Note: the date part of the pattern needs MM (month), not mm (minutes).
val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")

val ambari = ambariLogs.map { line =>
  val s = line.split(" ")
  val logLevel = s(0)
  val dateTime = df.parse(s(1) + " " + s(2))
  val fileName = s(3).split(":")(0)
  Log(logLevel, new Date(dateTime.getTime()), fileName)
}.toDF()
ambari.registerTempTable("ambari")

//ambari.groupBy("level").count()
sqlContext.sql("SELECT COUNT(*) FROM ambari")

Output:

sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5ca68ee6
import sqlContext.implicits._
import java.sql.Date
import org.apache.spark.sql.Row
defined class Log
import java.text.SimpleDateFormat
df: java.text.SimpleDateFormat = java.text.SimpleDateFormat@98f267e7
ambari: org.apache.spark.sql.DataFrame = [level: string, date: date, fileName: string]
res74: org.apache.spark.sql.DataFrame = [c0: bigint]

Hence the table "ambari" is created successfully.

In a new note, I wrote this:

%spark
import org.apache.spark.sql.Row

val result = sqlContext.sql("SELECT level, COUNT(1) FROM ambari GROUP BY level").map {
  case Row(level: String, count: Long) => level + "\t" + count
}.collect()

println("%table Log Level\tCount\n" + result.mkString("\n"))

Output:

import org.apache.spark.sql.Row
result: Array[String] = Array(INFO 2444, WARNING 3)
%table Log Level	Count
INFO	2444
WARNING	3

I did not get graph rendering despite outputting %table from println.

Any suggestions?

On Mon, Aug 3, 2015 at 1:47 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

Fixed it:

mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests

Earlier I had:

mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.7 -Pyarn
On Mon, Aug 3, 2015 at 1:31 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

I have a Hadoop cluster up using Ambari. It also allowed me to install Spark 1.3.1, and I can run sample Spark and YARN applications, so the cluster is up and running fine.

I got Zeppelin set up on a new box and was able to launch the UI.

I modified the Spark interpreter to set:

master                          yarn-client
spark.app.name                  Zeppelin
spark.cores.max
spark.driver.extraJavaOptions   -Dhdp.version=2.3.1.0-2574
spark.executor.memory           512m
spark.home                      /usr/hdp/2.3.1.0-2574/spark
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.1.0-2574
spark.yarn.jar                  /home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar
zeppelin.dep.localrepo          local-repo

When I run a Spark notebook:

%spark
val ambariLogs = sc.textFile("file:///var/log/ambari-agent/ambari-agent.log")
ambariLogs.take(10).mkString("\n")

(The location exists.)

I see two exceptions in the Zeppelin Spark interpreter logs:

ERROR [2015-08-03 13:30:50,262] ({pool-1-thread-2} ProcessFunction.java[process]:41) - Internal error processing getProgress
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$
        at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:38)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:55)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
        at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:298)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1068)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1053)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

and:

WARN [2015-08-03 13:30:50,085] ({pool-1-thread-2} Logging.scala[logWarning]:71) - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
INFO [2015-08-03 13:30:50,112] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
WARN [2015-08-03 13:30:50,123] ({pool-1-thread-2} AbstractLifeCycle.java[setFailed]:204) - FAILED SelectChannelConnector@0.0.0.0:4042: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)

Any suggestions?
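A NoClassDefFoundError like the one above often points to a mismatch between the Hadoop version Zeppelin was built against and the cluster's version, which is consistent with the "Fixed it" build command earlier in the thread. As a diagnostic sketch (not from the thread itself), a notebook paragraph can print the versions the interpreter classpath actually loaded:

%spark
// Diagnostic sketch: print the Spark and Hadoop client versions on the
// interpreter's classpath. If the Hadoop version does not match the
// cluster, rebuild Zeppelin with the matching -Phadoop-* profile and
// -Dhadoop.version value.
println("Spark:  " + sc.version)
println("Hadoop: " + org.apache.hadoop.util.VersionInfo.getVersion())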
On Mon, Aug 3, 2015 at 11:00 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

Thanks a lot for all these documents. I appreciate your effort and time.

On Mon, Aug 3, 2015 at 10:15 AM, Christian Tzolov <ctzo...@pivotal.io> wrote:

ÐΞ€ρ@Ҝ (๏̯͡๏),

I've successfully run Zeppelin with Spark on YARN. I'm using Ambari and PivotalHD30. PHD30 is ODP compliant, so you should be able to repeat the configuration for HDP (e.g. Hortonworks).

1. Before you start with Zeppelin, make sure that your Spark/YARN environment works from the command line (e.g. run the Pi test). If it doesn't work, make sure that hdp.version is set correctly, or hardcode the stack.name and stack.version properties as Ambari custom yarn-site properties (that is what I did).

2. Zeppelin should be built with the proper Spark and Hadoop versions and with YARN support enabled. In my case I used this build command:

mvn clean package -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr

3. Open the Spark interpreter configuration and set the 'master' property to 'yarn-client' (i.e. master=yarn-client), then press Save.

4. In conf/zeppelin-env.sh set HADOOP_CONF_DIR; for PHD and HDP it will look like this:

export HADOOP_CONF_DIR=/etc/hadoop/conf

5. (Optional) I restarted the Zeppelin daemon, but I don't think this is required.

6. Make sure the /user/<zeppelin user> folder exists in HDFS and that the user has write permissions on it. Otherwise you can create it like this:

sudo -u hdfs hdfs dfs -mkdir /user/<zeppelin user>
sudo -u hdfs hdfs dfs -chown -R <zeppelin user>:hdfs /user/<zeppelin user>

Good to go!

Cheers,
Christian

--
Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution Architect, EMEA Practice Team | Pivotal <http://pivotal.io/>
ctzo...@pivotal.io | +31610285517
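Once the steps above are in place, a minimal smoke test for the finished setup might look like the sketch below. It assumes only the interpreter-injected sc and that YARN can launch at least one executor:

%spark
// Smoke-test sketch: force a small distributed job through YARN.
// If executors launch and can reach the driver, this prints 500500.
val total = sc.parallelize(1 to 1000).reduce(_ + _)
println("sum = " + total)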
On 3 August 2015 at 17:50, Vadla, Karthik <karthik.va...@intel.com> wrote:

Hi Deepak,

I have documented everything here. Please check the published document:

https://software.intel.com/sites/default/files/managed/bb/bf/Apache-Zeppelin.pdf

Thanks,
Karthik Vadla

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
Sent: Sunday, August 2, 2015 9:25 PM
To: users@zeppelin.incubator.apache.org
Subject: Yarn + Spark + Zeppelin?

Hello,

I would like to try out Zeppelin, and hence I got a 7-node Hadoop cluster with a Spark history server set up. I am able to run sample Spark applications on my YARN cluster.

I have no clue how to get Zeppelin to connect to this YARN cluster. Under https://zeppelin.incubator.apache.org/docs/install/install.html I see MASTER should point to a Spark master, but I do not have a Spark master running.

How do I get Zeppelin to read data from the YARN cluster? Please share documentation.

Regards,
Deepak