Hi Moon,

Sorry for the late reply. I missed this one.
If you are running in yarn-cluster mode, ZeppelinServer does not need to access nodes in the YARN cluster. That is the whole purpose of Spark's yarn-cluster mode, in my understanding. Right now you can achieve the same thing even when using Spark Standalone or Spark on Mesos.

Regards,
Sourav

On Tue, Oct 13, 2015 at 1:47 AM, moon soo Lee <m...@apache.org> wrote:

> Thanks for sharing your use case.
>
> Then, let's say Zeppelin runs the SparkInterpreter process using spark-submit
> in yarn-cluster mode without error. SparkInterpreter then runs inside an
> application master process which is managed by YARN on the cluster, and
> ZeppelinServer can somehow get the host and port and connect to the
> SparkInterpreter process using the thrift protocol.
>
> But that means ZeppelinServer still needs to access a node in the yarn cluster
> to connect to the SparkInterpreter process that runs in the application master.
>
> Would this be okay for your case?
>
> And I'm also curious how other people handle the situation, i.e. the case
> where the spark drivers need to have access to all data nodes/slave nodes.
>
> Thanks,
> moon
>
> On Mon, Oct 12, 2015 at 12:31 AM Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Moon,
>>
>> It is to support an architecture where Zeppelin does not need to run on
>> the same machine/cluster where Spark/Hadoop is running.
>>
>> Right now it is not possible to achieve the same in yarn-client mode,
>> as in that case the spark driver needs to have access to all data
>> nodes/slave nodes.
>>
>> One can achieve the same with a remote Spark standalone cluster. But in
>> that case I cannot use YARN for workload management.
>>
>> Regards,
>> Souravu
>>
>> On Oct 11, 2015, at 12:25 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> My apologies, I missed the most important part of the question:
>> yarn-cluster mode. Zeppelin is not expected to work with yarn-cluster mode
>> at the moment.
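[Editor's sketch: the spark-submit hand-off being discussed above can be illustrated as follows. This is a simplified, hypothetical illustration of the launch decision, not Zeppelin's actual interpreter.sh; the class name and paths are only examples.]

```shell
#!/bin/sh
# Simplified sketch of the launch decision discussed in this thread.
# NOT Zeppelin's actual interpreter.sh; names and paths are illustrative.
launch_interpreter() {
  if [ -n "$SPARK_HOME" ]; then
    # With SPARK_HOME set, delegate to spark-submit so that YARN deploy
    # modes are handled by Spark itself rather than by a bare SparkContext.
    echo "$SPARK_HOME/bin/spark-submit --class RemoteInterpreterServer"
  else
    # Without SPARK_HOME, launch the JVM directly; constructing a
    # SparkContext with master=yarn-cluster this way is what triggers the
    # "Please use spark-submit" SparkException seen later in this thread.
    echo "java -cp zeppelin-spark-*.jar RemoteInterpreterServer"
  fi
}

export SPARK_HOME=/opt/spark   # example path
launch_interpreter             # prints: /opt/spark/bin/spark-submit --class RemoteInterpreterServer
```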
>>
>> Is there any special reason you need to use yarn-cluster mode instead of
>> yarn-client mode?
>>
>> Thanks,
>> moon
>>
>> On Sun, Oct 11, 2015 at 8:41 PM Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>>> Hi Moon,
>>>
>>> Yes, I have checked that.
>>>
>>> I put some debug statements in interpreter.sh to see what exactly is
>>> passed when I set SPARK_HOME in zeppelin-env.sh.
>>>
>>> The debug statements do show that it is using the spark-submit utility
>>> from the bin folder of the SPARK_HOME I set in zeppelin-env.sh.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Sun, Oct 11, 2015 at 2:55 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> Could you make sure your zeppelin-env.sh has SPARK_HOME exported?
>>>>
>>>> Zeppelin (0.6.0-SNAPSHOT) uses the spark-submit command when SPARK_HOME
>>>> is defined, but your error says "please use spark-submit".
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Thu, Oct 8, 2015 at 9:14 PM Sourav Mazumder <
>>>> sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> Hi Deepak/Moon,
>>>>>
>>>>> After seeing the stack trace of the error and the code of
>>>>> org.apache.zeppelin.spark.SparkInterpreter.java, I think this is surely
>>>>> a bug in the Spark interpreter code.
>>>>>
>>>>> The SparkInterpreter code always calls the constructor of
>>>>> org.apache.spark.SparkContext to create a new SparkContext whenever the
>>>>> SparkInterpreter class is loaded by
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer, hence
>>>>> this error.
>>>>>
>>>>> I'm not sure whether the check for yarn-cluster was newly added to
>>>>> SparkContext.
>>>>>
>>>>> Attaching the complete stack trace here for ease of reference.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
>>>>> running on a cluster. Deployment to YARN is not supported directly by
>>>>> SparkContext. Please use spark-submit.
>>>>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:378)
>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:149)
>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
>>>>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>>>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
>>>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>>>>>   at java.util.concurrent.FutureTask.run(Unknown Source)
>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>>>>   at java.lang.Thread.run(Unknown Source)
>>>>>
>>>>> On Mon, Oct 5, 2015 at 12:57 PM, Sourav Mazumder <
>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>
>>>>>> I could execute the following without any issue.
>>>>>>
>>>>>> spark-submit --class org.apache.spark.examples.SparkPi \
>>>>>>   --master yarn-cluster --num-executors 1 --driver-memory 512m \
>>>>>>   --executor-memory 512m --executor-cores 1 lib/spark-examples.jar 10
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Did you try a test job with yarn-cluster (outside Zeppelin)?
>>>>>>>
>>>>>>> On Mon, Oct 5, 2015 at 11:48 AM, Sourav Mazumder <
>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, I have them set up appropriately.
>>>>>>>>
>>>>>>>> Where I am lost is that I can see the interpreter running
>>>>>>>> spark-submit, but at some point it switches to creating a
>>>>>>>> SparkContext.
>>>>>>>>
>>>>>>>> Maybe, as you rightly mentioned, it is not able to run the driver on
>>>>>>>> the YARN cluster because of some permission issue. But I'm not able
>>>>>>>> to figure out what that issue/required configuration is.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sourav
>>>>>>>>
>>>>>>>> On Mon, Oct 5, 2015 at 11:38 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Do you have these settings configured in zeppelin-env.sh?
>>>>>>>>>
>>>>>>>>> export JAVA_HOME=/usr/src/jdk1.7.0_79/
>>>>>>>>>
>>>>>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>>>>>
>>>>>>>>> Most likely you have these, as you're able to run with yarn-client.
>>>>>>>>>
>>>>>>>>> Looks like the issue is not being able to run the driver program on
>>>>>>>>> the cluster.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 5, 2015 at 11:13 AM, Sourav Mazumder <
>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, Spark is installed on the machine where Zeppelin is running.
>>>>>>>>>>
>>>>>>>>>> The location of spark.yarn.jar is very similar to what you have.
>>>>>>>>>> I'm using IOP as the distribution, and that directory naming
>>>>>>>>>> convention is specific to IOP, which is different from HDP.
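[Editor's sketch: pulled together, the zeppelin-env.sh settings mentioned in this thread would look like the fragment below. The SPARK_HOME path is inferred from this thread's IOP spark.yarn.jar location and is an assumption; adjust all paths for your own installation.]

```shell
# conf/zeppelin-env.sh -- values collected from this thread; the paths
# are examples/assumptions, adjust them for your own installation.
export JAVA_HOME=/usr/src/jdk1.7.0_79/
# Points Spark at the YARN/HDFS client configs so it can find the
# ResourceManager and NameNode.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# When SPARK_HOME is set, Zeppelin launches the Spark interpreter through
# $SPARK_HOME/bin/spark-submit instead of a plain JVM process.
export SPARK_HOME=/usr/iop/current/spark-thriftserver
```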
>>>>>>>>>>
>>>>>>>>>> And yes, the setup works perfectly fine when I use master
>>>>>>>>>> yarn-client with the same setup for SPARK_HOME, HADOOP_CONF_DIR and
>>>>>>>>>> HADOOP_CLIENT.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Sourav
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 5, 2015 at 10:25 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <
>>>>>>>>>> deepuj...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is Spark installed on your Zeppelin machine?
>>>>>>>>>>>
>>>>>>>>>>> I would try these:
>>>>>>>>>>>
>>>>>>>>>>> master: yarn-client
>>>>>>>>>>> spark.home: the Spark installation home directory on your
>>>>>>>>>>> Zeppelin server.
>>>>>>>>>>>
>>>>>>>>>>> Looking at spark.yarn.jar, I see Spark is installed at
>>>>>>>>>>> /usr/iop/current/spark-thriftserver/. But why is it
>>>>>>>>>>> thriftserver (I do not know what that is)?
>>>>>>>>>>>
>>>>>>>>>>> I have Spark installed (unzipped) on the Zeppelin machine at
>>>>>>>>>>> /usr/hdp/2.3.1.0-2574/spark/spark/ (can be any location) and have
>>>>>>>>>>> spark.yarn.jar set to
>>>>>>>>>>> /usr/hdp/2.3.1.0-2574/spark/spark/lib/spark-assembly-1.4.1-hadoop2.6.0.jar.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 5, 2015 at 10:20 AM, Sourav Mazumder <
>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Deepu,
>>>>>>>>>>>>
>>>>>>>>>>>> Here you go.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Sourav
>>>>>>>>>>>>
>>>>>>>>>>>> *Properties*
>>>>>>>>>>>>
>>>>>>>>>>>> name                           value
>>>>>>>>>>>> args
>>>>>>>>>>>> master                         yarn-cluster
>>>>>>>>>>>> spark.app.name                 Zeppelin
>>>>>>>>>>>> spark.cores.max
>>>>>>>>>>>> spark.executor.memory          512m
>>>>>>>>>>>> spark.home
>>>>>>>>>>>> spark.yarn.jar                 /usr/iop/current/spark-thriftserver/lib/spark-assembly.jar
>>>>>>>>>>>> zeppelin.dep.localrepo         local-repo
>>>>>>>>>>>> zeppelin.pyspark.python        python
>>>>>>>>>>>> zeppelin.spark.concurrentSQL   false
>>>>>>>>>>>> zeppelin.spark.maxResult       1000
>>>>>>>>>>>> zeppelin.spark.useHiveContext  true
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 5, 2015 at 10:05 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <
>>>>>>>>>>>> deepuj...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Can you share a screenshot of your Spark interpreter settings
>>>>>>>>>>>>> from the Zeppelin web interface?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have the exact same deployment structure and it runs fine
>>>>>>>>>>>>> with the right set of configurations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 5, 2015 at 7:56 AM, Sourav Mazumder <
>>>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using 0.6.0-SNAPSHOT, which I built from the latest GitHub
>>>>>>>>>>>>>> source.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried setting SPARK_HOME in zeppelin-env.sh. By adding some
>>>>>>>>>>>>>> debug statements I could also see that control goes to the
>>>>>>>>>>>>>> appropriate IF-ELSE block in interpreter.sh.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I get the same error as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> org.apache.spark.SparkException: Detected yarn-cluster mode,
>>>>>>>>>>>>>> but isn't running on a cluster. Deployment to YARN is not
>>>>>>>>>>>>>> supported directly by SparkContext. Please use spark-submit.
>>>>>>>>>>>>>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:378)
>>>>>>>>>>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
>>>>>>>>>>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:149)
>>>>>>>>>>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
>>>>>>>>>>>>>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>>>>>>>>>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>>>>>>>>>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>>>>>>>>>>>>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
>>>>>>>>>>>>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>>>>>>>>>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>>>>>>>>>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>>>>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>>>>>>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>>>>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if you need any other details to figure out what
>>>>>>>>>>>>>> is going on.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 30, 2015 at 1:53 AM, moon soo Lee <
>>>>>>>>>>>>>> m...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which version of Zeppelin are you using?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The master branch uses the spark-submit command when
>>>>>>>>>>>>>>> SPARK_HOME is defined in conf/zeppelin-env.sh.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you're not on the master branch, I recommend trying it
>>>>>>>>>>>>>>> with SPARK_HOME defined.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 23, 2015 at 10:21 PM Sourav Mazumder <
>>>>>>>>>>>>>>> sourav.mazumde...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When I try to run the Spark interpreter in yarn-cluster mode
>>>>>>>>>>>>>>>> from a remote machine, I always get the error saying to use
>>>>>>>>>>>>>>>> spark-submit rather than a SparkContext.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My Zeppelin process runs on a separate machine, remote to
>>>>>>>>>>>>>>>> the YARN cluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any idea why I get this error?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Deepak
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Deepak
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Deepak
>>>>>>>
>>>>>>> --
>>>>>>> Deepak