The master node in your XML can only take yarn-cluster or local[x], where x is
the number of executors, no?

--
Laurent HATIER - Big Data & Business Intelligence Consultant at CapGemini
fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
<http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>

2015-10-06 9:25 GMT+02:00 Nitin Kumar <[email protected]>:

> Hi,
>
> I am running a 3-node cluster (HDP 2.3, installed using Ambari 2.1.1).
> I have been trying to run a Spark job that runs a word count program using
> the spark action.
>
> The program runs fine when the master is set to local, but it runs into
> errors when the master is set to yarn-cluster or yarn-client.
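>
> For reference, wordcount.py is essentially along these lines (a simplified
> sketch; the input/output paths here are placeholders, and only the
> SparkContext line is taken from the traceback below):
>
>     from pyspark import SparkConf, SparkContext
>
>     # The master is supplied by the Oozie spark action, so it is not hard-coded here.
>     conf = SparkConf().setAppName("Word Count")
>     sc = SparkContext(conf=conf)   # the call that fails under yarn-client / yarn-cluster
>
>     counts = (sc.textFile("/user/nitin/input.txt")           # placeholder input path
>                 .flatMap(lambda line: line.split())
>                 .map(lambda word: (word, 1))
>                 .reduceByKey(lambda a, b: a + b))
>     counts.saveAsTextFile("/user/nitin/wordcount-output")    # placeholder output path
>     sc.stop()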
>
> My workflow is as follows
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <workflow-app xmlns='uri:oozie:workflow:0.4' name='sparkjob'>
>     <start to='spark-process' />
>     <action name='spark-process'>
>         <spark xmlns='uri:oozie:spark-action:0.1'>
>         <job-tracker>${jobTracker}</job-tracker>
>         <name-node>${nameNode}</name-node>
>         <configuration>
>             <property>
>                 <name>oozie.service.SparkConfigurationService.spark.configurations</name>
>                 <value>spark.eventLog.dir=hdfs://node1.analytics.subex:8020/user/spark/applicationHistory,spark.yarn.historyServer.address=http://node1.analytics.subex:18088,spark.eventLog.enabled=true</value>
>             </property>
>             <!--property>
>                 <name>oozie.use.system.libpath</name>
>                 <value>true</value>
>             </property>
>             <property>
>                 <name>oozie.service.WorkflowAppService.system.libpath</name>
>                 <value>/user/oozie/share/lib/lib_20150831190253/spark</value>
>             </property-->
>         </configuration>
>         <master>yarn-client</master>
>         <mode>client</mode>
>         <name>Word Count</name>
>         <jar>/usr/hdp/current/spark-client/AnalyticsJar/wordcount.py</jar>
>         <spark-opts>--executor-memory 1G --driver-memory 1G
>             --executor-cores 4 --num-executors 2
>             --jars /usr/hdp/current/spark-client/lib/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar</spark-opts>
>         </spark>
>         <ok to='end'/>
>         <error to='spark-fail'/>
>     </action>
>     <kill name='spark-fail'>
>         <message>Spark job failed, error
> message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>
>     <end name='end' />
> </workflow-app>
>
>
> I get the following error:
>
> Traceback (most recent call last):
>   File "/usr/hdp/current/spark-client/AnalyticsJar/wordcount.py", line
> 26, in <module>
>     sc = SparkContext(conf=conf)
>   File
> "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py",
> line 107, in __init__
>   File
> "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py",
> line 155, in _do_init
>   File
> "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py",
> line 201, in _initialize_context
>   File
> "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/py4j/java_gateway.py",
> line 701, in __call__
>   File
> "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> None.org.apache.spark.api.java.JavaSparkContext.
> : org.apache.spark.SparkException: YARN mode not available ?
>         at
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1586)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
>         at
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>         at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:264)
>         at
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1580)
>         ... 13 more
>
>
>
> The steps I have taken so far:
>
> 1. Copied the jars in the spark-client/lib directory to
> /user/oozie/share/lib/spark, followed by a restart of the Spark service
> 2. Passed the assembly jar within <spark-opts></spark-opts> (see workflow)
> 3. Tried setting oozie.service.WorkflowAppService.system.libpath to the
> jars in the share lib directory (see the sketch after this list)
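>
> For reference, the commented-out oozie.use.system.libpath property is
> normally set in the job.properties used to submit the workflow rather than
> inside the workflow's <configuration> block; a sketch of such a file (the
> jobTracker port and application path are placeholders, the nameNode value
> is taken from the workflow above):
>
>     nameNode=hdfs://node1.analytics.subex:8020
>     jobTracker=node1.analytics.subex:8050
>     oozie.use.system.libpath=true
>     oozie.wf.application.path=${nameNode}/user/nitin/sparkjob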
>
>
> It seems that Spark is not getting the right jars for deploying the job on
> YARN, even though I have tried to make the jars available to the workflow.
> While scanning through the detailed logs, I have also noticed that the
> assembly jar is present in the YARN application folder and also in the
> Oozie classpath.
>
> Is there some configuration that I'm missing? I would appreciate any help.
>
>
> Regards,
> Nitin
>
