Next try. I copied the whole dist directory created by the make-distribution script to the cluster, not just the assembly jar. Then I used

./bin/spark-submit --num-executors 200 --master yarn-cluster --class org.apache.spark.mllib.CreateGuidDomainDictionary ../spark/root-0.1.jar ${args}

...to run the app again. The startup scripts printed this message: "Spark assembly has been built with Hive, including Datanucleus jars on classpath" ...so I thought I was finally there. But the job started and failed with the same ClassNotFound exception as before.

Is the "classpath" in the script message just the classpath of the driver? Or is it the same classpath that is affected by the --jars option? I tried to find out from the scripts, but I was not able to find where the --jars option is processed.

thanks

---------- Original message ----------
From: Michael Armbrust <mich...@databricks.com>
To: spark.dubovsky.ja...@seznam.cz
Date: 6. 12. 2014 20:39:13
Subject: Re: Including data nucleus tools

> On Sat, Dec 6, 2014 at 5:53 AM, <spark.dubovsky.ja...@seznam.cz> wrote:
>
> > Bonus question: Should the class org.datanucleus.api.jdo.JDOPersistenceManagerFactory be part of the assembly? Because it is not in the jar now.
>
> No, these jars cannot be put into the assembly because they have extra metadata files that live in the same location (so if you put them all in an assembly they overwrite each other). This metadata is used in discovery. Instead they must be manually put on the classpath in their original form (usually using --jars).
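Following Michael's suggestion to pass the Datanucleus jars in their original form via --jars, a minimal sketch of building that comma-separated list from the dist's lib directory (the `SPARK_HOME/lib/datanucleus-*.jar` layout and the final spark-submit line are assumptions based on a typical make-distribution output, not something confirmed in this thread):

```shell
#!/bin/sh
# Sketch: collect the Datanucleus jars shipped in the dist and join them with
# commas, since --jars expects a comma-separated list (not a colon-separated
# classpath). SPARK_HOME defaulting to ./dist is an assumption.
SPARK_HOME="${SPARK_HOME:-./dist}"

# ls lists each matching jar on its own line; paste -s joins the lines,
# and -d, uses a comma as the join delimiter.
DATANUCLEUS_JARS=$(ls "$SPARK_HOME"/lib/datanucleus-*.jar 2>/dev/null | paste -sd, -)

# Print the command rather than running it, so the sketch is safe to try.
echo "./bin/spark-submit --num-executors 200 --master yarn-cluster \
  --jars $DATANUCLEUS_JARS \
  --class org.apache.spark.mllib.CreateGuidDomainDictionary ../spark/root-0.1.jar"
```

With the jars passed this way they stay in their original form on the executor classpath, so the metadata files Michael mentions are not overwritten as they would be inside a single assembly.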