Next try. I copied the whole dist directory created by the make-distribution script
to the cluster, not just the assembly jar. Then I used

./bin/spark-submit --num-executors 200 --master yarn-cluster \
  --class org.apache.spark.mllib.CreateGuidDomainDictionary \
  ../spark/root-0.1.jar ${args}

 ...to run the app again. The startup scripts printed this message:

"Spark assembly has been built with Hive, including Datanucleus jars on 
classpath"

  ...so I thought I was finally there. But the job started and then failed with the
same ClassNotFoundException as before. Is the "classpath" in the script's message
just the classpath of the driver? Or is it the same classpath that is affected by
the --jars option? I tried to find out from the scripts, but I was not able to find
where the --jars option is processed.
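
For reference, one way to check whether the class actually made it into the
assembly. This is only a minimal sketch; it assumes the lib/ layout produced by
make-distribution and the standard JDK jar tool:

jar tf lib/spark-assembly-*.jar | grep JDOPersistenceManagerFactory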

  thanks


---------- Original message ----------
From: Michael Armbrust <mich...@databricks.com>
To: spark.dubovsky.ja...@seznam.cz
Date: 6. 12. 2014 20:39:13
Subject: Re: Including data nucleus tools

"



On Sat, Dec 6, 2014 at 5:53 AM, <spark.dubovsky.ja...@seznam.cz
(mailto:/skin/default/img/empty.gif)> wrote:"
Bonus question: Should the class org.datanucleus.api.jdo.JDOPersistenceManagerFactory
be part of the assembly? Because it is not in the jar now.

"



No these jars cannot be put into the assembly because they have extra 
metadata files that live in the same location (so if you put them all in an 
assembly they overrwrite each other).  This metadata is used in discovery.  
Instead they must be manually put on the classpath in their original form 
(usually using --jars). 



 
"
