Does v0.9 support yarn-cluster mode? I checked SparkContext.scala in v0.9.1 and didn't see special handling of `yarn-cluster`. -Xiangrui
On Mon, May 12, 2014 at 11:14 AM, DB Tsai <dbt...@stanford.edu> wrote:
> We're deploying Spark in yarn-cluster mode (Spark 0.9), and we add jar
> dependencies on the command line with the "--addJars" option. However, those
> external jars are only available in the driver (the application running in
> Hadoop), not in the executors (workers).
>
> After doing some research, we realized that we have to push those jars to
> the executors from the driver via sc.addJar(fileName). Although the driver's
> log (see below) shows the jar was successfully added to the HTTP server in
> the driver, and I confirmed that it is downloadable from any machine on the
> network, I still get `java.lang.NoClassDefFoundError` in the executors.
>
> 14/05/09 14:51:41 INFO spark.SparkContext: Added JAR
> analyticshadoop-eba5cdce1.jar at
> http://10.0.0.56:42522/jars/analyticshadoop-eba5cdce1.jar with timestamp
> 1399672301568
>
> Then I checked the logs on the executors and found no line of the form
> `Fetching <file> with timestamp <timestamp>`, which implies something is
> wrong: the executors are not downloading the external jars.
>
> Any suggestions on what we can look at?
>
> After digging into how Spark distributes external jars, I wonder about the
> scalability of this approach. What if thousands of nodes download the jar
> from a single HTTP server on the driver? Why don't we push the jars into the
> HDFS distributed cache by default instead of distributing them via the HTTP
> server?
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
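[Editor's note: for readers following along, the driver-side call discussed above looks roughly like the sketch below. This is a minimal illustration, not DB Tsai's actual code; the application name and jar path are placeholders, and the jar name is taken from the log line quoted in the thread.]

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: register an external jar from the driver so that
// executors fetch it before running tasks that need its classes.
val conf = new SparkConf().setAppName("addJar-example") // hypothetical app name
val sc   = new SparkContext(conf)

// The driver serves this jar over its embedded HTTP file server
// (the "Added JAR ... with timestamp ..." log line); each executor is
// expected to download it and add it to its task classloader. The
// missing "Fetching <file> with timestamp <timestamp>" line in the
// executor logs is what indicates that this download never happened.
sc.addJar("/path/to/analyticshadoop-eba5cdce1.jar") // hypothetical path
```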