Does v0.9 support yarn-cluster mode? I checked SparkContext.scala in v0.9.1 and didn't see special handling of `yarn-cluster`. -Xiangrui
On Mon, May 12, 2014 at 11:14 AM, DB Tsai <dbt...@stanford.edu> wrote:
> We're deploying Spark in yarn-cluster mode (Spark 0.9), and we add jar
> dependencies on the command line with the "--addJars" option. However, those
> external jars are only available in the driver (the application running in
> Hadoop), not in the executors (workers).
>
> After doing some research, we realized that we have to push those jars to
> the executors from the driver via sc.addJar(fileName). Although the driver's
> log (see below) shows the jar was successfully added to the HTTP server in
> the driver, and I confirmed that it is downloadable from any machine on the
> network, I still get `java.lang.NoClassDefFoundError` in the executors.
>
> 14/05/09 14:51:41 INFO spark.SparkContext: Added JAR
> analyticshadoop-eba5cdce1.jar at
> http://10.0.0.56:42522/jars/analyticshadoop-eba5cdce1.jar with timestamp
> 1399672301568
>
> Then I checked the logs on the executors and found no line of the form
> `Fetching <file> with timestamp <timestamp>`, which implies something is
> wrong: the executors are not downloading the external jars.
>
> Any suggestions on what we can look at?
>
> After digging into how Spark distributes external jars, I wonder about the
> scalability of this approach. What if thousands of nodes download the jar
> from a single HTTP server on the driver? Why don't we push the jars into the
> HDFS distributed cache by default instead of distributing them via the HTTP
> server?
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
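[Editor's note: for readers following along, the driver-side call discussed above looks roughly like the sketch below. This is a minimal illustration, not DB Tsai's actual code; the application name and jar path are placeholders, and the jar name is taken from the log line quoted in the thread.]

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: register an external jar from the driver so that
// executors fetch it before running tasks that need its classes.
val conf = new SparkConf().setAppName("addJar-example") // hypothetical app name
val sc   = new SparkContext(conf)

// The driver serves this jar over its embedded HTTP file server
// (the "Added JAR ... with timestamp ..." log line); each executor is
// expected to download it and add it to its task classloader. The
// missing "Fetching <file> with timestamp <timestamp>" line in the
// executor logs is what indicates that this download never happened.
sc.addJar("/path/to/analyticshadoop-eba5cdce1.jar") // hypothetical path
```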