Hi,

One way to think of this: --packages is better when you have third-party dependencies (it resolves them from Maven, including their transitive dependencies), and --jars is better when you have custom, in-house built jars.
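To make the contrast concrete, here is a sketch of the two invocations (the coordinate and jar path are the ones from this thread; `my_app.jar` is a hypothetical application jar, and spark-submit must be on the PATH):

```shell
# --packages: give Spark a Maven coordinate; Spark resolves the artifact
# AND its transitive dependencies (e.g. google-http-client) via Ivy and
# places them all on the driver and executor classpaths.
spark-submit \
  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
  my_app.jar

# --jars: only the jars listed are shipped; none of their dependencies
# are pulled in, so every transitive dependency must be listed explicitly.
spark-submit \
  --jars /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar \
  my_app.jar
```

This is why the --jars run below fails with ClassNotFoundException on a Google HTTP client class while the --packages run succeeds: the class lives in a transitive dependency that only --packages fetches.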
On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Sean and Russell. Much appreciated.
>
> Just to clarify: recently I had issues with different versions of Google
> Guava jar files when building an uber jar (to evict the unwanted ones).
> This used to work a year and a half ago on Google Dataproc compute
> engines (which come with Spark preloaded), and I could create an uber jar.
>
> Unfortunately this has become problematic now, so I tried spark-submit
> instead, as follows:
>
> ${SPARK_HOME}/bin/spark-submit \
>   --master yarn \
>   --deploy-mode client \
>   --conf spark.executor.memoryOverhead=3000 \
>   --class org.apache.spark.repl.Main \
>   --name "Spark shell on Yarn" "$@" \
>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>   --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>
> This is effectively a tailored spark-shell. However, I do not think there
> is a mechanism to resolve jar conflicts without building an uber jar
> through SBT?
>
> Cheers
>
>
> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com> wrote:
>
>> --jars adds only that jar.
>> --packages adds the jar and its dependencies as listed in Maven.
>>
>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a scenario that I use in spark-submit as follows:
>>>
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>
>>> As you can see, the jar files needed are added.
>>>
>>> This comes back with the error message below:
>>>
>>> Creating model test.weights_MODEL
>>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>   ... 76 elided
>>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>
>>> So there is an issue with finding the class, although the jar file used,
>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar, contains it.
>>>
>>> Now, if I remove the above jar file and replace it with the same
>>> version as a package, it works:
>>>
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>>> --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>
>>> I have read the write-ups about packages searching the Maven
>>> repositories etc., but I am not convinced why using the package should
>>> make the difference between failure and success. In other words, when
>>> should one use a package rather than a jar?
>>>
>>> Any ideas will be appreciated.
>>>
>>> Thanks
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed.
>>> The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.

--
Best Regards,
Ayan Guha
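On the uber-jar route Mich mentions: one way to sidestep the Guava version clash, rather than evicting versions, is to shade Guava inside the assembly. This is a sketch only, assuming the sbt-assembly plugin is enabled in project/plugins.sbt; it is not the thread's actual build, and the merge strategy shown is a common fallback, not a universally safe one:

```scala
// build.sbt fragment (sketch, assumes sbt-assembly is available).
// Rename com.google.common inside the uber jar so it cannot clash with
// the Guava copy that ships with the cluster's Spark/Hadoop distribution.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

// If duplicate files still collide during assembly, discard META-INF
// duplicates and keep the first copy of everything else.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```

Shading avoids the conflict entirely instead of betting that the evicted version is binary-compatible, at the cost of a larger jar and a rebuild whenever the dependency changes.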