
One way to think of this: --packages is better when you have third-party
dependencies, and --jars is better when you have custom, in-house built jars.
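
As a rough sketch of that rule of thumb, using the paths and coordinates from
the thread below. Treating ddhybrid.jar as the in-house jar is my assumption,
and the variable names are made up for illustration. The script only prints
the command, so it can be checked without a cluster:

```shell
#!/bin/sh
# Assumption: ddhybrid.jar is an in-house jar with no Maven coordinates,
# so it is shipped as a plain file via --jars (no dependency resolution).
IN_HOUSE_JARS="/home/hduser/jars/ddhybrid.jar"

# Third-party library: pass Maven coordinates (groupId:artifactId:version)
# to --packages so that its transitive dependencies are fetched as well.
THIRD_PARTY_PACKAGES="com.github.samelamin:spark-bigquery_2.11:0.2.6"

# Assemble the command as a string and print it instead of running it.
CMD="spark-submit --jars ${IN_HOUSE_JARS} --packages ${THIRD_PARTY_PACKAGES}"
echo "$CMD"
```

Both options take comma-separated lists with no spaces between the entries.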

On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Sean and Russell. Much appreciated.
> Just to clarify, I recently had issues with different versions of Google
> Guava jar files when building an Uber jar file (trying to evict the
> unwanted ones). This used to work a year and a half ago on Google Dataproc
> compute engines (which come with Spark preloaded), and I could create an
> Uber jar file. Unfortunately this has become problematic now, so I tried
> to use spark-submit instead, as follows:
> ${SPARK_HOME}/bin/spark-submit \
>                 --master yarn \
>                 --deploy-mode client \
>                 --conf spark.executor.memoryOverhead=3000 \
>                 --class org.apache.spark.repl.Main \
>                 --name "Spark shell on Yarn" \
>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>                 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
>                 "$@"
> Effectively a tailored spark-shell. However, I do not think there is a
> mechanism to resolve jar conflicts without building an Uber jar file
> through SBT?
> Cheers
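
On resolving conflicts without an Uber jar: spark-submit has an
--exclude-packages option that takes a comma-separated list of
groupId:artifactId pairs to leave out while resolving --packages. A hedged
sketch follows. Using com.google.guava:guava as the artifact to exclude is my
assumption (the thread only says "Google Guava"), and whether excluding it is
right depends on which Guava version the job actually needs. The script
prints the command rather than running it:

```shell
#!/bin/sh
# Package coordinates taken from the thread.
PACKAGES="com.github.samelamin:spark-bigquery_2.11:0.2.6"

# Assumption: the conflicting artifact is Guava under its usual Maven
# coordinates. --exclude-packages expects groupId:artifactId, no version.
EXCLUDES="com.google.guava:guava"

# Assemble the command as a string and print it instead of running it.
CMD="spark-submit --packages ${PACKAGES} --exclude-packages ${EXCLUDES}"
echo "$CMD"
```

This only affects what --packages resolves; jars passed with --jars are
still added as they are.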
> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>> --jars adds only that jar.
>> --packages adds the jar and its dependencies listed in Maven.
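
One way to check this for a given jar is to list its entries and look for the
class named in the NoClassDefFoundError. A minimal sketch, assuming unzip is
installed; the function names are hypothetical:

```shell
#!/bin/sh
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Return 0 if the jar listing contains the class entry, 1 if it does not,
# and 2 if the jar file itself does not exist.
jar_has_class() {
    jar_path=$1
    entry=$(class_to_entry "$2")
    [ -f "$jar_path" ] || return 2
    unzip -l "$jar_path" | grep -qF "$entry"
}

# Example with the jar and class from the thread:
# jar_has_class /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar \
#     com.google.api.client.http.HttpRequestInitializer && echo found
```

If the class is not listed, it has to come from some other jar, which is the
case --packages covers by pulling in the listed dependencies.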
>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>> Hi,
>>> I have a scenario that I use in Spark submit as follows:
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
>>> */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar*
>>> As you can see, the jar files needed are added.
>>> This comes back with the error message below:
>>> Creating model test.weights_MODEL
>>> java.lang.NoClassDefFoundError:
>>> com/google/api/client/http/HttpRequestInitializer
>>>   at
>>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>   at
>>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>   at
>>> com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>   ... 76 elided
>>> Caused by: java.lang.ClassNotFoundException:
>>> com.google.api.client.http.HttpRequestInitializer
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> So there is an issue with finding the class, although the jar file used
>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>> has it.
>>> Now if *I remove the above jar file and replace it with the same
>>> version as a package*, it works!
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>>> *--packages com.github.samelamin:spark-bigquery_2.11:0.2.6*
>>> I have read the write-ups about packages searching the Maven
>>> libraries etc. I am not convinced why using the package should make the
>>> difference between failure and success. In other words, when should one
>>> use a package rather than a jar?
>>> Any ideas will be appreciated.
>>> Thanks
>> --
Best Regards,
Ayan Guha
