Hi,

One way to think of this: --packages is better when you have third-party dependencies (it resolves them from Maven, including their transitive dependencies), and --jars is better when you have custom, in-house built jars.
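To make the contrast concrete, here is a sketch of the two invocations (the coordinate and jar path are the ones from this thread; `my_app.jar` is a hypothetical application jar, and spark-submit must be on the PATH):

```shell
# --packages: give Spark a Maven coordinate; Spark resolves the artifact
# AND its transitive dependencies (e.g. google-http-client) via Ivy and
# places them all on the driver and executor classpaths.
spark-submit \
  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
  my_app.jar

# --jars: only the jars listed are shipped; none of their dependencies
# are pulled in, so every transitive dependency must be listed explicitly.
spark-submit \
  --jars /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar \
  my_app.jar
```

This is why the --jars run below fails with ClassNotFoundException on a Google HTTP client class while the --packages run succeeds: the class lives in a transitive dependency that only --packages fetches.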
On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Sean and Russell. Much appreciated.
>
> Just to clarify: recently I had issues with different versions of Google
> Guava jar files when building an uber jar (to evict the unwanted ones).
> This used to work a year and a half ago on Google Dataproc compute
> engines (which come with Spark preloaded), and I could create an uber jar.
>
> Unfortunately this has become problematic now, so I tried spark-submit
> instead, as follows:
>
> ${SPARK_HOME}/bin/spark-submit \
>   --master yarn \
>   --deploy-mode client \
>   --conf spark.executor.memoryOverhead=3000 \
>   --class org.apache.spark.repl.Main \
>   --name "Spark shell on Yarn" "$@" \
>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>   --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>
> This is effectively a tailored spark-shell. However, I do not think there
> is a mechanism to resolve jar conflicts without building an uber jar
> through SBT?
>
> Cheers
>
>
> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com> wrote:
>
>> --jars adds only that jar.
>> --packages adds the jar and its dependencies as listed in Maven.
>>
>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a scenario that I use in spark-submit as follows:
>>>
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>
>>> As you can see, the jar files needed are added.
>>>
>>> This comes back with the error message below:
>>>
>>> Creating model test.weights_MODEL
>>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>   ... 76 elided
>>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>
>>> So there is an issue with finding the class, although the jar file used,
>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar, contains it.
>>>
>>> Now, if I remove the above jar file and replace it with the same
>>> version as a package, it works:
>>>
>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>>> --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>
>>> I have read the write-ups about packages searching the Maven
>>> repositories etc., but I am not convinced why using the package should
>>> make the difference between failure and success. In other words, when
>>> should one use a package rather than a jar?
>>>
>>> Any ideas will be appreciated.
>>>
>>> Thanks
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed.
>>> The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.

--
Best Regards,
Ayan Guha
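On the uber-jar route Mich mentions: one way to sidestep the Guava version clash, rather than evicting versions, is to shade Guava inside the assembly. This is a sketch only, assuming the sbt-assembly plugin is enabled in project/plugins.sbt; it is not the thread's actual build, and the merge strategy shown is a common fallback, not a universally safe one:

```scala
// build.sbt fragment (sketch, assumes sbt-assembly is available).
// Rename com.google.common inside the uber jar so it cannot clash with
// the Guava copy that ships with the cluster's Spark/Hadoop distribution.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

// If duplicate files still collide during assembly, discard META-INF
// duplicates and keep the first copy of everything else.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```

Shading avoids the conflict entirely instead of betting that the evicted version is binary-compatible, at the cost of a larger jar and a rebuild whenever the dependency changes.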