Thanks Sean and Russell. Much appreciated.

Just to clarify: recently I had issues with conflicting versions of the Google Guava jar files when building an Uber jar file (to evict the unwanted ones). This used to work a year and a half ago using Google Dataproc compute engines (which come with Spark preloaded), and I could create an Uber jar file.
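For context, evicting a conflicting Guava inside an Uber jar is usually done with shading in sbt-assembly. The fragment below is a minimal sketch only, not the actual build file from the project; the relocation prefix and merge strategy are illustrative assumptions:

```scala
// build.sbt fragment -- assumes sbt-assembly is enabled in project/plugins.sbt,
// e.g. addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// Relocate Guava classes so the Uber jar's copy cannot clash with the
// Guava version that Spark/Dataproc ships on the cluster classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

// Typical merge strategy: drop META-INF duplicates, keep the first copy
// of anything else that collides.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

With shading in place, `sbt assembly` produces a single jar whose relocated Guava cannot be shadowed by the cluster's own copy, which is the usual way such version conflicts are resolved.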
Unfortunately this has become problematic now, so I tried to use spark-submit directly instead, effectively a tailored spark-shell:

${SPARK_HOME}/bin/spark-submit \
        --master yarn \
        --deploy-mode client \
        --conf spark.executor.memoryOverhead=3000 \
        --class org.apache.spark.repl.Main \
        --name "Spark shell on Yarn" \
        --driver-class-path /home/hduser/jars/ddhybrid.jar \
        --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
        --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
        "$@"

However, I do not think there is a mechanism to resolve jar conflicts this way without building an Uber jar file through SBT?

Cheers

On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com> wrote:

> --jars adds only that jar
> --packages adds the jar and its dependencies as listed in Maven
>
> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a scenario that I use in Spark submit as follows:
>>
>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
>> */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar*
>>
>> As you can see, the jar files needed are added.
>>
>> This comes back with the error message below:
>>
>> Creating model test.weights_MODEL
>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>   ...
>>   76 elided
>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>
>> So there is an issue with finding the class, although the jar file used,
>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar, has it.
>>
>> Now if *I remove the above jar file and replace it with the same version
>> but as a package*, it works:
>>
>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>> *--packages com.github.samelamin:spark-bigquery_2.11:0.2.6*
>>
>> I have read the write-ups about packages searching the Maven
>> repositories etc., but I am not convinced why using a package should make so much
>> difference between failure and success. In other words, when should one use a
>> package rather than a jar?
>>
>> Any ideas will be appreciated.
>>
>> Thanks
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
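As a footnote, Russell's point about the two flags can be sketched side by side. The paths and coordinates below are taken from the thread; the trailing `...` stands for the application class and arguments, which are not shown in the original:

```shell
# --jars ships only the jars you list. Transitive dependencies declared
# in the artifact's POM (e.g. google-http-client, which provides
# com.google.api.client.http.HttpRequestInitializer) are NOT pulled in,
# so the driver hits NoClassDefFoundError at runtime.
spark-submit \
  --driver-class-path /home/hduser/jars/ddhybrid.jar \
  --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar \
  ...

# --packages hands the Maven coordinates to Ivy, which resolves the
# artifact AND its declared dependency tree, then adds all of the
# resolved jars to the driver and executor classpaths.
spark-submit \
  --driver-class-path /home/hduser/jars/ddhybrid.jar \
  --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
  ...
```

That difference in dependency resolution, rather than anything about the jar's own contents, is what separates the failing and the working invocations above.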