From the looks of it, it's the com.google.http-client ones. But there may be more. You should not have to reason about this. That's why you let Maven / Ivy resolution figure it out. It is not true that everything in .ivy2 is on the classpath.
On Tue, Oct 20, 2020 at 3:48 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Nicolas,
>
> I removed ~/.ivy2 and reran the Spark job with the package included (the
> one that works).
>
> Under ~/.ivy2/jars I have 37 jar files, including the one that I had
> before:
>
> /home/hduser/.ivy2/jars> ls
> com.databricks_spark-avro_2.11-4.0.0.jar
> com.google.cloud.bigdataoss_gcs-connector-1.9.4-hadoop2.jar
> com.google.oauth-client_google-oauth-client-1.24.1.jar
> org.checkerframework_checker-qual-2.5.2.jar
> com.fasterxml.jackson.core_jackson-core-2.9.2.jar
> com.google.cloud.bigdataoss_gcsio-1.9.4.jar
> com.google.oauth-client_google-oauth-client-java6-1.24.1.jar
> org.codehaus.jackson_jackson-core-asl-1.9.13.jar
> com.github.samelamin_spark-bigquery_2.11-0.2.6.jar
> com.google.cloud.bigdataoss_util-1.9.4.jar
> commons-codec_commons-codec-1.6.jar
> org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
> com.google.api-client_google-api-client-1.24.1.jar
> com.google.cloud.bigdataoss_util-hadoop-1.9.4-hadoop2.jar
> commons-logging_commons-logging-1.1.1.jar
> org.codehaus.mojo_animal-sniffer-annotations-1.14.jar
> com.google.api-client_google-api-client-jackson2-1.24.1.jar
> com.google.code.findbugs_jsr305-3.0.2.jar
> com.thoughtworks.paranamer_paranamer-2.3.jar
> org.slf4j_slf4j-api-1.7.5.jar
> com.google.api-client_google-api-client-java6-1.24.1.jar
> com.google.errorprone_error_prone_annotations-2.1.3.jar
> joda-time_joda-time-2.9.3.jar
> org.tukaani_xz-1.0.jar
> com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar
> com.google.guava_guava-26.0-jre.jar
> org.apache.avro_avro-1.7.6.jar
> org.xerial.snappy_snappy-java-1.0.5.jar
> com.google.apis_google-api-services-storage-v1-rev135-1.24.1.jar
> com.google.http-client_google-http-client-1.24.1.jar
> org.apache.commons_commons-compress-1.4.1.jar
> com.google.auto.value_auto-value-annotations-1.6.2.jar
> com.google.http-client_google-http-client-jackson2-1.24.1.jar
> org.apache.httpcomponents_httpclient-4.0.1.jar
> com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar
> com.google.j2objc_j2objc-annotations-1.1.jar
> org.apache.httpcomponents_httpcore-4.0.1.jar
>
> I don't think I need to add all of these to the spark-submit --jars list.
> Is there a way I can find out which dependency is missing?
>
> This is the error I am getting when I use the jar file
> *com.github.samelamin_spark-bigquery_2.11-0.2.6.jar* instead of the
> package *com.github.samelamin:spark-bigquery_2.11:0.2.6*:
>
> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>   ... 76 elided
> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
> Thanks
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 20 Oct 2020 at 20:09, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
>> Once you have the jars from --packages in the ~/.ivy2 folder, you can then
>> add the list to --jars; this way there is no missing dependency.
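Nicolas's suggestion above can be scripted rather than typed out: collect everything Ivy resolved into a single comma-separated value for --jars (which takes one comma-separated token, no spaces). A minimal sketch; the helper name `join_jars` is invented:

```shell
# Sketch: build a comma-separated --jars value from every jar Ivy resolved
# into a directory. The helper name join_jars is illustrative.
join_jars() {
  # one path per line from ls, joined with commas by paste
  ls "$1"/*.jar 2>/dev/null | paste -s -d, -
}

# e.g. spark-submit --jars "$(join_jars "$HOME/.ivy2/jars")" ...
```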
>>
>> ayan guha <guha.a...@gmail.com> writes:
>>
>> > Hi
>> >
>> > One way to think of this: --packages is better when you have a third-party
>> > dependency, and --jars is better when you have custom in-house built jars.
>> >
>> > On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> >
>> >> Thanks Sean and Russell. Much appreciated.
>> >>
>> >> Just to clarify: recently I had issues with different versions of Google
>> >> Guava jar files when building an uber jar file (to evict the unwanted
>> >> ones). This used to work a year and a half ago using Google Dataproc
>> >> compute engines (which come with Spark preloaded) and I could create an
>> >> uber jar file.
>> >>
>> >> Unfortunately this has become problematic now, so I tried to use
>> >> spark-submit instead, as follows:
>> >>
>> >> ${SPARK_HOME}/bin/spark-submit \
>> >>   --master yarn \
>> >>   --deploy-mode client \
>> >>   --conf spark.executor.memoryOverhead=3000 \
>> >>   --class org.apache.spark.repl.Main \
>> >>   --name "Spark shell on Yarn" "$@"
>> >>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>> >>   --jars /home/hduser/jars/spark-bigquery-latest.jar, \
>> >>   /home/hduser/jars/ddhybrid.jar \
>> >>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>> >>
>> >> Effectively a tailored spark-shell. However, I do not think there is a
>> >> mechanism to resolve jar conflicts without building an uber jar file
>> >> through SBT?
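As quoted, that command also has two shell-level problems that would bite regardless of --jars vs --packages (possibly just copy-paste artifacts): the line ending in "$@" has no trailing backslash, so nothing after it reaches spark-submit, and the --jars value is split across lines, so ddhybrid.jar is not actually part of the list. A hedged rewrite with the same paths and coordinates, wrapped in a function (the name `submit_spark_shell` is invented) so it can be reviewed without launching anything:

```shell
# Sketch of the same invocation with the shell continuations fixed: every
# continued line needs a trailing backslash, and --jars must be a single
# comma-separated token. Paths and the package coordinate are from the thread.
submit_spark_shell() {
  "${SPARK_HOME}/bin/spark-submit" \
    --master yarn \
    --deploy-mode client \
    --conf spark.executor.memoryOverhead=3000 \
    --driver-class-path /home/hduser/jars/ddhybrid.jar \
    --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
    --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
    --class org.apache.spark.repl.Main \
    --name "Spark shell on Yarn" "$@"
}
```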
>> >>
>> >> Cheers
>> >>
>> >> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com> wrote:
>> >>
>> >>> --jars adds only that jar.
>> >>> --packages adds the jar and its dependencies listed in Maven.
>> >>>
>> >>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I have a scenario that I use in spark-submit as follows:
>> >>>>
>> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> >>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
>> >>>> */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar*
>> >>>>
>> >>>> As you can see, the jar files needed are added.
>> >>>>
>> >>>> This comes back with the error message below:
>> >>>>
>> >>>> Creating model test.weights_MODEL
>> >>>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>> >>>>   ... 76 elided
>> >>>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>> >>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>>>
>> >>>> So there is an issue with finding the class, although the jar file used,
>> >>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar, has it.
>> >>>>
>> >>>> Now if *I remove the above jar file and replace it with the same
>> >>>> version but as a package*, it works!
>> >>>>
>> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> >>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>> >>>> *--packages com.github.samelamin:spark-bigquery_2.11:0.2.6*
>> >>>>
>> >>>> I have read the write-ups about packages searching the Maven
>> >>>> libraries etc., but I am not convinced why using the package should make
>> >>>> so much difference between a failure and a success. In other words, when
>> >>>> should one use a package rather than a jar?
>> >>>>
>> >>>> Any ideas will be appreciated.
>> >>>>
>> >>>> Thanks
>> >>>>
>> >>> --
>> > Best Regards,
>> > Ayan Guha
>>
>> --
>> nicolas paris