Re: Why spark-submit works with package not with jar

Mich Talebzadeh Tue, 20 Oct 2020 14:48:29 -0700

Or just use Maven or sbt to create an Uber jar file.
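
If an Uber jar is not practical, the approach suggested earlier in this thread (reusing the jars that --packages already resolved into ~/.ivy2/jars) could be sketched roughly as below. This is only an illustration: the helper names jars_csv and find_class_jar are hypothetical, and the second helper assumes the unzip tool is installed (jar tf would serve the same purpose).

```shell
#!/bin/sh
# Join every *.jar in a directory into a comma-separated list suitable for --jars.
# (hypothetical helper, not part of Spark)
jars_csv() {
  find "$1" -maxdepth 1 -name '*.jar' | sort | paste -sd, -
}

# Print each jar in a directory that contains the given fully qualified class.
# (hypothetical helper; assumes `unzip` is available)
find_class_jar() {
  # Turn com.example.Foo into the archive entry name com/example/Foo.class
  entry="$(printf '%s' "$2" | tr . /).class"
  for jar in "$1"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Possible usage (paths taken from this thread):
#   spark-submit --jars "$(jars_csv "$HOME/.ivy2/jars")" ...
#   find_class_jar "$HOME/.ivy2/jars" com.google.api.client.http.HttpRequestInitializer
```

The second helper is one way to answer "which dependency is missing": it reports the jar(s) that actually ship the class named in the NoClassDefFoundError.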




LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 20 Oct 2020 at 22:43, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks again all.
>
> Hi Sean,
>
> As I understood from your statement, you are suggesting that I just use
> --packages without worrying about individual jar dependencies?
>
>
> On Tue, 20 Oct 2020 at 22:34, Sean Owen <sro...@gmail.com> wrote:
>
>> From the looks of it, it's the com.google.http-client ones. But there may
>> be more. You should not have to reason about this. That's why you let Maven
>> / Ivy resolution figure it out. It is not true that everything in .ivy2 is
>> on the classpath.
>>
>> On Tue, Oct 20, 2020 at 3:48 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Nicolas,
>>>
>>> I removed ~/.ivy2 and reran the Spark job with the package included (the
>>> one that works).
>>>
>>> Under ~/.ivy2/jars I have 37 jar files, including the one that I had
>>> before.
>>>
>>> /home/hduser/.ivy2/jars> ls
>>> com.databricks_spark-avro_2.11-4.0.0.jar
>>> com.google.cloud.bigdataoss_gcs-connector-1.9.4-hadoop2.jar
>>> com.google.oauth-client_google-oauth-client-1.24.1.jar
>>> org.checkerframework_checker-qual-2.5.2.jar
>>> com.fasterxml.jackson.core_jackson-core-2.9.2.jar
>>> com.google.cloud.bigdataoss_gcsio-1.9.4.jar
>>> com.google.oauth-client_google-oauth-client-java6-1.24.1.jar
>>> org.codehaus.jackson_jackson-core-asl-1.9.13.jar
>>> com.github.samelamin_spark-bigquery_2.11-0.2.6.jar
>>> com.google.cloud.bigdataoss_util-1.9.4.jar
>>> commons-codec_commons-codec-1.6.jar
>>> org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
>>> com.google.api-client_google-api-client-1.24.1.jar
>>> com.google.cloud.bigdataoss_util-hadoop-1.9.4-hadoop2.jar
>>> commons-logging_commons-logging-1.1.1.jar
>>> org.codehaus.mojo_animal-sniffer-annotations-1.14.jar
>>> com.google.api-client_google-api-client-jackson2-1.24.1.jar
>>> com.google.code.findbugs_jsr305-3.0.2.jar
>>> com.thoughtworks.paranamer_paranamer-2.3.jar
>>> org.slf4j_slf4j-api-1.7.5.jar
>>> com.google.api-client_google-api-client-java6-1.24.1.jar
>>> com.google.errorprone_error_prone_annotations-2.1.3.jar
>>> joda-time_joda-time-2.9.3.jar
>>> org.tukaani_xz-1.0.jar
>>> com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar
>>> com.google.guava_guava-26.0-jre.jar
>>> org.apache.avro_avro-1.7.6.jar
>>> org.xerial.snappy_snappy-java-1.0.5.jar
>>> com.google.apis_google-api-services-storage-v1-rev135-1.24.1.jar
>>> com.google.http-client_google-http-client-1.24.1.jar
>>> org.apache.commons_commons-compress-1.4.1.jar
>>> com.google.auto.value_auto-value-annotations-1.6.2.jar
>>> com.google.http-client_google-http-client-jackson2-1.24.1.jar
>>> org.apache.httpcomponents_httpclient-4.0.1.jar
>>> com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar
>>> com.google.j2objc_j2objc-annotations-1.1.jar
>>> org.apache.httpcomponents_httpcore-4.0.1.jar
>>>
>>> I don't think I need to add all of these to the spark-submit --jars list.
>>> Is there a way I can find out which dependency is missing?
>>>
>>> This is the error I get when I use the jar file
>>> *com.github.samelamin_spark-bigquery_2.11-0.2.6.jar* instead of the
>>> package *com.github.samelamin:spark-bigquery_2.11:0.2.6*:
>>>
>>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>   ... 76 elided
>>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, 20 Oct 2020 at 20:09, Nicolas Paris <nicolas.pa...@riseup.net>
>>> wrote:
>>>
>>>> Once you have the jars from --packages in the ~/.ivy2 folder, you can then
>>>> add the list to --jars. That way there is no missing dependency.
>>>>
>>>>
>>>> ayan guha <guha.a...@gmail.com> writes:
>>>>
>>>> > Hi
>>>> >
>>>> > One way to think of this is that --packages is better when you have
>>>> > third-party dependencies, and --jars is better when you have custom
>>>> > in-house built jars.
>>>> >
>>>> > On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh
>>>> > <mich.talebza...@gmail.com> wrote:
>>>> >
>>>> >> Thanks Sean and Russell. Much appreciated.
>>>> >>
>>>> >> Just to clarify, I recently had issues with different versions of the
>>>> >> Google Guava jar files when building an Uber jar file (trying to evict
>>>> >> the unwanted ones). This used to work a year and a half ago on Google
>>>> >> Dataproc compute engines (which come with Spark preloaded), and I could
>>>> >> create an Uber jar file.
>>>> >>
>>>> >> Unfortunately this has become problematic now, so I tried to use
>>>> >> spark-submit instead, as follows:
>>>> >>
>>>> >> ${SPARK_HOME}/bin/spark-submit \
>>>> >>                 --master yarn \
>>>> >>                 --deploy-mode client \
>>>> >>                 --conf spark.executor.memoryOverhead=3000 \
>>>> >>                 --class org.apache.spark.repl.Main \
>>>> >>                 --name "Spark shell on Yarn" "$@" \
>>>> >>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>> >>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>>>> >>                 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>> >>
>>>> >> This is effectively a tailored spark-shell. However, I do not think there
>>>> >> is a mechanism to resolve jar conflicts without building an Uber jar file
>>>> >> through SBT?
>>>> >>
>>>> >> Cheers
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer
>>>> >> <russell.spit...@gmail.com> wrote:
>>>> >>
>>>> >>> --jars adds only that jar.
>>>> >>> --packages adds the jar and its dependencies as listed in Maven.
>>>> >>>
>>>> >>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <
>>>> >>> mich.talebza...@gmail.com> wrote:
>>>> >>>
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I have a scenario where I run spark-submit as follows:
>>>> >>>>
>>>> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>> >>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>> >>>>
>>>> >>>> As you can see, the jar files needed are added.
>>>> >>>>
>>>> >>>>
>>>> >>>> This comes back with the error message below:
>>>> >>>>
>>>> >>>>
>>>> >>>> Creating model test.weights_MODEL
>>>> >>>>
>>>> >>>> java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
>>>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>> >>>>   at com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>> >>>>   ... 76 elided
>>>> >>>> Caused by: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
>>>> >>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> So there is an issue with finding the class, although the jar file used,
>>>> >>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar, has it.
>>>> >>>>
>>>> >>>>
>>>> >>>> Now if *I remove the above jar file and replace it with the same
>>>> >>>> version as a package*, it works!
>>>> >>>>
>>>> >>>>
>>>> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>> >>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>>>> >>>>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>> >>>>
>>>> >>>>
>>>> >>>> I have read the write-ups about packages searching the Maven
>>>> >>>> libraries etc. I am not convinced why using the package should make
>>>> >>>> the difference between failure and success. In other words, when should
>>>> >>>> I use a package rather than a jar?
>>>> >>>>
>>>> >>>>
>>>> >>>> Any ideas will be appreciated.
>>>> >>>>
>>>> >>>>
>>>> >>>> Thanks
>>>> >>>>
>>>> >>>>
>>>> >>> --
>>>> > Best Regards,
>>>> > Ayan Guha
>>>>
>>>>
>>>> --
>>>> nicolas paris
>>>>
>>>
