Hi Mich,

If I have understood your problem correctly, the spark-kafka jar is being
shadowed at run time by the Kafka client jar installed on the cluster.
I have run into the same situation before.
I can recommend resolving the issue with the Maven shade plugin. The example I
am pasting here works in pom.xml; I am sure you will find an equivalent for sbt
as well. The shade plugin relocates (renames) the conflicting classes while
packaging, producing an uber jar:
<relocations>
    <relocation>
        <pattern>org.apache.kafka</pattern>
        <shadedPattern>shade.org.apache.kafka</shadedPattern>
    </relocation>
</relocations>
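
For context, here is a minimal sketch of how that relocation block might sit
inside the maven-shade-plugin declaration in pom.xml. The plugin version and
the phase/goal binding below are assumptions; adjust them to whatever your
build already uses.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- version is an assumption; use the one your build standardises on -->
    <version>3.2.4</version>
    <executions>
        <execution>
            <!-- bind the shade goal to the package phase so the uber jar is built automatically -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <!-- rewrite references to org.apache.kafka.* to the shaded package -->
                        <pattern>org.apache.kafka</pattern>
                        <shadedPattern>shade.org.apache.kafka</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

The resulting uber jar then carries its own, relocated copy of the Kafka
classes, so whatever kafka-clients jar is already installed on the cluster
should not shadow it.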

Hope this helps.

Regards
Amit Joshi

On Wed, Apr 7, 2021 at 8:14 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
> Did some tests. The concern is SSS job running under YARN
>
>
> *Scenario 1)*  use spark-sql-kafka-0-10_2.12-3.1.0.jar
>
>    - Removed spark-sql-kafka-0-10_2.12-3.1.0.jar from anywhere on
>    CLASSPATH including $SPARK_HOME/jars
>    - Added the said jar file to spark-submit in client mode (the only
>    mode available to PySpark) with --jars
>    - spark-submit --master yarn --deploy-mode client --conf
>    spark.pyspark.virtualenv.enabled=true .. bla bla..  --driver-memory 4G
>    --executor-memory 4G --num-executors 2 --executor-cores 2 --jars
>    $HOME/jars/spark-sql-kafka-0-10_2.12-3.1.0.jar xyz.py
>
> This works fine
>
>
> *Scenario 2)* use spark-sql-kafka-0-10_2.12-3.1.1.jar in spark-submit
>
>
>
>    - spark-submit --master yarn --deploy-mode client --conf
>    spark.pyspark.virtualenv.enabled=true ..bla bla.. --driver-memory 4G
>    --executor-memory 4G --num-executors 2 --executor-cores 2 --jars
>    $HOME/jars/spark-sql-kafka-0-10_2.12-3.1.1.jar xyz.py
>
> It failed with:
>
>    - Caused by: java.lang.NoSuchMethodError:
>      org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>
> *Scenario 3)* use the package as per the Structured Streaming + Kafka
> Integration Guide (Kafka broker version 0.10.0 or higher) in the Spark 3.1.1
> documentation
> <https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#deploying>
>
>
>    - spark-submit --master yarn --deploy-mode client --conf
>    spark.pyspark.virtualenv.enabled=true ..bla bla.. --driver-memory 4G
>    --executor-memory 4G --num-executors 2 --executor-cores 2 --packages
>    org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 xyz.py
>
> It failed with:
>
>    - Caused by: java.lang.NoSuchMethodError:
>      org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>
>
> HTH
>
>
> On Wed, 7 Apr 2021 at 13:20, Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> +1 on Sean's opinion
>>
>> On Wed, Apr 7, 2021 at 2:17 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> You shouldn't be modifying your cluster install. You may at this point
>>> have conflicting, excess JARs in there somewhere. I'd start it over if you
>>> can.
>>>
>>> On Wed, Apr 7, 2021 at 7:15 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>>> Not sure what you mean by "not working". You've added 3.1.1 to --packages,
>>>> which uses:
>>>> * 2.6.0 kafka-clients:
>>>> https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L136
>>>> * 2.6.2 commons pool:
>>>> https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L183
>>>>
>>>> I think it is worth doing an end-to-end dependency-tree analysis of what
>>>> is really happening on the cluster...
>>>>
>>>> G
>>>>
>>>>
>>>> On Wed, Apr 7, 2021 at 11:11 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Gabor et al.,
>>>>>
>>>>> To be honest I am not convinced this package --packages
>>>>> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 is really working!
>>>>>
>>>>> I know for certain that spark-sql-kafka-0-10_2.12-3.1.0.jar works
>>>>> fine. I reported the package as working before because under $SPARK_HOME/jars
>>>>> on all nodes there was a copy of the 3.0.1 jar file. Also, in $SPARK_HOME/conf
>>>>> we had the following entries:
>>>>>
>>>>> spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jar
>>>>> spark.driver.extraClassPath        $SPARK_HOME/jars/*.jar
>>>>> spark.executor.extraClassPath      $SPARK_HOME/jars/*.jar
>>>>>
>>>>> So the jar file was picked up first anyway.
>>>>>
>>>>> The concern I have is that the package uses older versions of the jar
>>>>> files, namely the following in .ivy2/jars:
>>>>>
>>>>> -rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar
>>>>> -rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019 org.apache.commons_commons-pool2-2.6.2.jar
>>>>> -rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020 org.apache.kafka_kafka-clients-2.6.0.jar
>>>>> -rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
>>>>> -rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
>>>>> -rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020 org.lz4_lz4-java-1.7.1.jar
>>>>> -rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019 org.slf4j_slf4j-api-1.7.30.jar
>>>>> -rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014 org.spark-project.spark_unused-1.0.0.jar
>>>>> -rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-1.1.8.2.jar
>>>>>
>>>>>
>>>>> So I am not sure. Hence I want someone to verify this independently, in
>>>>> anger.
>>>>>
>>>>>
