Hi Gabor et al.,

To be honest, I am not convinced that the package pulled in with --packages
org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 is really working.

I know for certain that spark-sql-kafka-0-10_2.12-3.1.0.jar works fine. I
reported the package as working before because under $SPARK_HOME/jars on all
nodes there was a copy of the 3.0.1 jar file. Also, in $SPARK_HOME/conf we had
the following entries:

spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jar
spark.driver.extraClassPath        $SPARK_HOME/jars/*.jar
spark.executor.extraClassPath      $SPARK_HOME/jars/*.jar

So the jar file under $SPARK_HOME/jars was picked up first anyway.
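
One way to settle which copy actually wins at runtime would be a class-loading
trace. This is only a sketch under my own assumptions (xyz.py stands in for the
real streaming job, and I have not re-run it in exactly this form):
--driver-java-options "-verbose:class" makes the driver JVM report which jar
every class is loaded from, so the origin of the Kafka connector classes shows
up directly.

spark-submit --master local[4] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 \
  --driver-java-options "-verbose:class" \
  xyz.py 2>&1 | grep 'kafka010.*\.jar'

If those lines point at $SPARK_HOME/jars rather than ~/.ivy2/jars, then the
package played no part in the earlier "working" result.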

The concern I have is that the package pulls in older versions of some jar
files, namely the following in .ivy2/jars:

-rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar
-rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019 org.apache.commons_commons-pool2-2.6.2.jar
-rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020 org.apache.kafka_kafka-clients-2.6.0.jar
-rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
-rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
-rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020 org.lz4_lz4-java-1.7.1.jar
-rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019 org.slf4j_slf4j-api-1.7.30.jar
-rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014 org.spark-project.spark_unused-1.0.0.jar
-rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-1.1.8.2.jar


So I am not sure. Hence I would like someone to verify this independently, in anger.
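
For instance, a quick static check, offered only as a sketch on the assumption
that a JDK (for javap) is available on the box, would be to see whether a given
token-provider jar actually exposes the needTokenUpdate signature that the 3.1.1
connector calls and that the earlier NoSuchMethodError complained about:

javap -p -cp ~/.ivy2/jars/org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar \
  'org.apache.spark.kafka010.KafkaTokenUtil$' | grep needTokenUpdate

Running the same command against whatever token-provider jar sits in
$SPARK_HOME/jars (or inside the spark.yarn.archive spark-libs.jar) and comparing
the output would show whether a mismatch between the two copies explains the
error on its own.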

Cheers



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 7 Apr 2021 at 07:51, Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Good to hear it's working.
> Happy Spark usage.
>
> G
>
>
> On Tue, 6 Apr 2021, 21:56 Mich Talebzadeh, <mich.talebza...@gmail.com>
> wrote:
>
>> OK we found out the root cause of this issue.
>>
>> We were writing to Redis from Spark and downloaded a recently compiled
>> version of Redis jar with scala 2.12.
>>
>> spark-redis_2.12-2.4.1-SNAPSHOT-jar-with-dependencies.jar
>>
>> It was giving grief. We removed that one. So the job runs with either
>>
>> spark-sql-kafka-0-10_2.12-*3.1.0*.jar
>>
>> or as packages through
>>
>> spark-submit .. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:
>> *3.1.1*
>>
>> We will follow the suggested solution as per doc
>>
>>
>> Batch: 18
>>
>> -------------------------------------------
>>
>> +--------------------+------+-------------------+------+
>>
>> |              rowkey|ticker|         timeissued| price|
>>
>> +--------------------+------+-------------------+------+
>>
>> |b539cb54-3ddd-47c...|  ORCL|2021-04-06 20:53:37| 41.32|
>>
>> |2d4bae2d-649e-4b8...|   VOD|2021-04-06 20:53:37|317.48|
>>
>> |2f51f188-6da4-4bb...|   MKS|2021-04-06 20:53:37|376.63|
>>
>> |1a4c4645-8dc7-4ef...|    BP|2021-04-06 20:53:37| 571.5|
>>
>> |45c9e738-ead7-4e5...|  SBRY|2021-04-06 20:53:37|244.76|
>>
>> |48f93c13-43ad-422...|   SAP|2021-04-06 20:53:37| 58.71|
>>
>> |ed4d89b1-7fc1-420...|   IBM|2021-04-06 20:53:37|105.91|
>>
>> |44b3f0ce-27b8-4a9...|   MRW|2021-04-06 20:53:37|297.85|
>>
>> |4441b0b5-32c1-4cb...|  MSFT|2021-04-06 20:53:37| 27.83|
>>
>> |143398a4-13b5-494...|  TSCO|2021-04-06 20:53:37|183.42|
>>
>> +--------------------+------+-------------------+------+
>>
>> Now we need to go back to the drawing board and see how to integrate
>> Redis
>>
>>
>>
>> Thanks
>>
>>
>> Mich
>>
>>
>>
>>
>>
>>
>> On Tue, 6 Apr 2021 at 17:52, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Fine.
>>>
>>> Just to clarify please.
>>>
>>> With SBT assembly and Scala I would create an uber jar file and use
>>> that one with spark-submit.
>>>
>>> As I understand it (and stand to be corrected), with PySpark one can only
>>> run spark-submit in client mode by directly using a .py file?
>>>
>>> So hence
>>>
>>> spark-submit --master local[4] --packages
>>> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 <python_file>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 6 Apr 2021 at 17:39, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Gabor's point is that these are not libraries you typically install in
>>>> your cluster itself. You package them with your app.
>>>>
>>>> On Tue, Apr 6, 2021 at 11:35 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi G
>>>>>
>>>>> Thanks for the heads-up.
>>>>>
>>>>> In a thread on 3rd of March I reported that 3.1.1 works in yarn mode
>>>>>
>>>>> Spark 3.1.1 Preliminary results (mainly to do with Spark Structured
>>>>> Streaming) (mail-archive.com)
>>>>> <https://www.mail-archive.com/user@spark.apache.org/msg75979.html>
>>>>>
>>>>> From that mail
>>>>>
>>>>>
>>>>> The needed jar files for version 3.1.1 to read from Kafka and write to
>>>>> BigQuery for 3.1.1 are as follows:
>>>>>
>>>>> All under $SPARK_HOME/jars on all nodes. These are the latest available 
>>>>> jar
>>>>> files
>>>>>
>>>>>
>>>>>    - commons-pool2-2.9.0.jar
>>>>>    - spark-token-provider-kafka-0-10_2.12-3.1.0.jar
>>>>>    - spark-sql-kafka-0-10_2.12-3.1.0.jar
>>>>>    - kafka-clients-2.7.0.jar
>>>>>    - spark-bigquery-latest_2.12.jar
>>>>>
>>>>>
>>>>>
>>>>> I just tested it, and in local mode (single JVM) it works fine without
>>>>> adding the package --packages
>>>>> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1,
>>>>> BUT including all the above jar files.
>>>>>
>>>>> Batch: 17
>>>>> -------------------------------------------
>>>>> +--------------------+------+-------------------+------+
>>>>> |              rowkey|ticker|         timeissued| price|
>>>>> +--------------------+------+-------------------+------+
>>>>> |54651f0d-1be0-4d7...|   IBM|2021-04-06 17:17:04| 91.92|
>>>>> |8aa1ad79-4792-466...|   SAP|2021-04-06 17:17:04| 34.93|
>>>>> |8567f327-cfec-43d...|  TSCO|2021-04-06 17:17:04| 324.5|
>>>>> |138a1278-2f54-45b...|   VOD|2021-04-06 17:17:04| 241.4|
>>>>> |e02793c3-8e78-47e...|  ORCL|2021-04-06 17:17:04|  17.6|
>>>>> |0ab456fb-bd22-465...|  SBRY|2021-04-06 17:17:04|350.45|
>>>>> |74588e92-a3e2-48c...|  MSFT|2021-04-06 17:17:04| 44.58|
>>>>> |1e7203c6-6938-4ea...|    BP|2021-04-06 17:17:04| 588.0|
>>>>> |1e55021a-148d-4aa...|   MRW|2021-04-06 17:17:04|171.21|
>>>>> |229ad6f9-e4ed-475...|   MKS|2021-04-06 17:17:04|439.17|
>>>>> +--------------------+------+-------------------+------+
>>>>>
>>>>> However, if I exclude the jar file spark-sql-kafka-0-10_2.12-3.1.0.jar
>>>>> and include the packages as suggested in the link
>>>>>
>>>>>
>>>>> spark-submit --master local[4] --conf
>>>>> spark.pyspark.virtualenv.enabled=true --conf
>>>>> spark.pyspark.virtualenv.type=native --conf
>>>>> spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt
>>>>> --conf
>>>>> spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv
>>>>> --conf
>>>>> spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3 
>>>>> *--packages
>>>>> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1* xyz.py
>>>>>
>>>>> It cannot fetch the data
>>>>>
>>>>> root
>>>>>  |-- parsed_value: struct (nullable = true)
>>>>>  |    |-- rowkey: string (nullable = true)
>>>>>  |    |-- ticker: string (nullable = true)
>>>>>  |    |-- timeissued: timestamp (nullable = true)
>>>>>  |    |-- price: float (nullable = true)
>>>>>
>>>>> {'message': 'Initializing sources', 'isDataAvailable': False,
>>>>> 'isTriggerActive': False}
>>>>> -------------------------------------------
>>>>> Batch: 0
>>>>> -------------------------------------------
>>>>> +------+------+----------+-----+
>>>>> |rowkey|ticker|timeissued|price|
>>>>> +------+------+----------+-----+
>>>>> +------+------+----------+-----+
>>>>>
>>>>> 2021-04-06 17:20:11,492 ERROR util.Utils: Aborting task
>>>>> java.lang.NoSuchMethodError:
>>>>> org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>>>>>         at
>>>>> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:549)
>>>>>         at
>>>>> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:291)
>>>>>         at
>>>>> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>>>>>         at
>>>>> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:604)
>>>>>         at
>>>>> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.get(KafkaDataConsumer.scala:287)
>>>>>         at
>>>>> org.apache.spark.sql.kafka010.KafkaBatchPartitionReader.next(KafkaBatchPartitionReader.scala:63)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
>>>>>         at
>>>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>>         at
>>>>> scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>>>>>         at
>>>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>>>>> Source)
>>>>>         at
>>>>> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>>>>         at
>>>>> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>>>>>         at
>>>>> scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:413)
>>>>>         at
>>>>> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
>>>>>         at
>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:131)
>>>>>         at
>>>>> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>>>>>         at
>>>>> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>>>>>         at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>>>>>         at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>         at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>         at java.lang.Thread.run(Thread.java:748)
>>>>> 2021-04-06 17:20:11,492 ERROR util.Utils: Aborting task
>>>>> java.lang.NoSuchMethodError:
>>>>> org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>>>>>
>>>>>
>>>>> Now I deleted ~/.ivy2 directory and ran the job again
>>>>>
>>>>> Ivy Default Cache set to: /home/hduser/.ivy2/cache
>>>>> The jars for the packages stored in: /home/hduser/.ivy2/jars
>>>>> org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
>>>>> :: resolving dependencies ::
>>>>> org.apache.spark#spark-submit-parent-2bab6bd2-3136-4783-b044-810f0800ef0e;1.0
>>>>>
>>>>> let us go and have a look at the directory .ivy2/jars
>>>>>
>>>>>  /home/hduser/.ivy2/jars> ltr
>>>>> total 13108
>>>>> -rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014
>>>>> org.spark-project.spark_unused-1.0.0.jar
>>>>> -rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019
>>>>> org.apache.commons_commons-pool2-2.6.2.jar
>>>>> -rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019
>>>>> org.slf4j_slf4j-api-1.7.30.jar
>>>>> -rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020
>>>>> org.lz4_lz4-java-1.7.1.jar
>>>>> -rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020
>>>>> org.apache.kafka_kafka-clients-2.6.0.jar
>>>>> -rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10
>>>>> org.xerial.snappy_snappy-java-1.1.8.2.jar
>>>>> -rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14
>>>>> com.github.luben_zstd-jni-1.4.8-1.jar
>>>>> -rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57
>>>>> org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
>>>>> -rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58
>>>>> org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
>>>>> drwxr-xr-x 4 hduser hadoop    4096 Apr  6 17:25 ..
>>>>> drwxr-xr-x 2 hduser hadoop    4096 Apr  6 17:25 .
>>>>>
>>>>> Strangely these jar files
>>>>> like org.apache.kafka_kafka-clients-2.6.0.jar
>>>>> and org.apache.commons_commons-pool2-2.6.2.jar seem to be out of date.
>>>>>
>>>>> Very confusing. It sounds as if we have changed something in the cluster,
>>>>> because as reported on 3rd March it used to work with those jar files and
>>>>> now it does not.
>>>>>
>>>>> So in summary, *without those jar files added to $SPARK_HOME/jars* it
>>>>> fails totally even with the packages added.
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 6 Apr 2021 at 15:44, Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> > Anyway I unzipped the tarball for Spark-3.1.1 and there is
>>>>>> no spark-sql-kafka-0-10_2.12-3.0.1.jar even
>>>>>>
>>>>>> Please see how Structured Streaming app with Kafka needs to be
>>>>>> deployed here:
>>>>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#deploying
>>>>>> I don't see the --packages option...
>>>>>>
>>>>>> G
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 6, 2021 at 2:40 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> OK thanks for that.
>>>>>>>
>>>>>>> I am using spark-submit with PySpark as follows
>>>>>>>
>>>>>>>  spark-submit --version
>>>>>>> Welcome to
>>>>>>>       ____              __
>>>>>>>      / __/__  ___ _____/ /__
>>>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>>>>>>>       /_/
>>>>>>>
>>>>>>> Using Scala version 2.12.9, Java HotSpot(TM) 64-Bit Server VM,
>>>>>>> 1.8.0_201
>>>>>>> Branch HEAD
>>>>>>> Compiled by user ubuntu on 2021-02-22T01:33:19Z
>>>>>>>
>>>>>>>
>>>>>>> spark-submit --master yarn --deploy-mode client --conf
>>>>>>> spark.pyspark.virtualenv.enabled=true --conf
>>>>>>> spark.pyspark.virtualenv.type=native --conf
>>>>>>> spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt
>>>>>>> --conf
>>>>>>> spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv
>>>>>>> --conf
>>>>>>> spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3
>>>>>>> --driver-memory 16G --executor-memory 8G --num-executors 4 
>>>>>>> --executor-cores
>>>>>>> 2 xyz.py
>>>>>>>
>>>>>>> enabling with virtual environment
>>>>>>>
>>>>>>>
>>>>>>> That works fine with any job that does not do structured streaming
>>>>>>> in a client mode.
>>>>>>>
>>>>>>>
>>>>>>> Running on local  node with
>>>>>>>
>>>>>>>
>>>>>>> spark-submit --master local[4] --conf
>>>>>>> spark.pyspark.virtualenv.enabled=true --conf
>>>>>>> spark.pyspark.virtualenv.type=native --conf
>>>>>>> spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt
>>>>>>> --conf
>>>>>>> spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv
>>>>>>> --conf
>>>>>>> spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3
>>>>>>> xyz.py
>>>>>>>
>>>>>>>
>>>>>>> works fine with the same spark version and $SPARK_HOME/jars
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 6 Apr 2021 at 13:20, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> You may be compiling your app against 3.0.1 JARs but submitting to
>>>>>>>> 3.1.1.
>>>>>>>> You do not in general modify the Spark libs. You need to package
>>>>>>>> libs like this with your app at the correct version.
>>>>>>>>
>>>>>>>> On Tue, Apr 6, 2021 at 6:42 AM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Gabor.
>>>>>>>>>
>>>>>>>>> All nodes are running Spark /spark-3.1.1-bin-hadoop3.2
>>>>>>>>>
>>>>>>>>> So $SPARK_HOME/jars contains all the required jars on all nodes
>>>>>>>>> including the jar file commons-pool2-2.9.0.jar as well.
>>>>>>>>>
>>>>>>>>> They are installed identically on all nodes.
>>>>>>>>>
>>>>>>>>> I have looked at the Spark environment for classpath. Still I
>>>>>>>>> don't see the reason why Spark 3.1.1 fails with
>>>>>>>>> spark-sql-kafka-0-10_2.12-3.1.1.jar
>>>>>>>>> but works ok with  spark-sql-kafka-0-10_2.12-3.1.0.jar
>>>>>>>>>
>>>>>>>>> Anyway I unzipped the tarball for Spark-3.1.1 and there is
>>>>>>>>> no spark-sql-kafka-0-10_2.12-3.0.1.jar even
>>>>>>>>>
>>>>>>>>> I had to add spark-sql-kafka-0-10_2.12-3.0.1.jar to make it work.
>>>>>>>>> Then I enquired the availability of new version from Maven that 
>>>>>>>>> pointed to
>>>>>>>>> *spark-sql-kafka-0-10_2.12-3.1.1.jar*
>>>>>>>>>
>>>>>>>>> So to confirm Spark out of the tarball does not have any
>>>>>>>>>
>>>>>>>>> ltr spark-sql-kafka-*
>>>>>>>>> ls: cannot access spark-sql-kafka-*: No such file or directory
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For SSS, I had to add these
>>>>>>>>>
>>>>>>>>> add commons-pool2-2.9.0.jar. The one shipped is
>>>>>>>>>  commons-pool-1.5.4.jar!
>>>>>>>>>
>>>>>>>>> add kafka-clients-2.7.0.jar  Did not have any
>>>>>>>>>
>>>>>>>>> add  spark-sql-kafka-0-10_2.12-3.0.1.jar  Did not have any
>>>>>>>>>
>>>>>>>>> I gather from your second mail, there seems to be an issue with
>>>>>>>>> spark-sql-kafka-0-10_2.12-3.*1*.1.jar ?
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 6 Apr 2021 at 11:54, Gabor Somogyi <
>>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Since you've not shared too much details I presume you've updated
>>>>>>>>>> the spark-sql-kafka jar only.
>>>>>>>>>> KafkaTokenUtil is in the token provider jar.
>>>>>>>>>>
>>>>>>>>>> As a general note if I'm right, please update Spark as a whole on
>>>>>>>>>> all nodes and not just jars independently.
>>>>>>>>>>
>>>>>>>>>> BR,
>>>>>>>>>> G
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 6, 2021 at 10:21 AM Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Any chance of someone testing  the latest 
>>>>>>>>>>> spark-sql-kafka-0-10_2.12-3.1.1.jar
>>>>>>>>>>> for Spark. It throws
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> java.lang.NoSuchMethodError:
>>>>>>>>>>> org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However, the previous version
>>>>>>>>>>> spark-sql-kafka-0-10_2.12-3.0.1.jar works fine
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
