Not sure what you mean by "not working". You've added 3.1.1 to --packages, and that artifact uses:

* kafka-clients 2.6.0: https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L136
* commons-pool2 2.6.2: https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L183

I think it's worth an end-to-end dependency-tree analysis of what is really happening on the cluster...
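One quick data point for that analysis: ask the running driver JVM where it actually loads the class from. A sketch from a pyspark shell on the cluster (sparkContext._jvm is an internal py4j handle, so treat this as a diagnostic hack rather than an API, and it only shows the driver side; the class name is the one in the NoSuchMethodError quoted below):

# run inside a pyspark shell on the affected cluster
jvm = spark.sparkContext._jvm
cls = jvm.java.lang.Class.forName("org.apache.spark.kafka010.KafkaTokenUtil")
# prints the jar (or directory) the driver classloader resolved the class from
print(cls.getProtectionDomain().getCodeSource().getLocation())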
G

On Wed, Apr 7, 2021 at 11:11 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Gabor et al.,
>
> To be honest, I am not convinced this package (--packages
> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1) is really working!
>
> I know for definite that spark-sql-kafka-0-10_2.12-3.1.0.jar works fine.
> I reported the package working before because under $SPARK_HOME/jars on
> all nodes there was a copy of the 3.0.1 jar file. Also, in
> $SPARK_HOME/conf we had the following entries:
>
> spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jar
> spark.driver.extraClassPath $SPARK_HOME/jars/*.jar
> spark.executor.extraClassPath $SPARK_HOME/jars/*.jar
>
> So the jar file was picked up first anyway.
>
> The concern I have is that the package pulls in older versions of the
> jar files, namely the following in .ivy2/jars:
>
> -rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar
> -rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019 org.apache.commons_commons-pool2-2.6.2.jar
> -rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020 org.apache.kafka_kafka-clients-2.6.0.jar
> -rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
> -rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
> -rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020 org.lz4_lz4-java-1.7.1.jar
> -rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019 org.slf4j_slf4j-api-1.7.30.jar
> -rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014 org.spark-project.spark_unused-1.0.0.jar
> -rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-1.1.8.2.jar
>
> So I am not sure. Hence I want someone to verify this independently, in
> anger.
>
> Cheers,
>
> Mich
>
> On Wed, 7 Apr 2021 at 07:51, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
>> Good to hear it's working.
>> Happy Spark usage.
>>
>> G
>>
>> On Tue, 6 Apr 2021, 21:56 Mich Talebzadeh, <mich.talebza...@gmail.com> wrote:
>>
>>> OK, we found the root cause of this issue.
>>>
>>> We were writing to Redis from Spark and had downloaded a recently
>>> compiled version of the Redis jar built with Scala 2.12:
>>>
>>> spark-redis_2.12-2.4.1-SNAPSHOT-jar-with-dependencies.jar
>>>
>>> It was giving grief, so we removed it. The job now runs with either
>>>
>>> spark-sql-kafka-0-10_2.12-3.1.0.jar
>>>
>>> or, as packages, through
>>>
>>> spark-submit .. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1
>>>
>>> We will follow the solution suggested in the doc.
>>>
>>> Batch: 18
>>> -------------------------------------------
>>> +--------------------+------+-------------------+------+
>>> |              rowkey|ticker|         timeissued| price|
>>> +--------------------+------+-------------------+------+
>>> |b539cb54-3ddd-47c...|  ORCL|2021-04-06 20:53:37| 41.32|
>>> |2d4bae2d-649e-4b8...|   VOD|2021-04-06 20:53:37|317.48|
>>> |2f51f188-6da4-4bb...|   MKS|2021-04-06 20:53:37|376.63|
>>> |1a4c4645-8dc7-4ef...|    BP|2021-04-06 20:53:37| 571.5|
>>> |45c9e738-ead7-4e5...|  SBRY|2021-04-06 20:53:37|244.76|
>>> |48f93c13-43ad-422...|   SAP|2021-04-06 20:53:37| 58.71|
>>> |ed4d89b1-7fc1-420...|   IBM|2021-04-06 20:53:37|105.91|
>>> |44b3f0ce-27b8-4a9...|   MRW|2021-04-06 20:53:37|297.85|
>>> |4441b0b5-32c1-4cb...|  MSFT|2021-04-06 20:53:37| 27.83|
>>> |143398a4-13b5-494...|  TSCO|2021-04-06 20:53:37|183.42|
>>> +--------------------+------+-------------------+------+
>>>
>>> Now we need to go back to the drawing board and see how to integrate
>>> Redis.
>>>
>>> Thanks,
>>>
>>> Mich
>>>
>>> On Tue, 6 Apr 2021 at 17:52, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Fine.
>>>>
>>>> Just to clarify, please: with SBT assembly and Scala I would create
>>>> an uber jar file and use that with spark-submit.
>>>>
>>>> As I understand it (and stand to be corrected), with PySpark one can
>>>> only run spark-submit in client mode, by directly passing a .py file?
>>>> Hence:
>>>>
>>>> spark-submit --master local[4] --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 <python_file>
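>>>>
>>>> For reference, a minimal sketch of the kind of driver script that
>>>> <python_file> stands for here (a sketch only; broker and topic names
>>>> are placeholders):
>>>>
>>>> # minimal Kafka smoke test
>>>> from pyspark.sql import SparkSession
>>>>
>>>> spark = SparkSession.builder.appName("kafka-smoke-test").getOrCreate()
>>>>
>>>> df = (spark.readStream
>>>>       .format("kafka")
>>>>       .option("kafka.bootstrap.servers", "host1:9092")  # placeholder broker
>>>>       .option("subscribe", "test-topic")                # placeholder topic
>>>>       .load())
>>>>
>>>> # dump the raw values to the console, as in the batches shown in this thread
>>>> (df.selectExpr("CAST(value AS STRING)")
>>>>    .writeStream
>>>>    .format("console")
>>>>    .start()
>>>>    .awaitTermination())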
>>>>
>>>> On Tue, 6 Apr 2021 at 17:39, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Gabor's point is that these are not libraries you typically install
>>>>> in your cluster itself. You package them with your app.
>>>>>
>>>>> On Tue, Apr 6, 2021 at 11:35 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Hi G,
>>>>>>
>>>>>> Thanks for the heads-up.
>>>>>>
>>>>>> In a thread on 3rd of March I reported that 3.1.1 works in yarn mode:
>>>>>>
>>>>>> Spark 3.1.1 Preliminary results (mainly to do with Spark Structured
>>>>>> Streaming) (mail-archive.com)
>>>>>> <https://www.mail-archive.com/user@spark.apache.org/msg75979.html>
>>>>>>
>>>>>> From that mail:
>>>>>>
>>>>>> The jar files needed by version 3.1.1 to read from Kafka and write
>>>>>> to BigQuery are as follows, all under $SPARK_HOME/jars on all nodes.
>>>>>> These are the latest available jar files:
>>>>>>
>>>>>> - commons-pool2-2.9.0.jar
>>>>>> - spark-token-provider-kafka-0-10_2.12-3.1.0.jar
>>>>>> - spark-sql-kafka-0-10_2.12-3.1.0.jar
>>>>>> - kafka-clients-2.7.0.jar
>>>>>> - spark-bigquery-latest_2.12.jar
>>>>>>
>>>>>> I just tested it, and in local mode (single JVM) it works fine
>>>>>> without the addition of the package (--packages
>>>>>> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1), BUT including
>>>>>> all of the above jar files:
>>>>>>
>>>>>> Batch: 17
>>>>>> -------------------------------------------
>>>>>> +--------------------+------+-------------------+------+
>>>>>> |              rowkey|ticker|         timeissued| price|
>>>>>> +--------------------+------+-------------------+------+
>>>>>> |54651f0d-1be0-4d7...|   IBM|2021-04-06 17:17:04| 91.92|
>>>>>> |8aa1ad79-4792-466...|   SAP|2021-04-06 17:17:04| 34.93|
>>>>>> |8567f327-cfec-43d...|  TSCO|2021-04-06 17:17:04| 324.5|
>>>>>> |138a1278-2f54-45b...|   VOD|2021-04-06 17:17:04| 241.4|
>>>>>> |e02793c3-8e78-47e...|  ORCL|2021-04-06 17:17:04|  17.6|
>>>>>> |0ab456fb-bd22-465...|  SBRY|2021-04-06 17:17:04|350.45|
>>>>>> |74588e92-a3e2-48c...|  MSFT|2021-04-06 17:17:04| 44.58|
>>>>>> |1e7203c6-6938-4ea...|    BP|2021-04-06 17:17:04| 588.0|
>>>>>> |1e55021a-148d-4aa...|   MRW|2021-04-06 17:17:04|171.21|
>>>>>> |229ad6f9-e4ed-475...|   MKS|2021-04-06 17:17:04|439.17|
>>>>>> +--------------------+------+-------------------+------+
>>>>>>
>>>>>> However, if I exclude the jar file
>>>>>> spark-sql-kafka-0-10_2.12-3.1.0.jar and include the packages as
>>>>>> suggested in the link:
>>>>>>
>>>>>> spark-submit --master local[4] \
>>>>>>   --conf spark.pyspark.virtualenv.enabled=true \
>>>>>>   --conf spark.pyspark.virtualenv.type=native \
>>>>>>   --conf spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt \
>>>>>>   --conf spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv \
>>>>>>   --conf spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3 \
>>>>>>   --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 \
>>>>>>   xyz.py
>>>>>>
>>>>>> it cannot fetch the data:
>>>>>>
>>>>>> root
>>>>>>  |-- parsed_value: struct (nullable = true)
>>>>>>  |    |-- rowkey: string (nullable = true)
>>>>>>  |    |-- ticker: string (nullable = true)
>>>>>>  |    |-- timeissued: timestamp (nullable = true)
>>>>>>  |    |-- price: float (nullable = true)
>>>>>>
>>>>>> {'message': 'Initializing sources', 'isDataAvailable': False, 'isTriggerActive': False}
>>>>>> -------------------------------------------
>>>>>> Batch: 0
>>>>>> -------------------------------------------
>>>>>> +------+------+----------+-----+
>>>>>> |rowkey|ticker|timeissued|price|
>>>>>> +------+------+----------+-----+
>>>>>> +------+------+----------+-----+
>>>>>>
>>>>>> 2021-04-06 17:20:11,492 ERROR util.Utils: Aborting task
>>>>>> java.lang.NoSuchMethodError: org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
>>>>>>   at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:549)
>>>>>>   at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:291)
>>>>>>   at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>>>>>>   at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:604)
>>>>>>   at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer.get(KafkaDataConsumer.scala:287)
>>>>>>   at org.apache.spark.sql.kafka010.KafkaBatchPartitionReader.next(KafkaBatchPartitionReader.scala:63)
>>>>>>   at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
>>>>>>   at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
>>>>>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>>>>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>>>>>>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>>>>>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>>>>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:413)
>>>>>>   at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
>>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
>>>>>>   at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
>>>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>>>>>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>>> 2021-04-06 17:20:11,492 ERROR util.Utils: Aborting task
>>>>>> java.lang.NoSuchMethodError: org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z
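>>>>>>
>>>>>> A NoSuchMethodError like this generally means mixed versions on the
>>>>>> classpath: here, presumably, 3.1.1 spark-sql-kafka code calling a
>>>>>> needTokenUpdate signature that an older spark-token-provider-kafka
>>>>>> jar on the classpath does not have. A sketch to list every jar on
>>>>>> the box that carries KafkaTokenUtil (the two directories are assumed
>>>>>> from this thread; adjust as needed):
>>>>>>
>>>>>> import glob, os, zipfile
>>>>>>
>>>>>> dirs = [os.path.join(os.environ["SPARK_HOME"], "jars"),
>>>>>>         os.path.expanduser("~/.ivy2/jars")]
>>>>>> for d in dirs:
>>>>>>     for jar in sorted(glob.glob(os.path.join(d, "*.jar"))):
>>>>>>         with zipfile.ZipFile(jar) as z:
>>>>>>             # any class file mentioning KafkaTokenUtil marks a candidate jar
>>>>>>             if any("KafkaTokenUtil" in n for n in z.namelist()):
>>>>>>                 print(jar)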
>>>>>>
>>>>>> Now I deleted the ~/.ivy2 directory and ran the job again:
>>>>>>
>>>>>> Ivy Default Cache set to: /home/hduser/.ivy2/cache
>>>>>> The jars for the packages stored in: /home/hduser/.ivy2/jars
>>>>>> org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
>>>>>> :: resolving dependencies :: org.apache.spark#spark-submit-parent-2bab6bd2-3136-4783-b044-810f0800ef0e;1.0
>>>>>>
>>>>>> Let us go and have a look at the .ivy2/jars directory:
>>>>>>
>>>>>> /home/hduser/.ivy2/jars> ltr
>>>>>> total 13108
>>>>>> -rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014 org.spark-project.spark_unused-1.0.0.jar
>>>>>> -rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019 org.apache.commons_commons-pool2-2.6.2.jar
>>>>>> -rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019 org.slf4j_slf4j-api-1.7.30.jar
>>>>>> -rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020 org.lz4_lz4-java-1.7.1.jar
>>>>>> -rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020 org.apache.kafka_kafka-clients-2.6.0.jar
>>>>>> -rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-1.1.8.2.jar
>>>>>> -rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar
>>>>>> -rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
>>>>>> -rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
>>>>>> drwxr-xr-x 4 hduser hadoop    4096 Apr  6 17:25 ..
>>>>>> drwxr-xr-x 2 hduser hadoop    4096 Apr  6 17:25 .
>>>>>>
>>>>>> Strangely, jar files like org.apache.kafka_kafka-clients-2.6.0.jar
>>>>>> and org.apache.commons_commons-pool2-2.6.2.jar seem to be out of
>>>>>> date.
>>>>>>
>>>>>> Very confusing. It sounds like we have changed something in the
>>>>>> cluster: as reported on 3rd March it used to work with those jar
>>>>>> files, and now it is not working.
>>>>>>
>>>>>> So, in summary, without those jar files added to $SPARK_HOME/jars it
>>>>>> fails totally, even with the packages added.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Mich
>>>>>>
>>>>>> On Tue, 6 Apr 2021 at 15:44, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>
>>>>>>> > Anyway I unzipped the tarball for Spark-3.1.1 and there is no
>>>>>>> > spark-sql-kafka-0-10_2.12-3.0.1.jar even
>>>>>>>
>>>>>>> Please see how a Structured Streaming app with Kafka needs to be
>>>>>>> deployed, here:
>>>>>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#deploying
>>>>>>> I don't see the --packages option...
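>>>>>>>
>>>>>>> For reference, the deploying section of that page recommends
>>>>>>> submitting the connector as a package, with the version matching
>>>>>>> the running Spark, along the lines of:
>>>>>>>
>>>>>>> ./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 ...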
>>>>>>>
>>>>>>> G
>>>>>>>
>>>>>>> On Tue, Apr 6, 2021 at 2:40 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OK, thanks for that.
>>>>>>>>
>>>>>>>> I am using spark-submit with PySpark as follows:
>>>>>>>>
>>>>>>>> spark-submit --version
>>>>>>>> Welcome to
>>>>>>>>       ____              __
>>>>>>>>      / __/__  ___ _____/ /__
>>>>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>>>>>>>>       /_/
>>>>>>>>
>>>>>>>> Using Scala version 2.12.9, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_201
>>>>>>>> Branch HEAD
>>>>>>>> Compiled by user ubuntu on 2021-02-22T01:33:19Z
>>>>>>>>
>>>>>>>> spark-submit --master yarn --deploy-mode client \
>>>>>>>>   --conf spark.pyspark.virtualenv.enabled=true \
>>>>>>>>   --conf spark.pyspark.virtualenv.type=native \
>>>>>>>>   --conf spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt \
>>>>>>>>   --conf spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv \
>>>>>>>>   --conf spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3 \
>>>>>>>>   --driver-memory 16G --executor-memory 8G --num-executors 4 --executor-cores 2 \
>>>>>>>>   xyz.py
>>>>>>>>
>>>>>>>> enabling a virtual environment.
>>>>>>>>
>>>>>>>> That works fine, in client mode, with any job that does not do
>>>>>>>> Structured Streaming.
>>>>>>>>
>>>>>>>> Running on the local node with
>>>>>>>>
>>>>>>>> spark-submit --master local[4] \
>>>>>>>>   --conf spark.pyspark.virtualenv.enabled=true \
>>>>>>>>   --conf spark.pyspark.virtualenv.type=native \
>>>>>>>>   --conf spark.pyspark.virtualenv.requirements=/home/hduser/dba/bin/python/requirements.txt \
>>>>>>>>   --conf spark.pyspark.virtualenv.bin.path=/usr/src/Python-3.7.3/airflow_virtualenv \
>>>>>>>>   --conf spark.pyspark.python=/usr/src/Python-3.7.3/airflow_virtualenv/bin/python3 \
>>>>>>>>   xyz.py
>>>>>>>>
>>>>>>>> works fine with the same Spark version and $SPARK_HOME/jars.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Mich
>>>>>>>>
>>>>>>>> On Tue, 6 Apr 2021 at 13:20, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> You may be compiling your app against 3.0.1 JARs but submitting
>>>>>>>>> to 3.1.1. You do not in general modify the Spark libs. You need
>>>>>>>>> to package libs like this with your app, at the correct version.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 6, 2021 at 6:42 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Gabor.
>>>>>>>>>>
>>>>>>>>>> All nodes are running Spark spark-3.1.1-bin-hadoop3.2.
>>>>>>>>>>
>>>>>>>>>> So $SPARK_HOME/jars contains all the required jars on all
>>>>>>>>>> nodes, including commons-pool2-2.9.0.jar as well.
>>>>>>>>>>
>>>>>>>>>> They are installed identically on all nodes.
>>>>>>>>>>
>>>>>>>>>> I have looked at the Spark environment for the classpath.
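>>>>>>>>>>
>>>>>>>>>> (For what it's worth, the running JVM can also be asked
>>>>>>>>>> directly which kafka-clients build it actually loaded. A sketch
>>>>>>>>>> using PySpark's internal py4j handle, driver side only;
>>>>>>>>>> AppInfoParser ships inside kafka-clients:)
>>>>>>>>>>
>>>>>>>>>> # prints the kafka-clients version the driver JVM resolved
>>>>>>>>>> jvm = spark.sparkContext._jvm
>>>>>>>>>> print(jvm.org.apache.kafka.common.utils.AppInfoParser.getVersion())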
Still I >>>>>>>>>> don't see the reason why Spark 3.1.1 fails with >>>>>>>>>> spark-sql-kafka-0-10_2.12-3.1.1.jar >>>>>>>>>> but works ok with spark-sql-kafka-0-10_2.12-3.1.0.jar >>>>>>>>>> >>>>>>>>>> Anyway I unzipped the tarball for Spark-3.1.1 and there is >>>>>>>>>> no spark-sql-kafka-0-10_2.12-3.0.1.jar even >>>>>>>>>> >>>>>>>>>> I had to add spark-sql-kafka-0-10_2.12-3.0.1.jar to make it work. >>>>>>>>>> Then I enquired the availability of new version from Maven that >>>>>>>>>> pointed to >>>>>>>>>> *spark-sql-kafka-0-10_2.12-3.1.1.jar* >>>>>>>>>> >>>>>>>>>> So to confirm Spark out of the tarball does not have any >>>>>>>>>> >>>>>>>>>> ltr spark-sql-kafka-* >>>>>>>>>> ls: cannot access spark-sql-kafka-*: No such file or directory >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For SSS, I had to add these >>>>>>>>>> >>>>>>>>>> add commons-pool2-2.9.0.jar. The one shipped is >>>>>>>>>> commons-pool-1.5.4.jar! >>>>>>>>>> >>>>>>>>>> add kafka-clients-2.7.0.jar Did not have any >>>>>>>>>> >>>>>>>>>> add spark-sql-kafka-0-10_2.12-3.0.1.jar Did not have any >>>>>>>>>> >>>>>>>>>> I gather from your second mail, there seems to be an issue with >>>>>>>>>> spark-sql-kafka-0-10_2.12-3.*1*.1.jar ? >>>>>>>>>> >>>>>>>>>> HTH >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> view my Linkedin profile >>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all >>>>>>>>>> responsibility for any loss, damage or destruction of data or any >>>>>>>>>> other >>>>>>>>>> property which may arise from relying on this email's technical >>>>>>>>>> content is >>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any >>>>>>>>>> monetary damages arising from such loss, damage or destruction. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, 6 Apr 2021 at 11:54, Gabor Somogyi < >>>>>>>>>> gabor.g.somo...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Since you've not shared too much details I presume you've >>>>>>>>>>> updated the spark-sql-kafka jar only. >>>>>>>>>>> KafkaTokenUtil is in the token provider jar. >>>>>>>>>>> >>>>>>>>>>> As a general note if I'm right, please update Spark as a whole >>>>>>>>>>> on all nodes and not just jars independently. >>>>>>>>>>> >>>>>>>>>>> BR, >>>>>>>>>>> G >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 6, 2021 at 10:21 AM Mich Talebzadeh < >>>>>>>>>>> mich.talebza...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Any chance of someone testing the latest >>>>>>>>>>>> spark-sql-kafka-0-10_2.12-3.1.1.jar >>>>>>>>>>>> for Spark. It throws >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> java.lang.NoSuchMethodError: >>>>>>>>>>>> org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> However, the previous version >>>>>>>>>>>> spark-sql-kafka-0-10_2.12-3.0.1.jar works fine >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> view my Linkedin profile >>>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all >>>>>>>>>>>> responsibility for any loss, damage or destruction of data or any >>>>>>>>>>>> other >>>>>>>>>>>> property which may arise from relying on this email's technical >>>>>>>>>>>> content is >>>>>>>>>>>> explicitly disclaimed. 