Hi all,

I am trying to build the kafka-0-10-sql module under the external folder of the Apache Spark source code. I generate the jar with:

build/mvn package -DskipTests -pl external/kafka-0-10-sql

which creates the jar under external/kafka-0-10-sql/target.
I then run spark-shell with the jars from that target folder:

bin/spark-shell --jars $SPARK_HOME/external/kafka-0-10-sql/target/*.jar

and get the error below:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/28 11:54:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://10.1.10.241:4040
Spark context available as 'sc' (master = local[*], app id = local-1498676043936).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val lines = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "test").load()
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:378)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:325)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 57 more

I have tried building the jar with dependencies, but I still face the same error. However, when I use --packages with spark-shell, as in

bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0

it works fine.

The reason I am building from source is that I want to try pushing DataFrame data into a Kafka topic, based on https://github.com/apache/spark/commit/b0a5cd89097c563e9949d8cfcf84d18b03b8d24c, which is not available in version 2.1.0.

Any help would be highly appreciated.

Regards,
Satyajit.
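P.S. One thing I noticed while debugging: the ClassNotFoundException is for a class that lives in the kafka-clients jar, which is a transitive dependency of the module and is not bundled into the module jar that Maven produces. Also, --jars expects a comma-separated list of paths, so a shell glob like target/*.jar that matches more than one jar (the main jar plus -sources/-tests jars) may not be passed the way one expects. A sketch of an invocation that adds the client jar explicitly -- the kafka-clients version (0.10.0.1) and the local Maven repository path are assumptions that depend on the actual build:

```shell
# Assumption: the module was built against kafka-clients 0.10.0.1 and the
# jar sits in the default local Maven repository (~/.m2); adjust as needed.
KAFKA_CLIENTS_JAR=$HOME/.m2/repository/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar

# --jars takes a comma-separated list, not a space-separated shell glob,
# so name the module jar and its Kafka client dependency explicitly.
bin/spark-shell --jars \
  $SPARK_HOME/external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT.jar,$KAFKA_CLIENTS_JAR
```

This mirrors what --packages does automatically: it resolves and adds the transitive dependencies to the driver and executor classpaths, which is presumably why that route works while the bare module jar does not.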