I am trying to deploy a streaming job to a standalone cluster but am running
into ClassNotFound errors.

I have tried a myriad of approaches, from packaging all dependencies into a
single JAR to using the --packages and --driver-class-path options.
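
For reference, a minimal shade setup for the single-JAR attempt looks roughly
like this (a sketch, not my exact config; the plugin version is just an
example):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <!-- bundles all compile-scope dependencies into the final JAR -->
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
            </execution>
        </executions>
    </plugin>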

I’ve got a master node started and a slave node running on the same system,
and am using spark-submit to kick off the streaming job.
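
Both were brought up with the stock standalone scripts, roughly:

    # start the standalone master, then register a worker against it
    ./sbin/start-master.sh
    ./sbin/start-slave.sh spark://<domain>:<port>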

Here is the error I’m getting:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
    at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
    at com.Customer.start(Customer.scala:47)
    at com.Main$.main(Main.scala:23)
    at com.Main.main(Main.scala)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 18 more
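
The failure surfaces at the DataStreamReader.load call (Customer.scala:47),
which is the standard Structured Streaming Kafka source setup, roughly like
this (a sketch; the broker and topic placeholders are mine):

    // Customer.scala, around line 47 (sketch) -- `spark` is the active SparkSession
    val stream = spark.readStream
      .format("kafka")                                      // resolves KafkaSourceProvider
      .option("kafka.bootstrap.servers", "<broker>:<port>") // placeholder
      .option("subscribe", "<topic>")                       // placeholder
      .load()                                               // NoClassDefFoundError is thrown here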

Here is the spark submit command I’m using:

./spark-submit \
    --master spark://<domain>:<port> \
    --files jaas.conf \
    --deploy-mode cluster \
    --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
    --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
    --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
    --class <class_main> \
    --verbose \
    my_jar.jar

I’ve tried all sorts of combinations of --packages entries and
--driver-class-path JAR files. As far as I can tell, the missing serializer
should live in the kafka-clients JAR, which I’ve also tried including, with
no success.
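
For example, one such combination pins the full coordinates for both
artifacts (a sketch from memory; 0.10.0.1 is the kafka-clients version that
spark-sql-kafka-0-10 2.2.1 pulls in, as far as I can tell):

    ./spark-submit \
        --master spark://<domain>:<port> \
        --deploy-mode cluster \
        --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1,org.apache.kafka:kafka-clients:0.10.0.1 \
        --class <class_main> \
        my_jar.jar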

POM dependencies are as follows:

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.8-dmr</version>
        </dependency>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>2.9.9</version>
        </dependency>
    </dependencies>

If I remove --deploy-mode and run it in client mode, it works just fine.

Thanks Everyone -

Geoff V.
