Please try these two corrections:
1. The --packages isn't the right command-line argument for
spark-submit here. Use --conf spark.jars.packages=your-package to
specify Maven packages, or define your configuration parameters in
the spark-defaults.conf file.
2. Please check the version number of your spark-avro jar in
Maven Central and see whether that version is actually available and
compatible with Spark 3.2. The version we are currently using for
Spark 3.2 is spark-avro_2.12-3.1.1.jar, not 3.2.0.
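Putting both points together, the submission might look like the
following sketch (the application jar and main class below are
placeholders for your own; the spark-avro version is the one we use,
as noted above):

```shell
# Pull spark-avro from Maven Central via configuration instead of --packages.
# your-application.jar and the --class value are placeholders.
spark-submit \
  --conf spark.jars.packages=org.apache.spark:spark-avro_2.12:3.1.1 \
  --class your.main.Class \
  your-application.jar
```

Or, equivalently, set it once in conf/spark-defaults.conf:

```shell
# conf/spark-defaults.conf
spark.jars.packages  org.apache.spark:spark-avro_2.12:3.1.1
```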
BTW, you do have to include the spark-avro lib as a custom jar file.
The Spark 3.2 distribution includes only the avro libs, not the
spark-avro lib. Hope this helps...
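If you would rather ship the jar yourself instead of resolving it from
Maven Central, download it and pass it with --jars (the paths below are
placeholders):

```shell
# Supply spark-avro as a local custom jar; adjust the path to wherever
# you downloaded it.
spark-submit \
  --jars /path/to/spark-avro_2.12-3.1.1.jar \
  --class your.main.Class \
  your-application.jar
```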
-- ND
On 2/9/22 10:25 PM, Karanika, Anna wrote:
Hello,
I have been trying to use Spark SQL operations related to the Avro
file format (e.g., STORED AS, save, load) in a Java class, but they
keep failing with the following stack trace:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Failed to find data source: avro. Avro is built-in but external data
source module since Spark 2.4. Please deploy the application as per
the deployment section of "Apache Avro Data Source Guide".
    at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1032)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
    at xsys.fileformats.SparkSQLvsAvro.main(SparkSQLvsAvro.java:57)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
For context, I am invoking spark-submit with the argument
--packages org.apache.spark:spark-avro_2.12:3.2.0,
yet Spark responds as if the dependency had not been added.
I am running spark-v3.2.0 (Scala 2.12).
On the other hand, everything works great with spark-shell or spark-sql.
I would appreciate any advice or feedback to get this running.
Thank you,
Anna