I wrote a toy Spark job and ran it from within my IDE; I get the same error 
if I don’t add spark-avro to my pom.xml. After adding the spark-avro 
dependency to my pom.xml, everything works fine.
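
For reference, something along these lines in the pom.xml (a sketch assuming 
Spark 3.2.0 with Scala 2.12, the versions you mention; adjust to match your 
build):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-avro_2.12</artifactId>
        <version>3.2.0</version>
    </dependency>

The toy job itself is just an Avro write/read round trip, a minimal sketch 
with a hypothetical output path; both the save and the load throw the 
AnalysisException you quote unless spark-avro is on the classpath:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class AvroToy {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("AvroToy")
                    .master("local[*]")  // run inside the IDE
                    .getOrCreate();

            // Tiny in-memory dataset, just enough to exercise the Avro source.
            Dataset<Row> df = spark.range(10).toDF("id");

            // Both calls fail without spark-avro on the classpath.
            df.write().format("avro").mode("overwrite").save("/tmp/avro-toy");
            spark.read().format("avro").load("/tmp/avro-toy").show();

            spark.stop();
        }
    }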

Another thing: if my memory serves me right, the spark-submit option for 
extra jars is ‘--jars’, not ‘--packages’.
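
Something like this, for example (a sketch; the jar path and application jar 
are placeholders, and as far as I know ‘--jars’ takes local jar files while 
‘--packages’ resolves Maven coordinates):

    spark-submit \
      --class xsys.fileformats.SparkSQLvsAvro \
      --jars /path/to/spark-avro_2.12-3.2.0.jar \
      /path/to/your-app.jar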

Regards, 

Morven Huang


On 2022/02/10 03:25:28 "Karanika, Anna" wrote:
> Hello,
> 
> I have been trying to use Spark SQL operations related to the Avro file 
> format (e.g., stored as, save, load) in a Java class, but they keep failing 
> with the following stack trace:
> 
> Exception in thread "main" org.apache.spark.sql.AnalysisException:  Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
>         at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1032)
>         at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
>         at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
>         at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
>         at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
>         at xsys.fileformats.SparkSQLvsAvro.main(SparkSQLvsAvro.java:57)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 
> For context, I am invoking spark-submit with the argument --packages 
> org.apache.spark:spark-avro_2.12:3.2.0, yet Spark responds as if the 
> dependency had not been added.
> I am running spark-v3.2.0 (Scala 2.12).
> 
> On the other hand, everything works great with spark-shell or spark-sql.
> 
> I would appreciate any advice or feedback to get this running.
> 
> Thank you,
> Anna
> 
> 