It must be some misconfiguration in your environment. Do you perhaps
have a hardwired $SPARK_HOME env variable in your shell? An easy test
would be to place the spark-avro jar file you downloaded in the jars
directory of Spark and run spark-shell again without the --packages
option. This guarantees that the jar file is on the classpath of the
Spark driver and executors.
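For example, assuming the jar was downloaded to ~/Downloads (adjust
the paths to your layout), something along these lines:

echo $SPARK_HOME
# should print nothing if the variable is not set
cp ~/Downloads/spark-avro_2.12-3.1.3.jar spark-3.1.3-bin-hadoop3.2/jars/
spark-3.1.3-bin-hadoop3.2/bin/spark-shell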
On 3/10/22 1:24 PM, Yong Zhang wrote:
Hi,
I am puzzled by this issue of reading avro files with Spark 3.1.
Everything is done on my local mac laptop so far, and I really don't
know where the issue comes from; I googled a lot and cannot find
any clue.
I have always used Spark 2.4, as it is really mature. But for a new
project, I want to try Spark 3.1, which needs to read AVRO files.
To my surprise, Spark 3.1.3 throws an error on my local machine when
trying to read the avro files.
* I downloaded Spark 3.1.2 and 3.1.3 with Hadoop 2 or 3 from
https://spark.apache.org/downloads.html
* I use JDK 1.8.0_321 on the Mac
* I untarred the Spark 3.1.x tarball locally
* And I followed
https://spark.apache.org/docs/3.1.3/sql-data-sources-avro.html
I start the spark-shell with exactly the following command:
spark-3.1.3-bin-hadoop3.2/bin/spark-shell --packages
org.apache.spark:spark-avro_2.12:3.1.3
And I always get the following error when reading the existing test
AVRO files:
scala> val pageview =
spark.read.format("avro").load("/Users/user/output/raw/")
java.lang.NoClassDefFoundError:
org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2
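As a quick sanity check, one can ask the shell directly whether the
class is visible on the driver classpath; if the following line (a
hypothetical diagnostic, using the plain JDK Class.forName) throws
ClassNotFoundException, the class really is missing at runtime:

scala> Class.forName("org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2")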
I tried different versions of Spark 3.x, from Spark 3.1.2 -> 3.1.3 ->
3.2.1, and I believe they are all built with Scala 2.12. I start the
spark-shell with "--packages
org.apache.spark:spark-avro_2.12:x.x.x", where x.x.x matches the
Spark version, but I get the above weird "NoClassDefFoundError" in
*all* cases.
Meanwhile, if I download Spark 2.4.8 and start spark-shell with
"--packages org.apache.spark:spark-avro_2.11:2.4.3", I can read the
*exact same AVRO file* without any issue.
I am thinking it must be something done wrongly on my end, but:
* I downloaded several versions of Spark and untarred them directly.
* I *DIDN'T* have any custom "spark-env.sh/spark-defaults.conf" file
that could pull in any potential jar files to mess things up (a
quick check is sketched below)
* I simply create a spark session under spark-shell with the
correct package and try to read avro files. Nothing more.
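A quick way to double-check that nothing overrides a fresh download
(hypothetical commands; adjust the directory names to your layout):

env | grep -i spark
# should print nothing if no SPARK_* variables are set
ls spark-3.1.3-bin-hadoop3.2/conf/
# a stock untarred distribution only contains *.template files here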
I have to suspect there is something wrong with the Spark 3.x avro
package releases, but I know that possibility is very low, especially
across multiple different versions. And the class
"org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2"
does exist in "spark-sql_2.12-3.1.3.jar", as below:
jar tvf spark-sql_2.12-3.1.3.jar | grep FileDataSourceV2
15436 Sun Feb 06 22:54:00 EST 2022
org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.class
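Since the class is clearly in the jar on disk, it may be worth asking
the running shell which jar it actually loads Spark SQL classes from;
if a stale installation shadows the fresh one, this would show it.
A hypothetical check from inside spark-shell (getCodeSource can be
null for bootstrap classes, but for jar-loaded classes it prints the
jar location):

scala> Class.forName("org.apache.spark.sql.SparkSession").getProtectionDomain.getCodeSource.getLocation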
So what could be wrong?
Thanks
Yong