Hi,

I am trying to set up a PySpark Jupyter notebook on an AWS EMR cluster to 
read Hudi datasets. I am using the latest versions:


  *   EMR 7.6, Hudi v0.15, Hadoop v3.4.x, and Spark 3.5.x

However, I obtained an error shown below. I have a few questions:


  1.  I could not find where the API may have changed, but I am wondering if 
this is due to a version incompatibility? I realize I have not linked any code, 
but I’m using some custom JAR files and setups.
  2.  Is there a matrix somewhere showing the compatibility of Hudi with 
different Hadoop versions?

```
25/01/14 15:38:08 WARN SparkSession: Cannot use 
org.apache.spark.sql.hudi.HoodieSparkSessionExtension to configure session 
extensions.
java.lang.NoClassDefFoundError: org/apache/spark/rdd/SecureRDD
```

Note that my setup does work with EMR 7.0, Hudi v0.15, Hadoop v3.3.x, and Spark 
3.5.x, but I am trying to understand the scope of this issue, and whether the 
`SecureRDD` class was deprecated or removed. I could not find any information 
online, but I may have been looking in the wrong places.
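For context, the session is launched roughly along the lines of the standard 
Hudi quickstart configuration below (a simplified sketch: in my actual setup 
the stock `hudi-spark3.5-bundle` coordinate is replaced by my custom JARs, 
which I have omitted here):

```shell
# Launch PySpark with the Hudi session extension and catalog enabled.
# (Simplified: my real cluster substitutes custom JARs for the bundle.)
pyspark \
  --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
```

The warning above appears while the `spark.sql.extensions` entry is being 
applied during session startup.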

Thanks!
