Hi, I am trying to set up a PySpark Jupyter notebook on an AWS EMR cluster to read Hudi datasets. I am using the latest settings:
* EMR 7.6, Hudi v0.15, Hadoop v3.4.x and Spark 3.5.x

However, I get the error shown below. I have a few questions:

1. I could not find where the API may have changed, but I am wondering if this is due to a version incompatibility? I realize I have not linked any code, but I'm using some custom JAR files and setups.
2. Is there a matrix somewhere showing compatibility of Hudi with different Hadoop versions?

```
25/01/14 15:38:08 WARN SparkSession: Cannot use org.apache.spark.sql.hudi.HoodieSparkSessionExtension to configure session extensions.
java.lang.NoClassDefFoundError: org/apache/spark/rdd/SecureRDD
```

Note that my setup does work with EMR 7.0, Hudi v0.15, Hadoop v3.3.x and Spark 3.5.x, but I am trying to understand the scope of this issue, and whether the `SecureRDD` class was deprecated or removed. I could not find any information online, but I may have been looking in the wrong places.

Thanks!
