stym06 opened a new issue #3217: URL: https://github.com/apache/hudi/issues/3217
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

I'm running HoodieDeltaStreamer using the bundle jar under the packaging folder, built into a Docker image and run on K8s through the spark-on-k8s-operator. I'm getting this error:

```
Exception in thread "main" java.io.IOException: Could not load key generator class org.apache.hudi.hive.NonpartitionedKeyGenerator
	at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:95)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:211)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:571)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:138)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:102)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:480)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Unable to load class
	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:57)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:88)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:99)
	at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:93)
	... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.hive.NonpartitionedKeyGenerator
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:53)
	... 20 more
```
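For context on the `ClassNotFoundException`: a quick way to confirm whether the configured key generator class is actually packaged in the bundle is to list the jar's contents (a sketch, assuming a standard build of the bundle jar; adjust the path to wherever the jar lives):

```
# List the bundle's classes and search for the key generator; whatever package
# shows up here is the one hoodie.datasource.write.keygenerator.class must reference.
jar tf hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar | grep -i NonpartitionedKeyGenerator
```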
Dockerfile:

```
FROM gcr.io/spark-operator/spark:v2.4.4
USER ${spark_uid}

ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar $SPARK_HOME/jars
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar $SPARK_HOME/jars

RUN mkdir -p /opt/spark/hudi/
RUN mkdir -p /opt/spark/hudi/output

COPY hudi-client/hudi-client-common/target/hudi-client-common-0.9.0-SNAPSHOT.jar /opt/spark/hudi/
COPY packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar /opt/spark/hudi/
```

K8s YAML file:

```
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: hudi-0.9
  namespace: data-platform
spec:
  type: Java
  mode: cluster
  image: "<personal_repo>:6003/hudi:0.9.14-dev"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
  mainApplicationFile: local:///opt/spark/hudi/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar
  deps:
    jars:
      - local:///opt/spark/hudi/hudi-client-common-0.9.0-SNAPSHOT.jar
    packages:
      - org.apache.spark:spark-avro_2.11:2.4.4
  # pass arguments in list
  arguments:
    - "--table-type"
    - "COPY_ON_WRITE"
    - "--props"
    - "/opt/spark/hudi/config/source.properties"
    - "--schemaprovider-class"
    - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
    - "--source-class"
    - "org.apache.hudi.utilities.sources.AvroKafkaSource"
    - "--target-base-path"
    - "s3a://dp-ingestion-stage/hudi/stage/kubernetes-mysql.inventory.peopledata"
    - "--target-table"
    - "kubernetes-mysql.inventory.peopledata"
    - "--op"
    - "UPSERT"
    - "--source-ordering-field"
    - "__ts_ms"
    - "--enable-sync"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: Never
  driver:
    env:
      - name: HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
        valueFrom:
          secretKeyRef:
            name: aws-dp-ingestion-stage-sc
            key: AWS_ACCESS_KEY_ID
      - name: HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
        valueFrom:
          secretKeyRef:
            name: aws-dp-ingestion-stage-sc
            key: AWS_SECRET_ACCESS_KEY
      - name: HOODIE_ENV_fs_DOT_s3a_DOT_impl
        value: org.apache.hadoop.fs.s3a.S3AFileSystem
    cores: 1
    coreLimit: "1200m"
    memory: "1g"
    labels:
      version: 2.4.4
    serviceAccount: dpv3-spark-sa
    configMaps:
      - name: hudi-configmap
        path: /opt/spark/hudi/config
  executor:
    env:
      - name: HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
        valueFrom:
          secretKeyRef:
            name: aws-dp-ingestion-stage-sc
            key: AWS_ACCESS_KEY_ID
      - name: HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
        valueFrom:
          secretKeyRef:
            name: aws-dp-ingestion-stage-sc
            key: AWS_SECRET_ACCESS_KEY
    cores: 1
    instances: 1
    memory: "1g"
    labels:
      version: 2.4.4
    configMaps:
      - name: hudi-configmap
        path: /opt/spark/hudi/config
```
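On the image setup: the two Hudi jars are copied to `/opt/spark/hudi/` and wired in via `mainApplicationFile` and `deps.jars`. A hedged alternative (my assumption, not something the operator requires) is to also drop the utilities bundle into `$SPARK_HOME/jars`, since Spark puts that directory on the driver and executor classpath automatically:

```
# Hypothetical Dockerfile tweak: jars placed in $SPARK_HOME/jars are on the
# default Spark classpath, so they need no extra deps.jars entries.
COPY packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar $SPARK_HOME/jars/
```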
Hoodie Conf:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: hudi-configmap
  namespace: data-platform
data:
  source.properties: |
    # base properties
    hoodie.upsert.shuffle.parallelism=2
    hoodie.insert.shuffle.parallelism=2
    hoodie.delete.shuffle.parallelism=2
    hoodie.bulkinsert.shuffle.parallelism=2
    hoodie.embed.timeline.server=true
    hoodie.filesystem.view.type=EMBEDDED_KV_STORE
    hoodie.compact.inline=false

    # datasource properties
    hoodie.deltastreamer.schemaprovider.registry.url=http://dpv3-cp-sr-cp-schema-registry:8081/subjects/kubernetes-mysql.inventory.peopledata-value/versions/latest
    hoodie.datasource.write.recordkey.field=id
    hoodie.datasource.write.partitionpath.field=
    hoodie.deltastreamer.source.kafka.topic=kubernetes-mysql.inventory.peopledata
    hoodie.datasource.write.keygenerator.class=org.apache.hudi.hive.NonpartitionedKeyGenerator
    hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator

    # hive sync properties
    hoodie.datasource.hive_sync.enable=true
    hoodie.datasource.hive_sync.database=inventory
    hoodie.datasource.hive_sync.table=peopledata
    hoodie.datasource.hive_sync.username=root
    hoodie.datasource.hive_sync.password=pass
    hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver2:10000

    # kafka props
    bootstrap.servers=kfk1:9094,kfk2:9094,kfk2:9094
    auto.offset.reset=earliest
    schema.registry.url=http://dpv3-cp-sr-cp-schema-registry:8081
```

**To Reproduce**

Steps to reproduce the behavior:

1. Create the Docker image
2. Create the ConfigMap
3. Create the SparkApplication

**Expected behavior**

Records from Kafka should be upserted into S3, and Hive sync should complete.

**Environment Description**

* Hudi version : 0.9-SNAPSHOT
* Spark version : 2.4.4
* Hive version : 3.1.2
* Hadoop version : 3.2.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
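One observation on `source.properties` above: the two class-name properties look swapped across packages. If I'm reading the 0.9.0 source layout correctly (an assumption worth verifying against the actual bundle, e.g. with the `jar tf` check above), key generators live under `org.apache.hudi.keygen` while hive-sync partition extractors live under `org.apache.hudi.hive`, which would make the intended pairing for a non-partitioned table:

```
# Assumed-correct pairing (not verified against this exact build):
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
```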