stym06 opened a new issue #3217:
URL: https://github.com/apache/hudi/issues/3217


   **_Tips before filing an issue_**
   
   - Have you gone through our 
[FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? 
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I'm running HoodieDeltaStreamer using the hudi-utilities-bundle jar from the packaging folder, baked into a Docker image and run on Kubernetes through spark-on-k8s-operator.
   
   I'm getting this error:
   ```
   Exception in thread "main" java.io.IOException: Could not load key generator class org.apache.hudi.hive.NonpartitionedKeyGenerator
        at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:95)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:211)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:571)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:138)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:102)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:480)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to load class
        at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:57)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:88)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:99)
        at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:93)
        ... 17 more
   Caused by: java.lang.ClassNotFoundException: org.apache.hudi.hive.NonpartitionedKeyGenerator
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:53)
        ... 20 more
   ```
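
   The failure is the driver's classloader not finding org.apache.hudi.hive.NonpartitionedKeyGenerator in the bundle. To check which package the key generator classes actually ship under in this build, the jar can be probed directly (a debugging sketch; the jar path matches the COPY destination in the Dockerfile below):
   
   ```
   # List key generator classes bundled in the utilities jar (sketch;
   # path matches the COPY destination in the Dockerfile below).
   unzip -l /opt/spark/hudi/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar \
     | grep -i 'NonpartitionedKeyGenerator'
   ```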
   
   Dockerfile
   ```
   FROM gcr.io/spark-operator/spark:v2.4.4
   USER ${spark_uid}
   ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar $SPARK_HOME/jars
   ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar $SPARK_HOME/jars
   RUN mkdir -p /opt/spark/hudi/
   RUN mkdir -p /opt/spark/hudi/output
   COPY hudi-client/hudi-client-common/target/hudi-client-common-0.9.0-SNAPSHOT.jar /opt/spark/hudi/
   COPY packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar /opt/spark/hudi/
   ```
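
   To confirm the jars actually land in the image where the manifest expects them, the image filesystem can be inspected before deploying (a sketch; the tag is the one referenced in the SparkApplication below):
   
   ```
   # Sanity-check jar locations inside the built image (sketch; tag as in
   # the SparkApplication manifest below).
   docker run --rm --entrypoint ls <personal_repo>:6003/hudi:0.9.14-dev \
     -l /opt/spark/hudi /opt/spark/jars
   ```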
   
   K8s YAML file:
   ```
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: hudi-0.9
     namespace: data-platform
   spec:
     type: Java
     mode: cluster
     image: "<personal_repo>:6003/hudi:0.9.14-dev"
     imagePullPolicy: IfNotPresent
     mainClass: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
     mainApplicationFile: local:///opt/spark/hudi/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar
     deps:
       jars:
         - local:///opt/spark/hudi/hudi-client-common-0.9.0-SNAPSHOT.jar
       packages:
         - org.apache.spark:spark-avro_2.11:2.4.4
     # pass arguments as a list
     arguments:
       - "--table-type"
       - "COPY_ON_WRITE"
       - "--props"
       - "/opt/spark/hudi/config/source.properties"
       - "--schemaprovider-class"
       - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
       - "--source-class"
       - "org.apache.hudi.utilities.sources.AvroKafkaSource"
       - "--target-base-path"
       - "s3a://dp-ingestion-stage/hudi/stage/kubernetes-mysql.inventory.peopledata"
       - "--target-table"
       - "kubernetes-mysql.inventory.peopledata"
       - "--op"
       - "UPSERT"
       - "--source-ordering-field"
       - "__ts_ms"
       - "--enable-sync"
     sparkVersion: "2.4.4"
     restartPolicy:
       type: Never
     driver:
       env:
         - name: HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
           valueFrom:
             secretKeyRef:
               name: aws-dp-ingestion-stage-sc
               key: AWS_ACCESS_KEY_ID
         - name: HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
           valueFrom:
             secretKeyRef:
               name: aws-dp-ingestion-stage-sc
               key: AWS_SECRET_ACCESS_KEY
         - name: HOODIE_ENV_fs_DOT_s3a_DOT_impl
           value: org.apache.hadoop.fs.s3a.S3AFileSystem
       cores: 1
       coreLimit: "1200m"
       memory: "1g"
       labels:
         version: 2.4.4
       serviceAccount: dpv3-spark-sa
       configMaps:
         - name: hudi-configmap
           path: /opt/spark/hudi/config
     executor:
       env:
         - name: HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
           valueFrom:
             secretKeyRef:
               name: aws-dp-ingestion-stage-sc
               key: AWS_ACCESS_KEY_ID
         - name: HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
           valueFrom:
             secretKeyRef:
               name: aws-dp-ingestion-stage-sc
               key: AWS_SECRET_ACCESS_KEY
       cores: 1
       instances: 1
       memory: "1g"
       labels:
         version: 2.4.4
       configMaps:
         - name: hudi-configmap
           path: /opt/spark/hudi/config
   ```
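
   When the run fails, the operator's CRD status and the driver pod log carry the diagnostics (a sketch; the pod name assumes the operator's default <app-name>-driver naming):
   
   ```
   # Inspect the failed application (sketch; driver pod name assumes the
   # operator's default <app-name>-driver convention).
   kubectl -n data-platform describe sparkapplication hudi-0.9
   kubectl -n data-platform logs hudi-0.9-driver | grep -B2 -A8 'ClassNotFoundException'
   ```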
   
   Hoodie Conf:
   ```
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: hudi-configmap
     namespace: data-platform
   data:
     source.properties: |
       #base properties
       hoodie.upsert.shuffle.parallelism=2
       hoodie.insert.shuffle.parallelism=2
       hoodie.delete.shuffle.parallelism=2
       hoodie.bulkinsert.shuffle.parallelism=2
       hoodie.embed.timeline.server=true
       hoodie.filesystem.view.type=EMBEDDED_KV_STORE
       hoodie.compact.inline=false
   
       #datasource properties
       hoodie.deltastreamer.schemaprovider.registry.url=http://dpv3-cp-sr-cp-schema-registry:8081/subjects/kubernetes-mysql.inventory.peopledata-value/versions/latest
       hoodie.datasource.write.recordkey.field=id
       hoodie.datasource.write.partitionpath.field=
       hoodie.deltastreamer.source.kafka.topic=kubernetes-mysql.inventory.peopledata
       hoodie.datasource.write.keygenerator.class=org.apache.hudi.hive.NonpartitionedKeyGenerator
       hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator
       #hive sync properties
       hoodie.datasource.hive_sync.enable=true
       hoodie.datasource.hive_sync.database=inventory
       hoodie.datasource.hive_sync.table=peopledata
       hoodie.datasource.hive_sync.username=root
       hoodie.datasource.hive_sync.password=pass
       hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver2:10000
   
       #kafka props
       bootstrap.servers=kfk1:9094,kfk2:9094,kfk2:9094
       auto.offset.reset=earliest
       schema.registry.url=http://dpv3-cp-sr-cp-schema-registry:8081
   ```
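
   To verify the mounted properties match what --props points at, the ConfigMap data can be read back directly (a sketch):
   
   ```
   # Print the source.properties payload the driver will see under
   # /opt/spark/hudi/config (sketch; note the escaped dot in the key).
   kubectl -n data-platform get configmap hudi-configmap \
     -o jsonpath='{.data.source\.properties}'
   ```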
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create Docker image
   2. Create ConfigMap
   3. Create SparkApplication
   4. Submit the SparkApplication and watch the driver pod fail with the ClassNotFoundException above (a submission sketch follows this list)
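
   The whole reproduction boils down to building the image and applying the two manifests above (a sketch; hudi-configmap.yaml and hudi-sparkapp.yaml are hypothetical file names for the ConfigMap and SparkApplication shown earlier):
   
   ```
   # Build and push the image, then apply the manifests shown above
   # (sketch; YAML file names are hypothetical stand-ins).
   docker build -t <personal_repo>:6003/hudi:0.9.14-dev .
   docker push <personal_repo>:6003/hudi:0.9.14-dev
   kubectl apply -f hudi-configmap.yaml
   kubectl apply -f hudi-sparkapp.yaml
   # Watch the application status transition to FAILED
   kubectl -n data-platform get sparkapplication hudi-0.9 -w
   ```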
   
   **Expected behavior**
   
   Records from Kafka should be upserted into S3, and Hive sync should complete.
   
   **Environment Description**
   
   * Hudi version : 0.9.0-SNAPSHOT
   
   * Spark version : 2.4.4
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.2.0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : yes (Docker image on Kubernetes via spark-on-k8s-operator)
   
   
   
   **Stacktrace**
   
   See the full stack trace under **Describe the problem you faced** above.
   
   

