[ https://issues.apache.org/jira/browse/HUDI-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564649#comment-17564649 ]
Yuwei Xiao edited comment on HUDI-4318 at 7/10/22 7:37 AM:
-----------------------------------------------------------

Failed to re-produce the exception. My test code:
{code:java}
val schema = StructType(
  Array(
    StructField("uuid", StringType),
    StructField("ts", LongType),
    StructField("partitionpath", StringType),
    StructField("array_field", DataTypes.createArrayType(StringType))
  ))

val data = Seq(Row("id1", 1L, "2020/01/01", List("a","b","c")))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

df.write.format("org.apache.hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(RECORDKEY_FIELD.key, "uuid")
  .option(PARTITIONPATH_FIELD.key, "partitionpath")
  .option(INDEX_TYPE.key(), IndexType.BUCKET.name())
  .option(BUCKET_INDEX_ENGINE_TYPE.key(), BucketIndexEngineType.SIMPLE.name())
  .option(BUCKET_INDEX_NUM_BUCKETS.key(), "4")
  .option(TBL_NAME.key, tableName)
  .mode(Overwrite)
  .save(tablePath)
{code}

was (Author: JIRAUSER280718):
Failed to re-produce the exception.
My test code:
{code:java}
val schema = StructType(
  Array(
    StructField("uuid", StringType),
    StructField("ts", LongType),
    StructField("partitionpath", StringType),
    StructField("array_field", DataTypes.createArrayType(StringType))
  ))

val data = Seq(Row("id1", 1L, "2020/01/01", List("a","b","c")))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

df.write.format("org.apache.hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(RECORDKEY_FIELD.key, "uuid")
  .option(PARTITIONPATH_FIELD.key, "partitionpath")
  .option(INDEX_TYPE.key(), IndexType.BUCKET.name())
  .option(BUCKET_INDEX_ENGINE_TYPE.key(), BucketIndexEngineType.SIMPLE.name())
  .option(BUCKET_INDEX_NUM_BUCKETS.key(), "4")
  .option(TBL_NAME.key, tableName)
  .mode(Overwrite)
  .save(tablePath)
{code}

> IndexOutOfBoundException when recordKey has List values for Bucket index table
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-4318
>                 URL: https://issues.apache.org/jira/browse/HUDI-4318
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: core
>   Affects Versions: 0.11.1
>            Reporter: Harsha Teja Kanna
>            Assignee: Yuwei Xiao
>            Priority: Minor
>
> Currently, the Bucket index is supported only if the record key has columns
> with simple values.
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java#L71]
> Example record for which this breaks
> column1:value1,column2:value2,column3:[value1,value2]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
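
For reference, the failure mode described in the issue can be illustrated with a minimal sketch. It assumes the record key is serialized as comma-separated field:value pairs (as in the example record above) and is parsed by splitting on "," and then on ":". The helper parseKeyPairs is hypothetical and written only for illustration; it is not Hudi's API and may differ from what BucketIdentifier actually does.

```scala
import scala.util.Try

object RecordKeyParsingSketch {
  // Hypothetical helper (not part of Hudi): parse "field:value,field:value"
  // by splitting on ',' first and then on ':'.
  def parseKeyPairs(recordKey: String): Map[String, String] =
    recordKey.split(",").map { pair =>
      val parts = pair.split(":")
      // Assumes every fragment contains a ':'; a fragment without one
      // makes parts(1) throw an index-out-of-bounds exception.
      parts(0) -> parts(1)
    }.toMap

  def main(args: Array[String]): Unit = {
    // Record key whose columns all hold simple values.
    val simpleKey = "column1:value1,column2:value2"
    // Example record from the issue: the list value itself contains a ','.
    val listKey = "column1:value1,column2:value2,column3:[value1,value2]"

    println(parseKeyPairs(simpleKey))
    // Splitting on ',' leaves the fragment "value2]", which has no ':'.
    println(Try(parseKeyPairs(listKey)))
  }
}
```

Under these assumptions the simple key parses into two pairs, while the key with a list value fails because the comma inside the list is treated as a pair separator. In the test code above the record key is the plain string column "uuid" and the array column is not part of the key, so that write would not exercise this path.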