Jian Feng created HUDI-4210:
-------------------------------

    Summary: Create custom hbase index to solve data skew issue on hbase regions
    Key: HUDI-4210
    URL: https://issues.apache.org/jira/browse/HUDI-4210
    Project: Apache Hudi
    Issue Type: Improvement
    Components: index
    Reporter: Jian Feng
    Assignee: Jian Feng

In our production environment, many tables use an auto-increment ID as their key. With the HBase index, these monotonically increasing keys cause data skew across HBase regions, because new writes concentrate in the region that holds the highest keys.

It would be better to add random prefixes to the keys while keeping the ordering in Hudi itself. This needs only a small modification to the HBase index: add the prefix wherever the index queries or updates HBase. The primary key stored in HBase will then differ from the one in Hudi, but this logic is transparent to business logic.

I have adopted this method in our production environment. The custom index can be specified with the withIndexClass config in IndexConfig.

Another effort, driven by Uber engineers [https://github.com/apache/hudi/pull/3508], could technically solve the issue by reading HFiles directly, but it is still in progress. The approach proposed here should resolve the issue immediately.
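
To make the prefixing idea concrete, here is a minimal sketch of the key mapping only. It is not the code used in production and not part of the Hudi API: the class SaltedRowKey, its method names, the two-digit prefix format, and the bucket count are all hypothetical. It also assumes the "random" prefix is derived deterministically from the key, so that a lookup and a later update compute the same HBase row key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper (not a Hudi class): maps a Hudi record key to a salted HBase row key. */
final class SaltedRowKey {
    // Assumed layout: two-digit bucket, one separator character, then the original key.
    private static final int PREFIX_WIDTH = 2;
    private static final char SEPARATOR = '_';

    private SaltedRowKey() {}

    /** Deterministic bucket in [0, numBuckets), computed from the key's UTF-8 bytes. */
    static int bucketOf(String recordKey, int numBuckets) {
        int h = 0;
        for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + (b & 0xff);
        }
        return Math.floorMod(h, numBuckets);
    }

    /** Row key written to and read from HBase; the Hudi record key itself is unchanged. */
    static String toRowKey(String recordKey, int numBuckets) {
        if (numBuckets <= 0 || numBuckets > 100) {
            throw new IllegalArgumentException("numBuckets must be in [1, 100]");
        }
        return String.format(Locale.ROOT, "%02d%c%s",
                bucketOf(recordKey, numBuckets), SEPARATOR, recordKey);
    }

    /** Strips the fixed-width prefix to recover the Hudi record key from an HBase row key. */
    static String toRecordKey(String rowKey) {
        return rowKey.substring(PREFIX_WIDTH + 1);
    }

    public static void main(String[] args) {
        // Consecutive auto-increment IDs are spread over different prefixes.
        for (int id = 1000; id < 1005; id++) {
            String key = Integer.toString(id);
            System.out.println(key + " -> " + toRowKey(key, 16));
        }
    }
}
```

In a custom index, toRowKey would be applied when building the HBase lookups and updates, and toRecordKey when mapping results back to Hudi keys, so the prefix never leaks into Hudi. The bucket count has to stay fixed once data is written, since changing it would change the row key computed for existing records.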