Dear community,

Here are the Hudi community bi-weekly updates for 2022-01-02 ~ 2022-01-16, covering features, bug fixes, and tests.

=======================================
Features

[Flink] hudi-flink support timestamp-micros [1]
[Spark] New clustering regex match config to choose partitions when building clustering plan [2]
[Flink] Make metadata commit synchronous for flink batch [3]
[Trino] Add Trino setup in Docker Demo [4]

[1] https://issues.apache.org/jira/browse/HUDI-3184
[2] https://issues.apache.org/jira/browse/HUDI-3045
[3] https://issues.apache.org/jira/browse/HUDI-3233
[4] https://issues.apache.org/jira/browse/HUDI-2785

=======================================
Bugs

[Core] Fixing Clustering w/ sort columns with null values fails [1]
[Core] Fix bulk_insert failure on Spark 3.2.0 [2]
[Core] Handle duplicate instants when fetching pending clustering plans [3]
[Core] Metadata merged log record reader - avoiding NullPointerException when records by keys [4]
[Core] Add endpoint_url to dynamodb lock provider [5]
[Core] Closing LogRecordScanner in compactor [6]
[Core] Sync empty table to hive metastore [7]
[Core] Do not preserve filename when preserveCommitMetadata enabled [8]
[Core] Fixing null schema with empty commit in incremental relation [9]
[Core] Adding support to preserve commit metadata for compaction [10]
[Core] Enabling savepoint and restore for MOR table [11]
[Core] Add default HUDI_DIR in setupKafka.sh [12]
[Core] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter [13]
[Core] Add config for hive conditional sync [14]
[Deltastreamer] Fixing checkpoint fetch in deltastreamer [15]
[Core] HoodieConfig#getBoolean should return false when default not set [16]
[Spark] Fix merge/insert/show partitions error on Spark3.2 [17]
[Spark] Spark metastore schema evolution broken [18]
[Core] Handle logical type in TimestampBasedKeyGenerator [19]
[Presto Integration] Shade htrace and parquet-avro in presto bundle [20]
[Core] Fixing metadata table compaction so as to not include uncommitted data [21]
[Core] Kafka-connect support of hadoop config environments and properties [22]
[Spark] spark-sql write timestamp directly [23]
[Core] Remove aws jars from hudi bundles [24]
[Core] Making some fixes to S3 incremental source [25]
[Core] Fix KafkaConnect cannot sync to Hive Problem [26]
[Core] InProcessLockProvider as default when any async services enabled with no lock provider override [27]
[Core] Allow empty commits in Kafka Connect Sink for Hudi [28]
[Core] Include files from completed commits while bootstrapping metadata table [29]
[Core] Create pushgateway client based on port [30]
[Core] Addressing performance traps in Bulk Insert/Layout Optimization [31]
[Core] Unify Hive's InputFormat implementations to avoid duplication [32]
[Core] Corrected the check for incremental sql [33]
[Core] Refactor hudi existing modules to make more code reuse in V2 Implementation [34]
[Spark SQL] Improve Spark SQL create table from existing hudi table [35]

[1] https://issues.apache.org/jira/browse/HUDI-2558
[2] https://issues.apache.org/jira/browse/HUDI-3140
[3] https://issues.apache.org/jira/browse/HUDI-2774
[4] https://issues.apache.org/jira/browse/HUDI-3141
[5] https://issues.apache.org/jira/browse/HUDI-3147
[6] https://issues.apache.org/jira/browse/HUDI-2966
[7] https://issues.apache.org/jira/browse/HUDI-3171
[8] https://issues.apache.org/jira/browse/HUDI-3170
[9] https://issues.apache.org/jira/browse/HUDI-3168
[10] https://issues.apache.org/jira/browse/HUDI-44
[11] https://issues.apache.org/jira/browse/HUDI-52
[12] https://issues.apache.org/jira/browse/HUDI-3118
[13] https://issues.apache.org/jira/browse/HUDI-3183
[14] https://issues.apache.org/jira/browse/HUDI-3100
[15] https://issues.apache.org/jira/browse/HUDI-2947
[16] https://issues.apache.org/jira/browse/HUDI-3185
[17] https://issues.apache.org/jira/browse/HUDI-3136
[18] https://issues.apache.org/jira/browse/HUDI-3192
[19] https://issues.apache.org/jira/browse/HUDI-2909
[20] https://issues.apache.org/jira/browse/HUDI-3139
[21] https://issues.apache.org/jira/browse/HUDI-3178
[22] https://issues.apache.org/jira/browse/HUDI-3104
[23] https://issues.apache.org/jira/browse/HUDI-3125
[24] https://issues.apache.org/jira/browse/HUDI-3157
[25] https://issues.apache.org/jira/browse/HUDI-3009
[26] https://issues.apache.org/jira/browse/HUDI-3112
[27] https://issues.apache.org/jira/browse/HUDI-3030
[28] https://issues.apache.org/jira/browse/HUDI-2735
[29] https://issues.apache.org/jira/browse/HUDI-3180
[30] https://issues.apache.org/jira/browse/HUDI-3148
[31] https://issues.apache.org/jira/browse/HUDI-2950
[32] https://issues.apache.org/jira/browse/HUDI-3094
[33] https://issues.apache.org/jira/browse/HUDI-485
[34] https://issues.apache.org/jira/browse/HUDI-3172
[35] https://issues.apache.org/jira/browse/HUDI-3198

=======================================
Tests

[Tests] Fix broken UT test for TestHiveSyncTool.testDropPartitions [1]
[Tests] Enabling InProcessLockProvider for all multi-writer tests instead of FileSystemBasedLockProviderTestClass [2]

[1] https://issues.apache.org/jira/browse/HUDI-3138
[2] https://issues.apache.org/jira/browse/HUDI-3165

Best,
Leesf