Dear community,

I am pleased to share the Hudi community bi-weekly update for 2021-11-21 ~ 2021-12-05, covering features, bug fixes and tests.

=======================================
Features

[Spark SQL] Extract HoodieCatalogTable to coordinate spark catalog table and hoodie table [1]
[Deltastreamer] Add Debezium Source for deltastreamer [2]
[Core] Add Amazon CloudWatch metrics reporter [3]
[Core] Support hilbert curve for hudi [4]
[Flink Integration] Support flink catalog to help user use flink table conveniently [5]
[Core] Introduce a pulsar implementation of hoodie write commit [6]
[Core] Support HiveSchemaProvider [7]

[1] https://issues.apache.org/jira/browse/HUDI-2759
[2] https://issues.apache.org/jira/browse/HUDI-1290
[3] https://issues.apache.org/jira/browse/HUDI-2801
[4] https://issues.apache.org/jira/browse/HUDI-2102
[5] https://issues.apache.org/jira/browse/HUDI-2877
[6] https://issues.apache.org/jira/browse/HUDI-2937
[7] https://issues.apache.org/jira/browse/HUDI-2418

=======================================
Bugs

[Flink] Set up keygen class explicitly for write config for flink table upgrade [1]
[Core] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job [2]
[Core] Converting commit timestamp format to millisecs [3]
[Core] Expand File-Group candidates list for appending for MOR tables [4]
[Flink] Use earliest instant by default for async compaction and clustering jobs [5]
[Core] Rollback unfinished replace commit to allow updates [6]
[Core] Assume path exists and defer fs.exists() in AbstractTableFileSystemView [7]
[Core] Optimize statistics collection related codes and add some docs for z-order and fix some bugs [8]
[Core] Using HBase shaded jars in Hudi presto bundle [9]
[Core] Add clustering and compaction in Kafka Connect Sink [10]
[Core] Add hive sync support to kafka connect [11]
[Core] Securing usages of SimpleDateFormat to be thread-safe [12]
[Core] Fix 2to3 upgrade when set `hoodie.table.keygenerator.class` [13]
[Spark Integration] Refresh table after drop partition [14]
[Flink Integration] Flink metadata table supports virtual keys [15]
[Core] Fix kafka offset handling in Kafka Connect protocol [16]
[Core] Hudi KVComparator for all HFile writer usages [17]
[Core] Fixing issues w/ Z-order Layout Optimization [18]
[Core] Cluster update strategy should not be fenced by write config [19]
[Deltastreamer] Fixing deltastreamer checkpoint fetch/copy over [20]
[Core] Add JMX deps in hudi utilities and kafka connect bundles [21]
[CLI] Fixing archived Timeline crashing if timeline contains REPLACE_COMMIT [22]
[Core] Configure metadata payload consistency check [23]
[Core] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader [24]
[Deltastreamer] Fixing mysql debezium source [25]
[Core] Remove rdd.isEmpty() validation to prevent CreateHandle being called twice [26]
[Core] Guarding table service commits within a single lock to commit to both data table and metadata table [27]
[Deltastreamer] Fixing handling of cluster update reject exception in deltastreamer [28]
[Core] Fixing lazy rollback for MOR with list based strategy [29]
[Deltastreamer] Fixed DeltaStreamer to properly respect configuration passed through properties file [30]
[Core] Removing direct fs call in HoodieLogFileReader [31]
[Core] Table metadata returns empty for non-exist partition [32]
[CLI] Fixing Clustering CLI - schedule and run command fixes to avoid NumberFormatException [33]
[Core] Addressing issues w/ Z-order Layout Optimization [34]
[Core] Re-use same rollback instant time for failed rollbacks [35]
[Core] Enabling timeline-server-based marker as default [36]
[CLI] Metadata CLI - files/partition file listing fix and new validate option [37]
[Core] Metadata table creation and avoid bootstrapping race for write client & add locking for upgrade [38]
[Spark Integration] Add support ignoring case in update sql operation [39]
[Core] Fix write configs for Java engine in Kafka Connect Sink [40]
[Core] Fixing loading of props from default dir [41]
[Core] Compact the file group with larger log files to reduce write amplification [42]
[Core] Fix metadata table archival overstepping between regular writers and table services [43]
[Flink Integration] Fix remote timeline server config for flink [44]
[Core] Refresh the fs view on successful checkpoints for write profile [45]
[Core] Fixing populate meta fields with Hfile writers and Disabling virtual keys by default for metadata table [46]
[Core] Removing default value for PARTITIONPATH_FIELD_NAME resulting in incorrect `KeyGenerator` configuration [47]
[Core] Metadata table - avoiding key lookup failures on base files over S3 [48]
[Core] Kafka Connect: Fix failed writes and avoid table service concurrent operations [49]
[Core] Fixing metadata table reader when metadata compaction is inflight [50]
[Deltastreamer] Remove special casing of clustering in deltastreamer checkpoint retrieval [51]

[1] https://issues.apache.org/jira/browse/HUDI-2702
[2] https://issues.apache.org/jira/browse/HUDI-2533
[3] https://issues.apache.org/jira/browse/HUDI-2559
[4] https://issues.apache.org/jira/browse/HUDI-2550
[5] https://issues.apache.org/jira/browse/HUDI-2737
[6] https://issues.apache.org/jira/browse/HUDI-1937
[7] https://issues.apache.org/jira/browse/HUDI-2743
[8] https://issues.apache.org/jira/browse/HUDI-2778
[9] https://issues.apache.org/jira/browse/HUDI-2409
[10] https://issues.apache.org/jira/browse/HUDI-2332
[11] https://issues.apache.org/jira/browse/HUDI-2325
[12] https://issues.apache.org/jira/browse/HUDI-2831
[13] https://issues.apache.org/jira/browse/HUDI-2818
[14] https://issues.apache.org/jira/browse/HUDI-2838
[15] https://issues.apache.org/jira/browse/HUDI-2847
[16] https://issues.apache.org/jira/browse/HUDI-2671
[17] https://issues.apache.org/jira/browse/HUDI-2443
[18] https://issues.apache.org/jira/browse/HUDI-2778
[19] https://issues.apache.org/jira/browse/HUDI-2766
[20] https://issues.apache.org/jira/browse/HUDI-2793
[21] https://issues.apache.org/jira/browse/HUDI-2853
[22] https://issues.apache.org/jira/browse/HUDI-2844
[23] https://issues.apache.org/jira/browse/HUDI-2792
[24] https://issues.apache.org/jira/browse/HUDI-2480
[25] https://issues.apache.org/jira/browse/HUDI-1290
[26] https://issues.apache.org/jira/browse/HUDI-2800
[27] https://issues.apache.org/jira/browse/HUDI-2794
[28] https://issues.apache.org/jira/browse/HUDI-2858
[29] https://issues.apache.org/jira/browse/HUDI-2841
[30] https://issues.apache.org/jira/browse/HUDI-2840
[31] https://issues.apache.org/jira/browse/HUDI-2005
[32] https://issues.apache.org/jira/browse/HUDI-2852
[33] https://issues.apache.org/jira/browse/HUDI-2850
[34] https://issues.apache.org/jira/browse/HUDI-2814
[35] https://issues.apache.org/jira/browse/HUDI-2861
[36] https://issues.apache.org/jira/browse/HUDI-2767
[37] https://issues.apache.org/jira/browse/HUDI-2845
[38] https://issues.apache.org/jira/browse/HUDI-2475
[39] https://issues.apache.org/jira/browse/HUDI-2642
[40] https://issues.apache.org/jira/browse/HUDI-2891
[41] https://issues.apache.org/jira/browse/HUDI-2880
[42] https://issues.apache.org/jira/browse/HUDI-2881
[43] https://issues.apache.org/jira/browse/HUDI-2904
[44] https://issues.apache.org/jira/browse/HUDI-2914
[45] https://issues.apache.org/jira/browse/HUDI-2924
[46] https://issues.apache.org/jira/browse/HUDI-2902
[47] https://issues.apache.org/jira/browse/HUDI-2911
[48] https://issues.apache.org/jira/browse/HUDI-2894
[49] https://issues.apache.org/jira/browse/HUDI-2890
[50] https://issues.apache.org/jira/browse/HUDI-2923
[51] https://issues.apache.org/jira/browse/HUDI-2935

=======================================
Tests

[Tests] Add more Spark CI build tasks [1]
[Tests] Fix skipped HoodieSparkSqlWriterSuite [2]

[1] https://issues.apache.org/jira/browse/HUDI-1870
[2] https://issues.apache.org/jira/browse/HUDI-2868

Best,
Leesf