nsivabalan commented on code in PR #9106: URL: https://github.com/apache/hudi/pull/9106#discussion_r1252401000
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ########## @@ -116,11 +134,10 @@ public static HoodieWriteConfig createMetadataWriteConfig( // Below config is only used if isLogCompactionEnabled is set. .withLogCompactionBlocksThreshold(writeConfig.getMetadataLogCompactBlocksThreshold()) .build()) - .withParallelism(parallelism, parallelism) - .withDeleteParallelism(parallelism) - .withRollbackParallelism(parallelism) - .withFinalizeWriteParallelism(parallelism) - .withAllowMultiWriteOnSameInstant(true) + .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes) + .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build()) + .withRollbackParallelism(defaultParallelism) + .withFinalizeWriteParallelism(defaultParallelism) Review Comment: did you remove .withAllowMultiWriteOnSameInstant(true) intentionally ? ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ########## @@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig( .withCleanConfig(HoodieCleanConfig.newBuilder() .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN) .withAutoClean(false) - .withCleanerParallelism(parallelism) - .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS) + .withCleanerParallelism(defaultParallelism) + .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS) + .retainFileVersions(2) Review Comment: I understand it could be a larger change, but file versions makes sense in general. If uber has been running w/ file versions for 6+ months, we should do a round of testing on our end, and can possibly proceed. but incremental cleaning may not kick in. so, for large MDTs, wondering will there be any latency hit ########## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ########## @@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) { long beginTs = System.currentTimeMillis(); // Not loaded yet try { - LOG.info("Building file system view for partitions " + partitionSet); + if (partitionSet.size() < 100) { + LOG.info("Building file system view for partitions: " + partitionSet); Review Comment: yes, may be we should reconsider the freq of logging here. for eg, log every every 100 partitions or something. not sure we will gain much by logging this for every partition. ########## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ########## @@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() { } public Map<String, String> stats() { - return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>()); + Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet()); + return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>()); Review Comment: HoodieMetadataMetrics.getStats(boolean detailed, HoodieTableMetaClient metaClient, HoodieTableMetadata metadata) reloads the timeline. can we move the reload to outside of the caller so that we don't reload for every MDT partition stats ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ########## @@ -176,7 +176,7 @@ private void initMetadataReader() { } try { - this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath()); + this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath(), true); Review Comment: rational is that, metadata writer itself is short lived just for committing one instant and so we should be good to enable re-use here? do we even expect to see any improvement here, since this is meant just for one write to MDT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org