I can help you out. :) Created https://issues.apache.org/jira/browse/HUDI-164 to track this. Please share your JIRA ID and I will assign it to you. :)
We can write a simple loop that looks for the first non-zero size commit, or falls back to the default configs.

On Wed, Jul 3, 2019 at 8:35 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:

> Hi Balaji,
>
> My confidence isn't great when it comes to editing the code to find the
> newest non-zero instant. So I would earnestly request someone who has
> worked on this before to take a look. It might be more efficient for
> someone knowledgeable about this code to add a fix rather than someone
> like me. (I would honestly like to work on the fix if someone is willing
> to do some hand-holding :) ).
>
> Thanks,
>
> On Jul 3 2019, at 4:32 pm, Balaji Varadarajan <v.bal...@ymail.com.INVALID> wrote:
>
> Thanks for finding the old issue. Looks like we replied around the same
> time :) Yeah, makes sense. The fix would probably be finding the newest
> instant with non-zero records written and then using it for the average
> record size calculation. Let us know if you are interested in working on
> the fix.
>
> Balaji.V
>
> On Wednesday, July 3, 2019, 8:25:33 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> > Hi Nishith and All,
> >
> > I think I figured out what was blocking the processing of files for me.
> > The code snippet that I had sent before is indeed the issue. I have
> > raised a new issue and added a few details at:
> > https://github.com/apache/incubator-hudi/issues/776
> >
> > Can someone please have a look and advise what is going wrong?
> >
> > Thank you,
> > Kabeer.
> >
> > On Jul 3 2019, at 11:38 am, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> >
> > > Hi Nishith,
> > >
> > > Please find the latest commit data in the gist at:
> > > https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d
> > > For your convenience, I have also copy-pasted it below. Any help is
> > > highly appreciated.
> > >
> > > Thanks
> > > Kabeer.
> > >
> > > 20190702161629.commit:
> > > {
> > >   "partitionToWriteStats" : {
> > >     "2018/05/30" : [ {
> > >       "fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a",
> > >       "path" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet",
> > >       "prevCommit" : "20190702161417",
> > >       "numWrites" : 11614,
> > >       "numDeletes" : 0,
> > >       "numUpdateWrites" : 5,
> > >       "numInserts" : 3,
> > >       "totalWriteBytes" : 848480,
> > >       "totalWriteErrors" : 0,
> > >       "tempPath" : null,
> > >       "partitionPath" : "2018/05/30",
> > >       "totalLogRecords" : 0,
> > >       "totalLogFilesCompacted" : 0,
> > >       "totalLogSizeCompacted" : 0,
> > >       "totalUpdatedRecordsCompacted" : 0,
> > >       "totalLogBlocks" : 0,
> > >       "totalCorruptLogBlock" : 0,
> > >       "totalRollbackBlocks" : 0
> > >     } ],
> > >     "2018/05/31" : [ {
> > >       "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > >       "path" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > >       "prevCommit" : "null",
> > >       "numWrites" : 10430,
> > >       "numDeletes" : 0,
> > >       "numUpdateWrites" : 0,
> > >       "numInserts" : 10430,
> > >       "totalWriteBytes" : 820723,
> > >       "totalWriteErrors" : 0,
> > >       "tempPath" : null,
> > >       "partitionPath" : "2018/05/31",
> > >       "totalLogRecords" : 0,
> > >       "totalLogFilesCompacted" : 0,
> > >       "totalLogSizeCompacted" : 0,
> > >       "totalUpdatedRecordsCompacted" : 0,
> > >       "totalLogBlocks" : 0,
> > >       "totalCorruptLogBlock" : 0,
> > >       "totalRollbackBlocks" : 0
> > >     } ]
> > >   },
> > >   "compacted" : false,
> > >   "extraMetadataMap" : {
> > >     "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}"
> > >   },
> > >   "extraMetadata" : {
> > >     "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}"
> > >   },
> > >   "totalScanTime" : 0,
> > >   "totalCreateTime" : 2439,
> > >   "totalUpsertTime" : 2450,
> > >   "totalCompactedRecordsUpdated" : 0,
> > >   "totalLogFilesCompacted" : 0,
> > >   "totalLogFilesSize" : 0,
> > >   "fileIdAndRelativePaths" : {
> > >     "4f5514e8-d57c-4c6e-be8f-c3448051c956" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > >     "39cff0df-24e4-45b8-bff5-9b4f41c4096a" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet"
> > >   },
> > >   "totalRecordsDeleted" : 0,
> > >   "totalLogRecordsCompacted" : 0
> > > }
> > >
> > > 20190702161750.clean:
> > > Objavro.schema
> > > {"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}
> > >
> > > 20190702161847.inflight:
> > > {
> > >   "partitionToWriteStats" : {
> > >     "2018/05/31" : [ {
> > >       "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > >       "path" : null,
> > >       "prevCommit" : "20190702161629",
> > >       "numWrites" : 0,
> > >       "numDeletes" : 0,
> > >       "numUpdateWrites" : 2,
> > >       "numInserts" : 0,
> > >       "totalWriteBytes" : 0,
> > >       "totalWriteErrors" : 0,
> > >       "tempPath" : null,
> > >       "partitionPath" : null,
> > >       "totalLogRecords" : 0,
> > >       "totalLogFilesCompacted" : 0,
> > >       "totalLogSizeCompacted" : 0,
> > >       "totalUpdatedRecordsCompacted" : 0,
> > >       "totalLogBlocks" : 0,
> > >       "totalCorruptLogBlock" : 0,
> > >       "totalRollbackBlocks" : 0
> > >     } ]
> > >   },
> > >   "compacted" : false,
> > >   "extraMetadataMap" : { },
> > >   "totalScanTime" : 0,
> > >   "totalCreateTime" : 0,
> > >   "totalUpsertTime" : 0,
> > >   "totalCompactedRecordsUpdated" : 0,
> > >   "totalLogFilesCompacted" : 0,
> > >   "totalLogFilesSize" : 0,
> > >   "extraMetadata" : { },
> > >   "fileIdAndRelativePaths" : {
> > >     "4f5514e8-d57c-4c6e-be8f-c3448051c956" : null
> > >   },
> > >   "totalRecordsDeleted" : 0,
> > >   "totalLogRecordsCompacted" : 0
> > > }
> > >
> > > 20190702162055.inflight:
> > > {
> > >   "partitionToWriteStats" : { },
> > >   "compacted" : false,
> > >   "extraMetadataMap" : { },
> > >   "totalRecordsDeleted" : 0,
> > >   "totalLogRecordsCompacted" : 0,
> > >   "totalScanTime" : 0,
> > >   "totalCreateTime" : 0,
> > >   "totalUpsertTime" : 0,
> > >   "totalCompactedRecordsUpdated" : 0,
> > >   "totalLogFilesCompacted" : 0,
> > >   "totalLogFilesSize" : 0,
> > >   "fileIdAndRelativePaths" : { },
> > >   "extraMetadata" : { }
> > > }
> > >
> > > On Jul 3 2019, at 8:30 am, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > >
> > > > Hi Nishith,
> > > >
> > > > Thank you for the quick response. I shall try to send the commit
> > > > metadata at the earliest. I hope the commit metadata you are looking
> > > > for is the one within the .hoodie/ directory and not the ones that
> > > > are archived.
> > > > And there are inflight and commit metadata. I take it that you want
> > > > to look into the inflight one. Shall revert with further details.
> > > >
> > > > Thanks
> > > > Kabeer.
> > > >
> > > > On Jul 3 2019, at 2:19 am, nishith agarwal <n3.nas...@gmail.com> wrote:
> > > >
> > > > > Kabir,
> > > > >
> > > > > Could you share the content of your commit metadata?
> > > > > You can list the timeline, find the latest commit in the timeline,
> > > > > perform a cat and paste the results (that you can share).
> > > > >
> > > > > Thanks,
> > > > > Nishith
> > > > >
> > > > > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > >
> > > > > > Hi Vinoth and other HUDI Experts,
> > > > > >
> > > > > > I am stuck while processing inserts into HUDI. The process picks
> > > > > > up CSV files and loads them into HUDI. The process seems to be
> > > > > > stuck at:
> > > > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > > > > >
> > > > > > Log is below:
> > > > > >
> > > > > > 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> > > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> > > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
> > > > > >
> > > > > > Looking at the last line in the log: "unassignedInserts => 8,
> > > > > > totalInsertBuckets => 2147483647, recordsPerBucket => 0", this
> > > > > > causes the code below to loop for a very long time, causing heap
> > > > > > issues.
> > > > > >
> > > > > > logger.info(
> > > > > >     "After small file assignment: unassignedInserts => " + totalUnassignedInserts
> > > > > >         + ", totalInsertBuckets => " + insertBuckets + ", recordsPerBucket => "
> > > > > >         + insertRecordsPerBucket);
> > > > > > for (int b = 0; b < insertBuckets; b++) {
> > > > > >   bucketNumbers.add(totalBuckets);
> > > > > >   recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> > > > > >   BucketInfo bucketInfo = new BucketInfo();
> > > > > >   bucketInfo.bucketType = BucketType.INSERT;
> > > > > >   bucketInfoMap.put(totalBuckets, bucketInfo);
> > > > > >   totalBuckets++;
> > > > > > }
> > > > > >
> > > > > > Has someone seen the issue? Do I need to file a bug, or is it
> > > > > > something to do with a misconfiguration on my side?
> > > > > >
> > > > > > Any help is highly appreciated.
> > > > > >
> > > > > > Thanks
> > > > > > Kabeer.
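
For anyone picking up HUDI-164, here is a rough sketch of the fix discussed at the top of this thread: walk the completed commits from newest to oldest, take the first one that actually wrote records, and fall back to a configured default otherwise. It is only an illustration under stated assumptions, not the actual patch. `AvgRecordSizeSketch`, `CommitStats`, both method names and the default of 1024 bytes are hypothetical and are not Hudi classes or configs. The naive method is a guess at the failure mode: in Java, a double division by a zero record count gives Infinity, which casts to Long.MAX_VALUE, and that is consistent with the logged AvgRecordSize of 9223372036854775807, although the thread does not confirm the exact computation. The sample totals treat numWrites and totalWriteBytes from 20190702161629.commit as records and bytes written.

```java
import java.util.Arrays;
import java.util.List;

public class AvgRecordSizeSketch {

    /** Hypothetical per-commit totals; this is not a Hudi class. */
    public static final class CommitStats {
        final long totalBytesWritten;
        final long totalRecordsWritten;

        public CommitStats(long totalBytesWritten, long totalRecordsWritten) {
            this.totalBytesWritten = totalBytesWritten;
            this.totalRecordsWritten = totalRecordsWritten;
        }
    }

    /**
     * Assumed failure mode: averaging over a single commit without checking
     * the record count. With zero records and non-zero bytes the double
     * division yields Infinity, and casting Infinity to long gives
     * Long.MAX_VALUE.
     */
    public static long naiveAverageRecordSize(CommitStats stats) {
        return (long) Math.ceil((1.0 * stats.totalBytesWritten) / stats.totalRecordsWritten);
    }

    /**
     * Sketch of the proposed fix: use the newest commit that wrote both
     * records and bytes; return defaultRecordSize when there is none.
     */
    public static long averageRecordSize(List<CommitStats> oldestToNewest, long defaultRecordSize) {
        for (int i = oldestToNewest.size() - 1; i >= 0; i--) {
            CommitStats stats = oldestToNewest.get(i);
            if (stats.totalRecordsWritten > 0 && stats.totalBytesWritten > 0) {
                return (long) Math.ceil((1.0 * stats.totalBytesWritten) / stats.totalRecordsWritten);
            }
        }
        return defaultRecordSize;
    }

    public static void main(String[] args) {
        // Totals summed from the two write stats in 20190702161629.commit.
        CommitStats good = new CommitStats(848480L + 820723L, 11614L + 10430L);
        // Hypothetical later commit that reports bytes but no records.
        CommitStats noRecords = new CommitStats(1000L, 0L);

        System.out.println(naiveAverageRecordSize(noRecords));                            // 9223372036854775807
        System.out.println(averageRecordSize(Arrays.asList(good, noRecords), 1024L));     // 76
    }
}
```

With the check in place the estimate stays finite, so the bucket count derived from it should no longer blow up to 2147483647. In the real code the fallback value would presumably come from the existing write config rather than a hard-coded number.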