I can help you out. :) Created
https://issues.apache.org/jira/browse/HUDI-164 to track this.
Please share your JIRA ID and I will assign it to you. :)

We can write a simple loop that looks for the most recent commit with a
non-zero record count, or falls back to the default configs if there is none.
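Here is a minimal Java sketch of that loop, for illustration only. `CommitStats`, `averageBytesPerRecord` and the `defaultEstimate` parameter are hypothetical stand-ins. They are not Hudi's timeline or config API. The caller is assumed to supply completed commits ordered newest first. The sample totals for 20190702161629 are the sums of `totalWriteBytes` and `numWrites` from that commit's metadata, which is quoted in this thread.

```java
import java.util.List;

public class AvgRecordSizeEstimator {

  /** Hypothetical per-commit totals; not Hudi's HoodieCommitMetadata. */
  public record CommitStats(String instantTime, long totalBytesWritten, long totalRecordsWritten) {}

  /**
   * Returns the average bytes per record from the newest commit that wrote both bytes
   * and records, or defaultEstimate if no such commit exists.
   *
   * @param commitsNewestFirst completed commits, ordered newest to oldest (assumed)
   * @param defaultEstimate    hypothetical configured fallback, in bytes per record
   */
  public static long averageBytesPerRecord(List<CommitStats> commitsNewestFirst, long defaultEstimate) {
    for (CommitStats commit : commitsNewestFirst) {
      // Skip commits that cannot give a meaningful ratio, e.g. zero records written.
      if (commit.totalBytesWritten() > 0 && commit.totalRecordsWritten() > 0) {
        return (long) Math.ceil((double) commit.totalBytesWritten() / commit.totalRecordsWritten());
      }
    }
    // No usable commit in the timeline: fall back to the configured estimate.
    return defaultEstimate;
  }

  public static void main(String[] args) {
    List<CommitStats> timeline = List.of(
        new CommitStats("20190702170000", 848480L, 0L),       // hypothetical commit: bytes, no records
        new CommitStats("20190702161629", 1669203L, 22044L)); // totals from 20190702161629.commit
    System.out.println(averageBytesPerRecord(timeline, 1024L));
  }
}
```

With the sample timeline, this prints 76. The newest commit is skipped because it wrote no records. The next commit gives ceil(1669203 / 22044) = 76 bytes per record.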

On Wed, Jul 3, 2019 at 8:35 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:

> Hi Balaji,
>
> My confidence isn't great when it comes to editing the code to find the newest
> non-zero instant, so I would earnestly request that someone who has worked on
> this before take a look. It would probably be more efficient for someone who
> knows this code to add the fix than someone like me. (I would honestly like
> to work on the fix if someone is willing to do some hand-holding :) ).
> Thanks,
> On Jul 3 2019, at 4:32 pm, Balaji Varadarajan <v.bal...@ymail.com.INVALID> wrote:
> > Thanks for finding the old issue. Looks like we replied around the same
> > time :) Yeah, makes sense. The fix would probably be to find the newest
> > instant with non-zero records written and use it for the average record
> > size calculation. Let us know if you are interested in working on the fix.
> > Balaji.V
> > On Wednesday, July 3, 2019, 8:25:33 AM PDT, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> >
> > Hi Nishith and All,
> > I think I have figured out what was blocking the processing of files for me.
> > The code snippet I sent earlier is indeed the issue. I have raised a new
> > issue and added a few details at:
> > https://github.com/apache/incubator-hudi/issues/776
> > Can someone please have a look and advise what is going wrong?
> >
> > Thank you,
> > Kabeer.
> >
> > On Jul 3 2019, at 11:38 am, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > Hi Nishith,
> > >
> > > Please find the latest commit data in the gist at:
> > > https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d
> > > For convenience, I have also pasted it below. Any help is highly
> > > appreciated.
> > >
> > > Thanks
> > > Kabeer.
> > >
> > > 20190702161629.commit:
> > > {
> > > "partitionToWriteStats" : {
> > > "2018/05/30" : [ {
> > > "fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a",
> > > "path" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet",
> > > "prevCommit" : "20190702161417",
> > > "numWrites" : 11614,
> > > "numDeletes" : 0,
> > > "numUpdateWrites" : 5,
> > > "numInserts" : 3,
> > > "totalWriteBytes" : 848480,
> > > "totalWriteErrors" : 0,
> > > "tempPath" : null,
> > > "partitionPath" : "2018/05/30",
> > > "totalLogRecords" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogSizeCompacted" : 0,
> > > "totalUpdatedRecordsCompacted" : 0,
> > > "totalLogBlocks" : 0,
> > > "totalCorruptLogBlock" : 0,
> > > "totalRollbackBlocks" : 0
> > > } ],
> > > "2018/05/31" : [ {
> > > "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > > "path" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > > "prevCommit" : "null",
> > > "numWrites" : 10430,
> > > "numDeletes" : 0,
> > > "numUpdateWrites" : 0,
> > > "numInserts" : 10430,
> > > "totalWriteBytes" : 820723,
> > > "totalWriteErrors" : 0,
> > > "tempPath" : null,
> > > "partitionPath" : "2018/05/31",
> > > "totalLogRecords" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogSizeCompacted" : 0,
> > > "totalUpdatedRecordsCompacted" : 0,
> > > "totalLogBlocks" : 0,
> > > "totalCorruptLogBlock" : 0,
> > > "totalRollbackBlocks" : 0
> > > } ]
> > > },
> > > "compacted" : false,
> > > "extraMetadataMap" : {
> > > "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}"
> > > },
> > > "extraMetadata" : {
> > > "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}"
> > > },
> > > "totalScanTime" : 0,
> > > "totalCreateTime" : 2439,
> > > "totalUpsertTime" : 2450,
> > > "totalCompactedRecordsUpdated" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogFilesSize" : 0,
> > > "fileIdAndRelativePaths" : {
> > > "4f5514e8-d57c-4c6e-be8f-c3448051c956" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > > "39cff0df-24e4-45b8-bff5-9b4f41c4096a" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet"
> > > },
> > > "totalRecordsDeleted" : 0,
> > > "totalLogRecordsCompacted" : 0
> > > }
> > >
> > > 20190702161750.clean:
> > > Objavro.schema
> {"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}
> > >
> > > 20190702161847.inflight:
> > > {
> > > "partitionToWriteStats" : {
> > > "2018/05/31" : [ {
> > > "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > > "path" : null,
> > > "prevCommit" : "20190702161629",
> > > "numWrites" : 0,
> > > "numDeletes" : 0,
> > > "numUpdateWrites" : 2,
> > > "numInserts" : 0,
> > > "totalWriteBytes" : 0,
> > > "totalWriteErrors" : 0,
> > > "tempPath" : null,
> > > "partitionPath" : null,
> > > "totalLogRecords" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogSizeCompacted" : 0,
> > > "totalUpdatedRecordsCompacted" : 0,
> > > "totalLogBlocks" : 0,
> > > "totalCorruptLogBlock" : 0,
> > > "totalRollbackBlocks" : 0
> > > } ]
> > > },
> > > "compacted" : false,
> > > "extraMetadataMap" : { },
> > > "totalScanTime" : 0,
> > > "totalCreateTime" : 0,
> > > "totalUpsertTime" : 0,
> > > "totalCompactedRecordsUpdated" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogFilesSize" : 0,
> > > "extraMetadata" : { },
> > > "fileIdAndRelativePaths" : {
> > > "4f5514e8-d57c-4c6e-be8f-c3448051c956" : null
> > > },
> > > "totalRecordsDeleted" : 0,
> > > "totalLogRecordsCompacted" : 0
> > > }
> > >
> > > 20190702162055.inflight:
> > > {
> > > "partitionToWriteStats" : { },
> > > "compacted" : false,
> > > "extraMetadataMap" : { },
> > > "totalRecordsDeleted" : 0,
> > > "totalLogRecordsCompacted" : 0,
> > > "totalScanTime" : 0,
> > > "totalCreateTime" : 0,
> > > "totalUpsertTime" : 0,
> > > "totalCompactedRecordsUpdated" : 0,
> > > "totalLogFilesCompacted" : 0,
> > > "totalLogFilesSize" : 0,
> > > "fileIdAndRelativePaths" : { },
> > > "extraMetadata" : { }
> > > }
> > >
> > > On Jul 3 2019, at 8:30 am, Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > Hi Nishith,
> > > >
> > > > Thank you for the quick response. I shall try to send the commit
> > > > metadata as soon as possible. I assume the commit metadata you are
> > > > looking for is the one within the .hoodie/ directory and not the
> > > > archived ones.
> > > > There are both inflight and commit metadata files; I take it you want
> > > > to look at the inflight one. I shall get back to you with further details.
> > > > Thanks
> > > > Kabeer.
> > > >
> > > > On Jul 3 2019, at 2:19 am, nishith agarwal <n3.nas...@gmail.com> wrote:
> > > > > Kabeer,
> > > > >
> > > > > Could you share the contents of your commit metadata? You can list
> > > > > the timeline, find the latest commit in it, cat the file and paste
> > > > > the results (whatever you are able to share).
> > > > >
> > > > > Thanks,
> > > > > Nishith
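Here is a small Java sketch of the steps described above: list the timeline, pick the latest commit, and print it. It assumes the table sits on a local filesystem. It also assumes that completed commits are files named `<instantTime>.commit` directly under `<basePath>/.hoodie`, as the file names in this thread suggest. `LatestCommitDump` is a hypothetical helper. Tables on HDFS or S3 would need the Hadoop `FileSystem` API instead. Only `.commit` files are considered, so `.inflight` and `.clean` instants are ignored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class LatestCommitDump {

  /** Returns the newest "<instantTime>.commit" file under basePath/.hoodie, if any (assumed layout). */
  public static Optional<Path> latestCommit(Path basePath) throws IOException {
    try (Stream<Path> files = Files.list(basePath.resolve(".hoodie"))) {
      return files
          .filter(p -> p.getFileName().toString().endsWith(".commit"))
          // Instant times here are fixed-width 14-digit timestamps (e.g. 20190702161629),
          // so the lexicographically largest name is the most recent commit.
          .max(Comparator.comparing((Path p) -> p.getFileName().toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    Path basePath = Paths.get(args.length > 0 ? args[0] : ".");
    Optional<Path> latest = latestCommit(basePath);
    if (latest.isPresent()) {
      // The commit metadata in this thread is JSON text, so it can be printed directly.
      System.out.println(latest.get().getFileName() + ":");
      System.out.println(Files.readString(latest.get()));
    } else {
      System.out.println("No completed commits under " + basePath.resolve(".hoodie"));
    }
  }
}
```

Run it with the table's base path as the only argument.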
> > > > >
> > > > > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > Hi Vinoth and other HUDI Experts,
> > > > > > I am stuck while processing inserts into HUDI. The process picks up
> > > > > > CSV files and loads them into HUDI, and it seems to be stuck at:
> > > > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > > > > > The log is below:
> > > > > >
> > > > > > 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> > > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> > > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
> > > > > > The last line of the log shows "unassignedInserts => 8,
> > > > > > totalInsertBuckets => 2147483647, recordsPerBucket => 0". With these
> > > > > > values, the loop below iterates up to 2147483647 times, which causes
> > > > > > heap issues.
> > > > > >
> > > > > > logger.info(
> > > > > >     "After small file assignment: unassignedInserts => " + totalUnassignedInserts
> > > > > >         + ", totalInsertBuckets => " + insertBuckets
> > > > > >         + ", recordsPerBucket => " + insertRecordsPerBucket);
> > > > > > for (int b = 0; b < insertBuckets; b++) {
> > > > > >   bucketNumbers.add(totalBuckets);
> > > > > >   recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> > > > > >   BucketInfo bucketInfo = new BucketInfo();
> > > > > >   bucketInfo.bucketType = BucketType.INSERT;
> > > > > >   bucketInfoMap.put(totalBuckets, bucketInfo);
> > > > > >   totalBuckets++;
> > > > > > }
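The logged values are exactly Long.MAX_VALUE (9223372036854775807) and Integer.MAX_VALUE (2147483647). This is what Java produces when a floating-point division by zero (+Infinity) is cast to an integral type.

The sketch below reproduces all three logged numbers under some assumed formulas:
- the average record size is ceil(bytes / records), computed in double arithmetic;
- records per bucket is max file size / average record size;
- insert buckets is ceil(unassigned inserts / records per bucket).

These formulas, the 120 MB max file size and the 848480-byte, zero-record commit are assumptions for illustration. The actual code in HoodieCopyOnWriteTable.java may differ.

```java
public class BucketOverflowDemo {

  // Assumed: average computed in double arithmetic, rounded up, then cast to long.
  public static long avgRecordSize(long bytesWritten, long recordsWritten) {
    // x / 0.0 is +Infinity for x > 0, and (long) +Infinity == Long.MAX_VALUE.
    return (long) Math.ceil((1.0 * bytesWritten) / recordsWritten);
  }

  // Assumed: records that fit in one file = max file size / average record size.
  public static long recordsPerBucket(long maxFileSizeBytes, long avgRecordSize) {
    // Integer division by Long.MAX_VALUE truncates to 0.
    return maxFileSizeBytes / avgRecordSize;
  }

  // Assumed: buckets needed = unassigned inserts / records per bucket, rounded up.
  public static int insertBuckets(long unassignedInserts, long recordsPerBucket) {
    // 8.0 / 0 is +Infinity again, and (int) +Infinity == Integer.MAX_VALUE.
    return (int) Math.ceil((1.0 * unassignedInserts) / recordsPerBucket);
  }

  public static void main(String[] args) {
    long avg = avgRecordSize(848480L, 0L);                      // hypothetical commit: bytes but zero records
    long perBucket = recordsPerBucket(120L * 1024 * 1024, avg); // assumed 120 MB max file size
    int buckets = insertBuckets(8L, perBucket);                 // 8 unassigned inserts, as logged
    System.out.println("AvgRecordSize => " + avg);
    System.out.println("totalInsertBuckets => " + buckets + ", recordsPerBucket => " + perBucket);
  }
}
```

Run as-is, it prints `AvgRecordSize => 9223372036854775807` followed by `totalInsertBuckets => 2147483647, recordsPerBucket => 0`, matching the log. The quoted loop would then add 2147483647 entries each to `bucketNumbers`, `recordsPerBucket` and `bucketInfoMap`.

If a commit reports zero bytes as well as zero records, 0.0 / 0 is NaN, and NaN casts to 0 instead.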
> > > > > > Has anyone seen this issue before? Should I file a bug, or is it
> > > > > > caused by a misconfiguration on my side?
> > > > > >
> > > > > > Any help is highly appreciated.
> > > > > > Thanks
> > > > > > Kabeer.
> > > > >
> > > >
> > >
> >
> >
>
>
