Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-11 Thread via GitHub
vinishjail97 commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105950888 > hey @vinishjail97 : can you attach the memory profiling you did before and after this patch. and rebase w/ master. we are good to go 15th March: Basic OOM Test (Consume

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub
yihua merged PR #10872: URL: https://github.com/apache/hudi/pull/10872

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105462852 ## CI report: * acbabdc64da321e77aaabd03bcd9d5f3c322c0ec Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105414177 ## CI report: * ac7713c64afa1d2406463c8563a065362c95ecda Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105411384 ## CI report: * ac7713c64afa1d2406463c8563a065362c95ecda Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub
yihua commented on code in PR #10872: URL: https://github.com/apache/hudi/pull/10872#discussion_r1597306796 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java: ## @@ -57,6 +57,8 @@ */ public class JsonKafkaSource extends KafkaSource { +

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-09 Thread via GitHub
nsivabalan commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2103497489 hey @vinishjail97 : can you attach the memory profiling you did before and after this patch. and rebase w/ master. we are good to go

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-02 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2092180728 ## CI report: * ac7713c64afa1d2406463c8563a065362c95ecda Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-02 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2092056890 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-02 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2092052549 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-18 Thread via GitHub
CTTY commented on code in PR #10872: URL: https://github.com/apache/hudi/pull/10872#discussion_r1529666064 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java: ## @@ -57,6 +57,8 @@ */ public class JsonKafkaSource extends KafkaSource { +

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-18 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2005124255 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-18 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2004996708 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-18 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2004984687 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 UNKNOWN

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-16 Thread via GitHub
codope commented on code in PR #10872: URL: https://github.com/apache/hudi/pull/10872#discussion_r1527412766 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java: ## @@ -124,4 +123,4 @@ private JavaRDD postProcess(JavaRDD jsonStringRDD) {

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-15 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2000368965 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-15 Thread via GitHub
hudi-bot commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2000362084 ## CI report: * 629e91bc0267c0728b98326eb84072965c600205 UNKNOWN

[PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-03-15 Thread via GitHub
vinishjail97 opened a new pull request, #10872: URL: https://github.com/apache/hudi/pull/10872 ### Change Logs This block of code is problematic and can lead to OOM when we are converting the iterator into a list and then returning the iterator back. This just holds up memory in
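
For readers skimming the thread, the pattern described in the PR description can be sketched in plain Java, without Spark or Hudi on the classpath. This is only an illustration and not the code from the PR: the class `LazyPartitionMapping` and the methods `mapEagerly` and `mapLazily` are hypothetical names made up here. The eager variant buffers a whole partition into a list before handing back an iterator; the lazy variant wraps the input iterator and maps one record per `next()` call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; these names do not come from the Hudi codebase.
public class LazyPartitionMapping {

  // Eager variant: drains the whole input into a list, then returns the
  // list's iterator. Every mapped record is held in memory at once.
  public static <I, O> Iterator<O> mapEagerly(Iterator<I> input, Function<I, O> fn) {
    List<O> buffered = new ArrayList<>();
    while (input.hasNext()) {
      buffered.add(fn.apply(input.next()));
    }
    return buffered.iterator();
  }

  // Lazy variant: maps a record only when the consumer asks for it, so no
  // intermediate list of mapped records is built.
  public static <I, O> Iterator<O> mapLazily(Iterator<I> input, Function<I, O> fn) {
    return new Iterator<O>() {
      @Override
      public boolean hasNext() {
        return input.hasNext();
      }

      @Override
      public O next() {
        return fn.apply(input.next());
      }
    };
  }

  public static void main(String[] args) {
    // Both variants yield the same records; they differ in when the
    // mapping function runs and how much is buffered.
    Iterator<String> lazy = mapLazily(Arrays.asList(1, 2, 3).iterator(), i -> "record-" + i);
    while (lazy.hasNext()) {
      System.out.println(lazy.next());
    }
  }
}
```

Inside a `mapPartitions`-style callback, returning something shaped like `mapLazily` keeps per-partition memory proportional to one record at a time, assuming the downstream consumer also streams rather than collects.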