[GitHub] [hudi] leesf commented on a change in pull request #3729: Support JuiceFileSystem

2021-09-28 Thread GitBox
leesf commented on a change in pull request #3729: URL: https://github.com/apache/hudi/pull/3729#discussion_r718152109 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java ## @@ -62,6 +62,8 @@ OBS("obs", false), // Kingsoft Standard

[GitHub] [hudi] stym06 commented on issue #2688: [SUPPORT] Sync to Hive using Metastore

2021-09-28 Thread GitBox
stym06 commented on issue #2688: URL: https://github.com/apache/hudi/issues/2688#issuecomment-929828308 Hi, I got it to work with Hive 3.1.2 by adding some jars to the classpath after finding out which classes were not found (mainly calcite, datanucleus) -- This is an automated

[GitHub] [hudi] leesf commented on a change in pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-28 Thread GitBox
leesf commented on a change in pull request #3719: URL: https://github.com/apache/hudi/pull/3719#discussion_r718137137 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java ## @@ -175,8 +181,12 @@ public boolean accept(Path path) {

[GitHub] [hudi] fengjian428 commented on pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-28 Thread GitBox
fengjian428 commented on pull request #3671: URL: https://github.com/apache/hudi/pull/3671#issuecomment-929816532 > @fengjian428 Check CI again? What do you mean? The checks below all passed -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] hudi-bot edited a comment on pull request #3729: Support JuiceFileSystem

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3729: URL: https://github.com/apache/hudi/pull/3729#issuecomment-929788240 ## CI report: * c66e724bfa44a5147c8965bb585d07ac3b36 Azure:

[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
danny0405 commented on a change in pull request #3203: URL: https://github.com/apache/hudi/pull/3203#discussion_r718116742 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] hudi-bot commented on pull request #3729: Support JuiceFileSystem

2021-09-28 Thread GitBox
hudi-bot commented on pull request #3729: URL: https://github.com/apache/hudi/pull/3729#issuecomment-929788240 ## CI report: * c66e724bfa44a5147c8965bb585d07ac3b36 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] yanghua commented on pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-28 Thread GitBox
yanghua commented on pull request #3671: URL: https://github.com/apache/hudi/pull/3671#issuecomment-929787041 @fengjian428 Check CI again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] tangyoupeng opened a new pull request #3729: Support JuiceFileSystem

2021-09-28 Thread GitBox
tangyoupeng opened a new pull request #3729: URL: https://github.com/apache/hudi/pull/3729 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the

[GitHub] [hudi] danny0405 commented on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
danny0405 commented on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-929775609 Changes the title and commit message to "[HUDI-2086] Redo the logic of mor incremental view for hive" -- This is an automated message from the Apache Git Service. To respond

[GitHub] [hudi] danny0405 commented on a change in pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-28 Thread GitBox
danny0405 commented on a change in pull request #3719: URL: https://github.com/apache/hudi/pull/3719#discussion_r718104690 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java ## @@ -175,8 +181,12 @@ public boolean accept(Path path) {

[GitHub] [hudi] danny0405 commented on a change in pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-28 Thread GitBox
danny0405 commented on a change in pull request #3719: URL: https://github.com/apache/hudi/pull/3719#discussion_r718104393 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java ## @@ -175,8 +181,12 @@ public boolean accept(Path path) {

[GitHub] [hudi] danny0405 commented on a change in pull request #3637: [HUDI-2301] Support Flink async compaction scheduling

2021-09-28 Thread GitBox
danny0405 commented on a change in pull request #3637: URL: https://github.com/apache/hudi/pull/3637#discussion_r718092117 ## File path: hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java ## @@ -486,8 +486,8 @@ private FlinkOptions() { public static

[GitHub] [hudi] danny0405 commented on a change in pull request #3637: [HUDI-2301] Support Flink async compaction scheduling

2021-09-28 Thread GitBox
danny0405 commented on a change in pull request #3637: URL: https://github.com/apache/hudi/pull/3637#discussion_r718091146 ## File path: hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java ## @@ -430,14 +430,14 @@ private FlinkOptions() { public static

[jira] [Updated] (HUDI-2450) Make Flink MOR table writing streaming friendly

2021-09-28 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-2450: - Summary: Make Flink MOR table writing streaming friendly (was: Make Flink MOR table writer in a

[GitHub] [hudi] danny0405 commented on issue #3728: [SUPPORT] Hudi Flink S3 Java Example

2021-09-28 Thread GitBox
danny0405 commented on issue #3728: URL: https://github.com/apache/hudi/issues/3728#issuecomment-929754237 What version of Hudi do you plan to use, there are some SQL examples here:

[GitHub] [hudi] nsivabalan merged pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-28 Thread GitBox
nsivabalan merged pull request #3698: URL: https://github.com/apache/hudi/pull/3698 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] xushiyan commented on a change in pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
xushiyan commented on a change in pull request #3725: URL: https://github.com/apache/hudi/pull/3725#discussion_r717208114 ## File path: website/contribute/rfc-process.md ## @@ -0,0 +1,56 @@ +--- +sidebar_position: 3 +title: "RFC Process" +toc: true +last_modified_at:

[GitHub] [hudi] t0il3ts0ap edited a comment on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-09-28 Thread GitBox
t0il3ts0ap edited a comment on issue #2934: URL: https://github.com/apache/hudi/issues/2934#issuecomment-928866105 I have also worked on the same changes already. @vingov, @jsbali Let me know, if you have not yet started then I can raise a pr for this. -- This is an automated message

[GitHub] [hudi] nsivabalan commented on issue #3313: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key

2021-09-28 Thread GitBox
nsivabalan commented on issue #3313: URL: https://github.com/apache/hudi/issues/3313#issuecomment-928290740 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

[GitHub] [hudi] xushiyan commented on issue #3662: [SUPPORT] Error on the spark version in the desc information of the hudi CTAS Table

2021-09-28 Thread GitBox
xushiyan commented on issue #3662: URL: https://github.com/apache/hudi/issues/3662#issuecomment-928965831 @kelvin-qin i can't reproduce this. CTAS from this UT gave the correct info. `org.apache.spark.sql.hudi.TestHoodieSqlBase#test("Test Create Table As Select")` ```scala

[GitHub] [hudi] xushiyan commented on issue #3624: Failed to delete the partition table record

2021-09-28 Thread GitBox
xushiyan commented on issue #3624: URL: https://github.com/apache/hudi/issues/3624#issuecomment-928837505 @melin could you provide more version info like Hudi, Spark versions. also if it's on cloud or other environment? > The reason for this error is because a partition was

[GitHub] [hudi] xushiyan closed issue #3709: [SUPPORT] insert operation does not consistently insert duplicate records

2021-09-28 Thread GitBox
xushiyan closed issue #3709: URL: https://github.com/apache/hudi/issues/3709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-28 Thread GitBox
xiarixiaoyao commented on a change in pull request #3668: URL: https://github.com/apache/hudi/pull/3668#discussion_r717191016 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -213,6 +224,27 @@ protected

[GitHub] [hudi] t0il3ts0ap commented on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-09-28 Thread GitBox
t0il3ts0ap commented on issue #2934: URL: https://github.com/apache/hudi/issues/2934#issuecomment-928866105 I have also worked on the same changes already. @jsbali Let me know, if you have not yet started then I can raise a pr for this. -- This is an automated message from the Apache

[GitHub] [hudi] bryanburke commented on issue #3641: [SUPPORT] Retrieving latest completed commit timestamp via HoodieTableMetaClient in PySpark

2021-09-28 Thread GitBox
bryanburke commented on issue #3641: URL: https://github.com/apache/hudi/issues/3641#issuecomment-929262475 @xushiyan Thank you for your response! I had no idea Hudi provides event-driven features, so your suggestions are helping me learn quite a bit more about the framework. While I do

[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] YannByron commented on a change in pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-28 Thread GitBox
YannByron commented on a change in pull request #3693: URL: https://github.com/apache/hudi/pull/3693#discussion_r717206741 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/PartitionPathEncodeUtils.java ## @@ -71,7 +73,7 @@ public static String

[GitHub] [hudi] xushiyan commented on a change in pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-28 Thread GitBox
xushiyan commented on a change in pull request #3413: URL: https://github.com/apache/hudi/pull/3413#discussion_r717267707 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/UtilitiesTestBase.java ## @@ -314,6 +320,28 @@ public static void

[GitHub] [hudi] hudi-bot edited a comment on pull request #3727: [HUDI-2497] Refactor clean, restore, and compaction actions in hudi-client module

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3727: URL: https://github.com/apache/hudi/pull/3727#issuecomment-928920877 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] jsbali commented on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-09-28 Thread GitBox
jsbali commented on issue #2934: URL: https://github.com/apache/hudi/issues/2934#issuecomment-928877634 @t0il3ts0ap Do raise the PR for the same if it is not too much work and we can let the Hudi folks decide what makes sense for Hudi. My changes are mostly in IncrementalRelation. Will

[GitHub] [hudi] zhangyue19921010 commented on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-28 Thread GitBox
zhangyue19921010 commented on pull request #3413: URL: https://github.com/apache/hudi/pull/3413#issuecomment-928881572 Hi @nsivabalan @vinothchandar. Thanks a lot for your attention, review and approval! Could we land it or what else do I need to do? :) -- This is an automated message

[GitHub] [hudi] xiarixiaoyao commented on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
xiarixiaoyao commented on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-928890183 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] bvaradar commented on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-28 Thread GitBox
bvaradar commented on pull request #3668: URL: https://github.com/apache/hudi/pull/3668#issuecomment-928824868 @xiarixiaoyao : Can you add commits to this PR instead of squashing. It makes things easy for us to find the delta changes. We can do final squash before landing the PR. --

[GitHub] [hudi] yanghua merged pull request #3715: [HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka

2021-09-28 Thread GitBox
yanghua merged pull request #3715: URL: https://github.com/apache/hudi/pull/3715 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] bvaradar commented on a change in pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-28 Thread GitBox
bvaradar commented on a change in pull request #3668: URL: https://github.com/apache/hudi/pull/3668#discussion_r717230330 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -213,6 +224,27 @@ protected void

[GitHub] [hudi] fuyun2024 removed a comment on pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-28 Thread GitBox
fuyun2024 removed a comment on pull request #3722: URL: https://github.com/apache/hudi/pull/3722#issuecomment-928586896 I'm sorry, I don't know this mistake. Who can give me some advice? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-28 Thread GitBox
zhangyue19921010 removed a comment on pull request #3413: URL: https://github.com/apache/hudi/pull/3413#issuecomment-913312467 Or could we land it if possible? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-28 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-929282472 @FelixKJose : what I meant is, you are good w/ your configs in general. just that for every commit only one small file will be packed w/ more inserts. rest of incoming records will

[GitHub] [hudi] vinothchandar commented on pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar commented on pull request #3725: URL: https://github.com/apache/hudi/pull/3725#issuecomment-929246892 I will fix the other comments and repush. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] vinothchandar merged pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
vinothchandar merged pull request #3726: URL: https://github.com/apache/hudi/pull/3726 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] codope commented on issue #3607: [SUPPORT]Presto query hudi data with metadata table enable un-successfully.

2021-09-28 Thread GitBox
codope commented on issue #3607: URL: https://github.com/apache/hudi/issues/3607#issuecomment-929111244 > Hi @codope Thanks a lot for your attention. I just tried but failed again :( Sincerely ask : > > 1. What's the version of Presto you used? > 2. `in docker setup with metadata

[GitHub] [hudi] nsivabalan commented on issue #3533: [SUPPORT]How to use MOR Table to Merge small file?

2021-09-28 Thread GitBox
nsivabalan commented on issue #3533: URL: https://github.com/apache/hudi/issues/3533#issuecomment-928269532 @aresa7796 : we can dig in more if you can provide us w/ more info like file sizes, etc. As of now, we can't debug much without that info. Appreciate it if you can respond w/ details.

[GitHub] [hudi] nsivabalan commented on a change in pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-28 Thread GitBox
nsivabalan commented on a change in pull request #3413: URL: https://github.com/apache/hudi/pull/3413#discussion_r717168485 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -1398,6 +1399,34 @@ private void

[GitHub] [hudi] yanghua commented on a change in pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-28 Thread GitBox
yanghua commented on a change in pull request #3671: URL: https://github.com/apache/hudi/pull/3671#discussion_r717386107 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHiveSchemaProvider.java ## @@ -0,0 +1,124 @@ +/* + * Licensed to the

[GitHub] [hudi] nsivabalan commented on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-28 Thread GitBox
nsivabalan commented on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-928195283 @hudi-bot azure run -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] xushiyan commented on pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
xushiyan commented on pull request #3726: URL: https://github.com/apache/hudi/pull/3726#issuecomment-928184950 @vinothchandar RFCs should be kept in asf-site? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] vinothchandar merged pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar merged pull request #3725: URL: https://github.com/apache/hudi/pull/3725 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] xushiyan edited a comment on pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
xushiyan edited a comment on pull request #3726: URL: https://github.com/apache/hudi/pull/3726#issuecomment-928184950 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] danny0405 commented on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
danny0405 commented on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-928786962 Thanks, i will take a look tomorrow :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] fuyun2024 commented on pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-28 Thread GitBox
fuyun2024 commented on pull request #3722: URL: https://github.com/apache/hudi/pull/3722#issuecomment-928586896 I'm sorry, I don't know this mistake. Who can give me some advice? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [hudi] fengjian428 commented on a change in pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-28 Thread GitBox
fengjian428 commented on a change in pull request #3671: URL: https://github.com/apache/hudi/pull/3671#discussion_r717431276 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHiveSchemaProvider.java ## @@ -0,0 +1,124 @@ +/* + * Licensed to

[GitHub] [hudi] vinothchandar commented on a change in pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar commented on a change in pull request #3725: URL: https://github.com/apache/hudi/pull/3725#discussion_r717604324 ## File path: website/contribute/rfc-process.md ## @@ -0,0 +1,56 @@ +--- +sidebar_position: 3 +title: "RFC Process" +toc: true +last_modified_at:

[GitHub] [hudi] leesf commented on a change in pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-28 Thread GitBox
leesf commented on a change in pull request #3719: URL: https://github.com/apache/hudi/pull/3719#discussion_r717684115 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java ## @@ -175,8 +181,12 @@ public boolean accept(Path path) {

[GitHub] [hudi] xushiyan commented on issue #3647: [SUPPORT] Failed to read parquet file during upsert

2021-09-28 Thread GitBox
xushiyan commented on issue #3647: URL: https://github.com/apache/hudi/issues/3647#issuecomment-928978500 > In my case, this issue can be worked around by using the insert operation instead of bulk-insert. > Should not Hudi bulk-insert and insert operations be consistent in what they use and

[GitHub] [hudi] niloo-sh commented on issue #3572: compatble version of hudi, hive and hadoop

2021-09-28 Thread GitBox
niloo-sh commented on issue #3572: URL: https://github.com/apache/hudi/issues/3572#issuecomment-928903882 @vinothchandar thank you, I added dependency to build.sbt file and I created fat jar and run it with spark-submit. now I get exception:"org/json/jsonexception", its correct, I use

[GitHub] [hudi] qianchutao commented on pull request #3715: [HUDI-2487] fix JsonKafkaSource cannot filter empty messages from kafka

2021-09-28 Thread GitBox
qianchutao commented on pull request #3715: URL: https://github.com/apache/hudi/pull/3715#issuecomment-928605496 @yanghua Mr. Yang, I have made the corresponding modifications and added unit tests. Please help me review it again, thanks -- This is an automated message from the

[GitHub] [hudi] zhangyue19921010 commented on pull request #3666: [HUDI-2435][BUG]Fix clustering handle errors

2021-09-28 Thread GitBox
zhangyue19921010 commented on pull request #3666: URL: https://github.com/apache/hudi/pull/3666#issuecomment-928882260 Hi @satishkotha. Thanks a lot for your attention, review and approval! Could we land it or what else do I need to do? :) -- This is an automated message from the Apache

[GitHub] [hudi] nsivabalan commented on pull request #3715: [HUDI-2487] fix JsonKafkaSource cannot filter empty messages from kafka

2021-09-28 Thread GitBox
nsivabalan commented on pull request #3715: URL: https://github.com/apache/hudi/pull/3715#issuecomment-928632193 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] nsivabalan commented on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-28 Thread GitBox
nsivabalan commented on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-928235206 @hudi-bot azure run -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] xushiyan commented on issue #3709: [SUPPORT] insert operation does not consistently insert duplicate records

2021-09-28 Thread GitBox
xushiyan commented on issue #3709: URL: https://github.com/apache/hudi/issues/3709#issuecomment-928732681 JIRA filed https://issues.apache.org/jira/browse/HUDI-2496 and we'll prioritize a fix. Thanks again @helanto -- This is an automated message from the Apache Git Service. To respond

[GitHub] [hudi] vinothchandar commented on pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
vinothchandar commented on pull request #3726: URL: https://github.com/apache/hudi/pull/3726#issuecomment-929234706 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] mkk1490 commented on issue #3313: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key

2021-09-28 Thread GitBox
mkk1490 commented on issue #3313: URL: https://github.com/apache/hudi/issues/3313#issuecomment-928853775 > @mkk1490 : sorry the issue got lengthy and I have got a couple of clarifications. > Is your issue with record key fields having one component as timestamp or is it

[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3413: URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] nsivabalan commented on pull request #2691: [HUDI-1703] Fixing kafka auto.reset.offsets config param key

2021-09-28 Thread GitBox
nsivabalan commented on pull request #2691: URL: https://github.com/apache/hudi/pull/2691#issuecomment-928636419 @YannByron : yes. it was an oversight that we landed this. later this got fixed. so with 090, you can use "auto.offset.reset". -- This is an automated message from the Apache

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-09-28 Thread GitBox
xiarixiaoyao commented on a change in pull request #3203: URL: https://github.com/apache/hudi/pull/3203#discussion_r717259013 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java ## @@ -73,6 +73,14 @@ protected HoodieDefaultTimeline

[GitHub] [hudi] nsivabalan edited a comment on pull request #3715: [HUDI-2487] fix JsonKafkaSource cannot filter empty messages from kafka

2021-09-28 Thread GitBox
nsivabalan edited a comment on pull request #3715: URL: https://github.com/apache/hudi/pull/3715#issuecomment-928632193 LGTM. will let @yanghua land this in. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] nsivabalan closed issue #3313: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key

2021-09-28 Thread GitBox
nsivabalan closed issue #3313: URL: https://github.com/apache/hudi/issues/3313 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot commented on pull request #3727: [HUDI-2497] Refactor clean, restore, and compaction actions in hudi-client module

2021-09-28 Thread GitBox
hudi-bot commented on pull request #3727: URL: https://github.com/apache/hudi/pull/3727#issuecomment-928920877 ## CI report: * 8f37aaed5a8a14fcad20eb2e761f05c0e1dfa4b0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] codope commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

2021-09-28 Thread GitBox
codope commented on issue #3713: URL: https://github.com/apache/hudi/issues/3713#issuecomment-929284948 @calleo The hive sync tool does not lock. I mean, internally HoodieHiveClient does initiate the connection but that gets closed as soon as sync completes. I think in docker setup, derby

[GitHub] [hudi] nsivabalan closed pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-28 Thread GitBox
nsivabalan closed pull request #3716: URL: https://github.com/apache/hudi/pull/3716 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-09-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421584#comment-17421584 ] Raymond Xu commented on HUDI-2496: -- [~helias_an] Sure. assigned! Please ping us even with a draft PR, we

[jira] [Assigned] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-09-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-2496: Assignee: Helias Antoniou > Inserts are precombined even with dedup disabled >

[jira] [Updated] (HUDI-2494) Fix usage of different key generators with metadata enabled

2021-09-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2494: -- Description: With [sync metadata patch|https://github.com/apache/hudi/pull/3590/], when

[jira] [Updated] (HUDI-2494) Fix usage of different key generators with metadata enabled

2021-09-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2494: -- Description: With sync metadata patch, when metadata is enabled by default, some spark

[GitHub] [hudi] hudi-bot edited a comment on pull request #3727: [HUDI-2497] Refactor clean, restore, and compaction actions in hudi-client module

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3727: URL: https://github.com/apache/hudi/pull/3727#issuecomment-928920877 ## CI report: * 10081946cf12f3c7116ee5192ef0a44709b19021 Azure:

[jira] [Commented] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-09-28 Thread Helias Antoniou (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421570#comment-17421570 ] Helias Antoniou commented on HUDI-2496: --- Hey [~xushiyan] [~codope], can I work on this one? >

[GitHub] [hudi] hudi-bot edited a comment on pull request #3727: [HUDI-2497] Refactor clean, restore, and compaction actions in hudi-client module

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3727: URL: https://github.com/apache/hudi/pull/3727#issuecomment-928920877 ## CI report: * 8f37aaed5a8a14fcad20eb2e761f05c0e1dfa4b0 Azure:

[hudi] branch master updated (f0585fa -> 2aa660f)

2021-09-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from f0585fa [HUDI-2474] Refreshing timeline for every operation in Hudi when metadata is enabled (#3698) add

[GitHub] [hudi] vinothchandar merged pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
vinothchandar merged pull request #3726: URL: https://github.com/apache/hudi/pull/3726 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[hudi] branch asf-site updated: [DOCS] New RFC Process (#3725)

2021-09-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 7129c5a [DOCS] New RFC Process (#3725)

[GitHub] [hudi] vinothchandar merged pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar merged pull request #3725: URL: https://github.com/apache/hudi/pull/3725 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] xushiyan commented on a change in pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
xushiyan commented on a change in pull request #3725: URL: https://github.com/apache/hudi/pull/3725#discussion_r717743688 ## File path: website/contribute/rfc-process.md ## @@ -0,0 +1,56 @@ +--- +sidebar_position: 3 +title: "RFC Process" +toc: true +last_modified_at:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * c09f5e30662efd882ff9fc35f5750a8253171e9d Azure:

[GitHub] [hudi] leesf commented on a change in pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-28 Thread GitBox
leesf commented on a change in pull request #3719: URL: https://github.com/apache/hudi/pull/3719#discussion_r717684115 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java ## @@ -175,8 +181,12 @@ public boolean accept(Path path) {

[jira] [Created] (HUDI-2498) Support Hive sync to work with s3

2021-09-28 Thread Vinay (Jira)
Vinay created HUDI-2498: --- Summary: Support Hive sync to work with s3 Key: HUDI-2498 URL: https://issues.apache.org/jira/browse/HUDI-2498 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-28 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * 87d2ec0cc60ef8d5b7a081b73a0318aedf1a2a90 Azure:

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-28 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-929282472 @FelixKJose : what I meant is, you are good w/ your configs in general. just that for every commit only one small file will be packed w/ more inserts. rest of incoming records will

[GitHub] [hudi] vinothchandar commented on pull request #3726: [MINOR] Add a RFC template and folder

2021-09-28 Thread GitBox
vinothchandar commented on pull request #3726: URL: https://github.com/apache/hudi/pull/3726#issuecomment-929271177 Addressed all feedback on other PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] vinothchandar commented on a change in pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar commented on a change in pull request #3725: URL: https://github.com/apache/hudi/pull/3725#discussion_r717619101 ## File path: website/contribute/rfc-process.md ## @@ -0,0 +1,56 @@ +--- +sidebar_position: 3 +title: "RFC Process" +toc: true +last_modified_at:

[GitHub] [hudi] bryanburke commented on issue #3641: [SUPPORT] Retrieving latest completed commit timestamp via HoodieTableMetaClient in PySpark

2021-09-28 Thread GitBox
bryanburke commented on issue #3641: URL: https://github.com/apache/hudi/issues/3641#issuecomment-929262475 @xushiyan Thank you for your response! I had no idea Hudi provides event-driven features, so your suggestions are helping me learn quite a bit more about the framework. While I do

[GitHub] [hudi] vinothchandar commented on a change in pull request #3725: [DOCS] New RFC Process

2021-09-28 Thread GitBox
vinothchandar commented on a change in pull request #3725: URL: https://github.com/apache/hudi/pull/3725#discussion_r717613251 ## File path: website/contribute/rfc-process.md ## @@ -0,0 +1,56 @@ +--- +sidebar_position: 3 +title: "RFC Process" +toc: true +last_modified_at:
