[GitHub] [hudi] rishabhbandi closed issue #6055: Hudi Partial Update not working by using MERGE statement on Hudi External Table

2022-11-03 Thread GitBox
rishabhbandi closed issue #6055: Hudi Partial Update not working by using MERGE statement on Hudi External Table URL: https://github.com/apache/hudi/issues/6055 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] rishabhbandi commented on issue #6055: Hudi Partial Update not working by using MERGE statement on Hudi External Table

2022-11-03 Thread GitBox
rishabhbandi commented on issue #6055: URL: https://github.com/apache/hudi/issues/6055#issuecomment-1303007672 Hi Team, we created a separate custom Java class to perform the partial update.

[GitHub] [hudi] hudi-bot commented on pull request #6983: [HUDI-5031]Hudi merge into creates empty partition files when the sou…

2022-11-03 Thread GitBox
hudi-bot commented on PR #6983: URL: https://github.com/apache/hudi/pull/6983#issuecomment-1302963014 ## CI report: * f5a8b04cb184f9c9f00961884c479856594f57f2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6983: [HUDI-5031]Hudi merge into creates empty partition files when the sou…

2022-11-03 Thread GitBox
hudi-bot commented on PR #6983: URL: https://github.com/apache/hudi/pull/6983#issuecomment-1302960236 ## CI report: * f5a8b04cb184f9c9f00961884c479856594f57f2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7063: [HUDI-5094] Remove partition fields before transform bytes to avro,if enable DROP_PARTITION_COLUMNS

2022-11-03 Thread GitBox
hudi-bot commented on PR #7063: URL: https://github.com/apache/hudi/pull/7063#issuecomment-1302957486 ## CI report: * 77487796a68b54304f55efc71097ab8ca50b428b UNKNOWN * 8240e1e8280cd8842d4ba11ef6f781feb3d8a9bd UNKNOWN * 85b70221d74d0d04900acda25e1ea9b7c71bcb0a UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #6725: [HUDI-4881] Push down filters if possible when syncing partitions to Hive

2022-11-03 Thread GitBox
hudi-bot commented on PR #6725: URL: https://github.com/apache/hudi/pull/6725#issuecomment-1302957195 ## CI report: * 81f856d99da09e5a9438fad2a0d111bc9062aba4 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-03 Thread GitBox
hudi-bot commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1302956460 ## CI report: * d690f80ac9cc19c3c97ded93381824bfdb6d7798 Azure:

[GitHub] [hudi] xiarixiaoyao commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-03 Thread GitBox
xiarixiaoyao commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1302942885 @hudi-bot run azure

[GitHub] [hudi] danny0405 commented on issue #6019: [SUPPORT] How can data be flushed to Hudi as quickly as possible?

2022-11-03 Thread GitBox
danny0405 commented on issue #6019: URL: https://github.com/apache/hudi/issues/6019#issuecomment-1302935085 Yeah, let's close it out. Use release 0.12.1; if there are still problems, feel free to re-open it.

[GitHub] [hudi] danny0405 closed issue #6019: [SUPPORT] How can data be flushed to Hudi as quickly as possible?

2022-11-03 Thread GitBox
danny0405 closed issue #6019: [SUPPORT] How can data be flushed to Hudi as quickly as possible? URL: https://github.com/apache/hudi/issues/6019

[GitHub] [hudi] danny0405 commented on issue #6052: [SUPPORT] HoodieRollbackException when starting Flink Job on existing Hudi Table

2022-11-03 Thread GitBox
danny0405 commented on issue #6052: URL: https://github.com/apache/hudi/issues/6052#issuecomment-1302933977 Did you try release 0.12.1? It is expected to work correctly now.

[GitHub] [hudi] danny0405 commented on issue #5979: [SUPPORT]the hudi's table of join can not handle delete operation.But simple table is ok.why?

2022-11-03 Thread GitBox
danny0405 commented on issue #5979: URL: https://github.com/apache/hudi/issues/5979#issuecomment-1302933052 Does Hudi table C enable changelog mode, then?

[GitHub] [hudi] danny0405 commented on issue #4978: [SUPPORT] Wrong table path when using Hive to query xxx_rt table before the first compaction

2022-11-03 Thread GitBox
danny0405 commented on issue #4978: URL: https://github.com/apache/hudi/issues/4978#issuecomment-1302929509 No, we have not fixed it. Neither Hive nor Trino can access file groups that contain only log files. Can we raise its priority for release 0.13.0 and solve it then?

[jira] [Created] (HUDI-5159) Support write a success file to partition when it finished in flink streaming append writer

2022-11-03 Thread KevinyhZou (Jira)
KevinyhZou created HUDI-5159: Summary: Support write a success file to partition when it finished in flink streaming append writer Key: HUDI-5159 URL: https://issues.apache.org/jira/browse/HUDI-5159

[GitHub] [hudi] TengHuo commented on issue #7106: [PROPOSE] Add column prune support for other payload class

2022-11-03 Thread GitBox
TengHuo commented on issue #7106: URL: https://github.com/apache/hudi/issues/7106#issuecomment-1302908283 Attaching the RFC-46 link here: https://github.com/apache/hudi/blob/master/rfc/rfc-46/rfc-46.md

[GitHub] [hudi] TengHuo commented on issue #7106: [PROPOSE] Add column prune support for other payload class

2022-11-03 Thread GitBox
TengHuo commented on issue #7106: URL: https://github.com/apache/hudi/issues/7106#issuecomment-1302906771 Sure, no problem. Thanks, @nsivabalan. Let me start a dev email thread.

[GitHub] [hudi] hudi-bot commented on pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
hudi-bot commented on PR #7129: URL: https://github.com/apache/hudi/pull/7129#issuecomment-1302906313 ## CI report: * e86e785602cfed876c75273a4c8a669f0143b77c Azure:

[GitHub] [hudi] nsivabalan closed issue #4864: Insert with INSERT_DROP_DUPS_OPT_KEY fails

2022-11-03 Thread GitBox
nsivabalan closed issue #4864: Insert with INSERT_DROP_DUPS_OPT_KEY fails URL: https://github.com/apache/hudi/issues/4864

[GitHub] [hudi] nsivabalan commented on issue #4864: Insert with INSERT_DROP_DUPS_OPT_KEY fails

2022-11-03 Thread GitBox
nsivabalan commented on issue #4864: URL: https://github.com/apache/hudi/issues/4864#issuecomment-1302904205 Since we haven't heard back from you for the past 6+ months, we are going ahead and closing it out. Feel free to reach out to us if you need further assistance.

[GitHub] [hudi] nsivabalan commented on issue #4864: Insert with INSERT_DROP_DUPS_OPT_KEY fails

2022-11-03 Thread GitBox
nsivabalan commented on issue #4864: URL: https://github.com/apache/hudi/issues/4864#issuecomment-1302903706 Insert drop dups will consider file groups for matching partitions only. So, if your incoming batch contains records for 1 partition, Hudi will do an index lookup only in 1
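
The partition-scoped lookup described in the comment above can be illustrated with a toy model in plain Python. This is a minimal sketch, not Hudi code: the function `drop_duplicates_within_partitions` and the `(partition, key)` record layout are hypothetical, and it assumes a non-global index, where an incoming key is compared only against keys already stored in the same partition.

```python
def drop_duplicates_within_partitions(existing, incoming):
    """Toy model of partition-scoped duplicate dropping (not Hudi's implementation).

    existing: iterable of (partition, key) pairs already in the table.
    incoming: iterable of (partition, key) pairs in the new batch.
    Returns the incoming pairs that would still be inserted.
    """
    # Group the existing keys by partition to mimic a per-partition lookup.
    keys_by_partition = {}
    for partition, key in existing:
        keys_by_partition.setdefault(partition, set()).add(key)

    kept = []
    for partition, key in incoming:
        # Only the record's own partition is consulted, never the others.
        if key in keys_by_partition.get(partition, set()):
            continue
        kept.append((partition, key))
    return kept


existing = [("2022-11-01", "a"), ("2022-11-02", "b")]
incoming = [("2022-11-01", "a"), ("2022-11-01", "b")]
result = drop_duplicates_within_partitions(existing, incoming)
```

In this model key `a` is dropped because it already exists in partition `2022-11-01`, while key `b` is kept even though the same key exists in `2022-11-02`, since that partition is never looked up.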

[GitHub] [hudi] hudi-bot commented on pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
hudi-bot commented on PR #7129: URL: https://github.com/apache/hudi/pull/7129#issuecomment-1302903150 ## CI report: * 00ff1d41fae07715d44bc4a2551b76b1cb3eca1f Azure:

[GitHub] [hudi] nsivabalan commented on issue #4978: [SUPPORT] Wrong table path when using Hive to query xxx_rt table before the first compaction

2022-11-03 Thread GitBox
nsivabalan commented on issue #4978: URL: https://github.com/apache/hudi/issues/4978#issuecomment-1302901252 @danny0405 @xiarixiaoyao : do we know whether this has been fixed at any point?

[GitHub] [hudi] nsivabalan commented on issue #5036: [SUPPORT] AWS DMS and Deletes on S3 with Hudi

2022-11-03 Thread GitBox
nsivabalan commented on issue #5036: URL: https://github.com/apache/hudi/issues/5036#issuecomment-1302900811 @pratyakshsharma @jasondavindev : Sorry. If you can explain the issue, I can try to see how I can help you here.

[GitHub] [hudi] nsivabalan closed issue #5083: [SUPPORT] Doing clustering for bulked insert table, could cause: Can't redefine: list

2022-11-03 Thread GitBox
nsivabalan closed issue #5083: [SUPPORT] Doing clustering for bulked insert table, could cause: Can't redefine: list URL: https://github.com/apache/hudi/issues/5083

[GitHub] [hudi] nsivabalan commented on issue #5083: [SUPPORT] Doing clustering for bulked insert table, could cause: Can't redefine: list

2022-11-03 Thread GitBox
nsivabalan commented on issue #5083: URL: https://github.com/apache/hudi/issues/5083#issuecomment-1302900460 Thanks for the update, @boneanxs.

[GitHub] [hudi] hudi-bot commented on pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
hudi-bot commented on PR #7129: URL: https://github.com/apache/hudi/pull/7129#issuecomment-1302900337 ## CI report: * 00ff1d41fae07715d44bc4a2551b76b1cb3eca1f Azure:

[GitHub] [hudi] nsivabalan commented on issue #5211: [SUPPORT] Glob pattern to pick specific subfolders not working while reading in Spark

2022-11-03 Thread GitBox
nsivabalan commented on issue #5211: URL: https://github.com/apache/hudi/issues/5211#issuecomment-1302899692 @kartik18 : any updates on this, please?

[GitHub] [hudi] nsivabalan commented on issue #5351: Hudi Write Performance

2022-11-03 Thread GitBox
nsivabalan commented on issue #5351: URL: https://github.com/apache/hudi/issues/5351#issuecomment-1302899438 @p-powell : For immutable use cases, we recommend setting some configs to get better performance: https://hudi.apache.org/docs/performance#bulk-insert Let us know if you

[GitHub] [hudi] china-shang opened a new issue, #7133: lazyReading affect

2022-11-03 Thread GitBox
china-shang opened a new issue, #7133: URL: https://github.com/apache/hudi/issues/7133 Why does lazyReading need to be turned on? It looks like it needs to seek back and forth and keep the file open all the time. If you don't turn it on, can you always read forward? Is it to save

[GitHub] [hudi] nsivabalan commented on issue #5372: [SUPPORT] Compatible with multiple HBASE version or hbase: 2.1.0-cdh6.3.2

2022-11-03 Thread GitBox
nsivabalan commented on issue #5372: URL: https://github.com/apache/hudi/issues/5372#issuecomment-1302896768 Hey @meitianjinbu : are you still looking for any assistance with this? By the way, we added an FAQ on HBase conflicting with the metadata table

[GitHub] [hudi] nsivabalan commented on issue #5481: [SUPPORT] Slow Upsert When Reloading Data into Hudi Table

2022-11-03 Thread GitBox
nsivabalan commented on issue #5481: URL: https://github.com/apache/hudi/issues/5481#issuecomment-1302889177 @MikeBuh : did you get a chance to try out the suggestions from Ethan above? Let us know of any updates you have. We would love to learn how the tuning went.

[GitHub] [hudi] nsivabalan commented on issue #5482: [SUPPORT] metadata index fail with MOR tables

2022-11-03 Thread GitBox
nsivabalan commented on issue #5482: URL: https://github.com/apache/hudi/issues/5482#issuecomment-1302888669 If you are having other problems, can you help clarify?

[GitHub] [hudi] nsivabalan commented on issue #5482: [SUPPORT] metadata index fail with MOR tables

2022-11-03 Thread GitBox
nsivabalan commented on issue #5482: URL: https://github.com/apache/hudi/issues/5482#issuecomment-1302888141 From the gist, this looks like S3 connection timeouts from the connection pool. Can you try increasing the number of connections? ``` --conf spark.hadoop.fs.s3a.connection.maximum=1000 ```
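
When several such settings have to be passed, the `--conf` flags can be generated rather than typed by hand. The following is a minimal sketch in plain Python: the only setting taken from the comment above is `spark.hadoop.fs.s3a.connection.maximum=1000`, and the helper `to_spark_submit_args` is a hypothetical name introduced for illustration.

```python
import shlex


def to_spark_submit_args(conf):
    """Render a dict of Spark settings as a flat list of `--conf key=value` arguments."""
    args = []
    # Sort the keys so the generated command line is stable across runs.
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# The value suggested in the comment above; tune it for your own workload.
conf = {"spark.hadoop.fs.s3a.connection.maximum": 1000}
args = to_spark_submit_args(conf)

# Quote each argument so the string can be pasted after `spark-submit` in a shell.
command_line = " ".join(shlex.quote(arg) for arg in args)
```

Keeping the settings in a dict makes it easier to review and version them alongside the job definition.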

[GitHub] [hudi] nsivabalan commented on issue #5492: _hoodie_is_delete works differently on hudi spark datasource on docker compare to hudi on emr.

2022-11-03 Thread GitBox
nsivabalan commented on issue #5492: URL: https://github.com/apache/hudi/issues/5492#issuecomment-1302887335 @ashah-lightbox : gentle ping. Any updates, please? If you got the issue resolved, can we close it out?

[GitHub] [hudi] nsivabalan commented on issue #5514: [SUPPORT] Read optimized query on MOR table lists files without any Spark action

2022-11-03 Thread GitBox
nsivabalan commented on issue #5514: URL: https://github.com/apache/hudi/issues/5514#issuecomment-1302886962 Closing this since we already landed the fix. Feel free to open a new issue if you are looking for further assistance.

[GitHub] [hudi] nsivabalan closed issue #5514: [SUPPORT] Read optimized query on MOR table lists files without any Spark action

2022-11-03 Thread GitBox
nsivabalan closed issue #5514: [SUPPORT] Read optimized query on MOR table lists files without any Spark action URL: https://github.com/apache/hudi/issues/5514

[GitHub] [hudi] nsivabalan commented on issue #5519: [SUPPORT] Schema Evolution - Error with datatype promotion

2022-11-03 Thread GitBox
nsivabalan commented on issue #5519: URL: https://github.com/apache/hudi/issues/5519#issuecomment-1302886502 @xiarixiaoyao : can we follow up on this? Did we manage to reproduce it and make any fix? @Zhangshunyu : if you don't use the bulk_insert row writer path, are things OK?

[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

2022-11-03 Thread GitBox
nsivabalan commented on issue #5537: URL: https://github.com/apache/hudi/issues/5537#issuecomment-1302885962 @YannByron : it looks like the author has given a hacky solution. Is there any enhancement we can add to Hudi based on that?

[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

2022-11-03 Thread GitBox
nsivabalan commented on issue #5537: URL: https://github.com/apache/hudi/issues/5537#issuecomment-1302885402 @melin : gentle ping.

[GitHub] [hudi] nsivabalan commented on issue #5539: [SUPPORT]Job aborted due to stage failure. Caused by: AvroTypeException: Invalid default for field CDC_TS: "null" not a ["null","string"]

2022-11-03 Thread GitBox
nsivabalan commented on issue #5539: URL: https://github.com/apache/hudi/issues/5539#issuecomment-1302885245 @nleena123 : would you mind closing the issue if you are not looking for any further assistance?

[GitHub] [hudi] nsivabalan commented on issue #5673: [SUPPORT]scala.MatchError for rename colum

2022-11-03 Thread GitBox
nsivabalan commented on issue #5673: URL: https://github.com/apache/hudi/issues/5673#issuecomment-1302884770 @sunke38 : did you get a chance to give it a try? Are we good to close it out, or is there anything you need more assistance with?

[GitHub] [hudi] nsivabalan commented on issue #5777: [SUPPORT] Hudi table has duplicate data.

2022-11-03 Thread GitBox
nsivabalan commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1302883337 @jiangjiguang : oops, sorry. @jjtjiang : can you respond to my comments above when you can?

[GitHub] [hudi] nsivabalan commented on issue #5826: Hudi table statistics is not proper by analyze table table compute statistics

2022-11-03 Thread GitBox
nsivabalan commented on issue #5826: URL: https://github.com/apache/hudi/issues/5826#issuecomment-1302882737 @minihippo : can we close this if you can confirm that it is not an issue?

[GitHub] [hudi] nsivabalan commented on issue #6055: Hudi Partial Update not working by using MERGE statement on Hudi External Table

2022-11-03 Thread GitBox
nsivabalan commented on issue #6055: URL: https://github.com/apache/hudi/issues/6055#issuecomment-1302881510 Hey @rishabhbandi @hassan-ammar : were you folks able to resolve the issue? Did any fix go into Hudi for this? Can you help me understand whether the issue still persists?

[GitHub] [hudi] xiarixiaoyao closed pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
xiarixiaoyao closed pull request #7129: [MINOR] Support column type evolution for Hive URL: https://github.com/apache/hudi/pull/7129

[GitHub] [hudi] nsivabalan commented on issue #6019: [SUPPORT] How can data be flushed to Hudi as quickly as possible?

2022-11-03 Thread GitBox
nsivabalan commented on issue #6019: URL: https://github.com/apache/hudi/issues/6019#issuecomment-1302878499 @yuzhaojing : can we follow up here, please? If it's already fixed in a released version of Hudi, can we close it out?

[GitHub] [hudi] nsivabalan commented on issue #6014: [SUPPORT] High runtime for a batch in SparkWriteHelper stage

2022-11-03 Thread GitBox
nsivabalan commented on issue #6014: URL: https://github.com/apache/hudi/issues/6014#issuecomment-1302878056 @veenaypatil : I see you are using the non-partitioned key generator, so the index lookup is going to be relative to the total number of file groups you have. Do you know what the total file

[GitHub] [hudi] nsivabalan commented on issue #5984: [SUPPORT] Error on GlobalSortPartitioner using 0.9.0

2022-11-03 Thread GitBox
nsivabalan commented on issue #5984: URL: https://github.com/apache/hudi/issues/5984#issuecomment-1302875447 @rubenssoto : hey, hi. Unless we get more info to reproduce, it's going to be tough for us to investigate further, buddy. Closing it due to no activity. OOM w/ global sort

[GitHub] [hudi] nsivabalan closed issue #5984: [SUPPORT] Error on GlobalSortPartitioner using 0.9.0

2022-11-03 Thread GitBox
nsivabalan closed issue #5984: [SUPPORT] Error on GlobalSortPartitioner using 0.9.0 URL: https://github.com/apache/hudi/issues/5984

[GitHub] [hudi] nsivabalan commented on issue #5979: [SUPPORT]the hudi's table of join can not handle delete operation.But simple table is ok.why?

2022-11-03 Thread GitBox
nsivabalan commented on issue #5979: URL: https://github.com/apache/hudi/issues/5979#issuecomment-1302874631 @yuzhaojing @danny0405 : can we follow up on this issue? With the latest CDC support, would the issue reported in this ticket be solved?

[GitHub] [hudi] nsivabalan closed issue #5952: [SUPPORT] HudiDeltaStreamer S3EventSource SQS optimize for reading large number of files in parallel fashion

2022-11-03 Thread GitBox
nsivabalan closed issue #5952: [SUPPORT] HudiDeltaStreamer S3EventSource SQS optimize for reading large number of files in parallel fashion URL: https://github.com/apache/hudi/issues/5952

[GitHub] [hudi] nsivabalan commented on issue #5952: [SUPPORT] HudiDeltaStreamer S3EventSource SQS optimize for reading large number of files in parallel fashion

2022-11-03 Thread GitBox
nsivabalan commented on issue #5952: URL: https://github.com/apache/hudi/issues/5952#issuecomment-1302873977 Since we have a patch addressing the proposed fix, closing out the issue. Feel free to reach out to us if you need any further assistance.

[GitHub] [hudi] nsivabalan commented on issue #6052: [SUPPORT] HoodieRollbackException when starting Flink Job on existing Hudi Table

2022-11-03 Thread GitBox
nsivabalan commented on issue #6052: URL: https://github.com/apache/hudi/issues/6052#issuecomment-1302872944 @shqiprimbkodelabs @danny0405 : are we good to close this one, or is there anything still pending?

[GitHub] [hudi] nsivabalan closed issue #6048: [SUPPORT] S3 throttling while loading a table written with "hoodie.metadata.enable" = true

2022-11-03 Thread GitBox
nsivabalan closed issue #6048: [SUPPORT] S3 throttling while loading a table written with "hoodie.metadata.enable" = true URL: https://github.com/apache/hudi/issues/6048

[GitHub] [hudi] nsivabalan commented on issue #6048: [SUPPORT] S3 throttling while loading a table written with "hoodie.metadata.enable" = true

2022-11-03 Thread GitBox
nsivabalan commented on issue #6048: URL: https://github.com/apache/hudi/issues/6048#issuecomment-1302872527 @noahtaite : Going ahead and closing this one for now. Feel free to raise a new issue if you are looking for further assistance.

[GitHub] [hudi] nsivabalan closed issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-11-03 Thread GitBox
nsivabalan closed issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient URL: https://github.com/apache/hudi/issues/6038

[GitHub] [hudi] nsivabalan commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-11-03 Thread GitBox
nsivabalan commented on issue #6038: URL: https://github.com/apache/hudi/issues/6038#issuecomment-1302872105 Feel free to raise a new issue if you are looking for further enhancement.

[GitHub] [hudi] nsivabalan commented on issue #7049: [SUPPORT] SQLQueryBasedTransformer Not writing transformed parquet data

2022-11-03 Thread GitBox
nsivabalan commented on issue #7049: URL: https://github.com/apache/hudi/issues/7049#issuecomment-1302871037 Thanks.

[GitHub] [hudi] nsivabalan closed issue #7049: [SUPPORT] SQLQueryBasedTransformer Not writing transformed parquet data

2022-11-03 Thread GitBox
nsivabalan closed issue #7049: [SUPPORT] SQLQueryBasedTransformer Not writing transformed parquet data URL: https://github.com/apache/hudi/issues/7049

[GitHub] [hudi] fsilent commented on a diff in pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
fsilent commented on code in PR #7129: URL: https://github.com/apache/hudi/pull/7129#discussion_r1013541122 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java: ## @@ -45,7 +46,8 @@ */ @UseRecordReaderFromInputFormat

[GitHub] [hudi] fsilent commented on a diff in pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
fsilent commented on code in PR #7129: URL: https://github.com/apache/hudi/pull/7129#discussion_r1013540860 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -222,6 +222,20 @@ test + + Review Comment: Changed. Now we don't need to add hive

[GitHub] [hudi] YannByron commented on pull request #7128: [HUDI-5153] Remove the optional wildcard from FSUtils#LOG_FILE_PATTERN

2022-11-03 Thread GitBox
YannByron commented on PR #7128: URL: https://github.com/apache/hudi/pull/7128#issuecomment-1302868990 Basically, it's not related to CDC; https://github.com/apache/hudi/pull/7042 can work without any other changes. This PR should just consider whether the `?` optional wildcard

[GitHub] [hudi] hudi-bot commented on pull request #7063: [HUDI-5094] Remove partition fields before transform bytes to avro,if enable DROP_PARTITION_COLUMNS

2022-11-03 Thread GitBox
hudi-bot commented on PR #7063: URL: https://github.com/apache/hudi/pull/7063#issuecomment-1302867429 ## CI report: * 77487796a68b54304f55efc71097ab8ca50b428b UNKNOWN * 8240e1e8280cd8842d4ba11ef6f781feb3d8a9bd UNKNOWN * 85b70221d74d0d04900acda25e1ea9b7c71bcb0a UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #7063: [HUDI-5094] Remove partition fields before transform bytes to avro,if enable DROP_PARTITION_COLUMNS

2022-11-03 Thread GitBox
hudi-bot commented on PR #7063: URL: https://github.com/apache/hudi/pull/7063#issuecomment-1302864990 ## CI report: * 77487796a68b54304f55efc71097ab8ca50b428b UNKNOWN * 8240e1e8280cd8842d4ba11ef6f781feb3d8a9bd UNKNOWN * 85b70221d74d0d04900acda25e1ea9b7c71bcb0a UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #6725: [HUDI-4881] Push down filters if possible when syncing partitions to Hive

2022-11-03 Thread GitBox
hudi-bot commented on PR #6725: URL: https://github.com/apache/hudi/pull/6725#issuecomment-1302864792 ## CI report: * 81f856d99da09e5a9438fad2a0d111bc9062aba4 Azure:

[GitHub] [hudi] boneanxs commented on pull request #6725: [HUDI-4881] Push down filters if possible when syncing partitions to Hive

2022-11-03 Thread GitBox
boneanxs commented on PR #6725: URL: https://github.com/apache/hudi/pull/6725#issuecomment-1302863626 @hudi-bot run azure

[GitHub] [hudi] boneanxs commented on pull request #6725: [HUDI-4881] Push down filters if possible when syncing partitions to Hive

2022-11-03 Thread GitBox
boneanxs commented on PR #6725: URL: https://github.com/apache/hudi/pull/6725#issuecomment-1302863524 @alexeykudinkin @xushiyan could you please review the new commit? The test failure is 137, which is not related to this PR.

[GitHub] [hudi] fsilent commented on a diff in pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
fsilent commented on code in PR #7129: URL: https://github.com/apache/hudi/pull/7129#discussion_r1013522502 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -222,6 +222,20 @@ test + + Review Comment: because support column type evolution for

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #7129: [MINOR] Support column type evolution for Hive

2022-11-03 Thread GitBox
xiarixiaoyao commented on code in PR #7129: URL: https://github.com/apache/hudi/pull/7129#discussion_r1013530563 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java: ## @@ -45,7 +46,8 @@ */ @UseRecordReaderFromInputFormat

[GitHub] [hudi] hudi-bot commented on pull request #7132: [HUDI-51577] Adding capability to remove all meta fields from source hudi table with Hudi incr source

2022-11-03 Thread GitBox
hudi-bot commented on PR #7132: URL: https://github.com/apache/hudi/pull/7132#issuecomment-1302822386 ## CI report: * 23edfddd3ba7aff627930cf60fbca8255c3b40d4 Azure:

[GitHub] [hudi] nsivabalan commented on issue #7106: [PROPOSE] Add column prune support for other payload class

2022-11-03 Thread GitBox
nsivabalan commented on issue #7106: URL: https://github.com/apache/hudi/issues/7106#issuecomment-1302809739 https://issues.apache.org/jira/browse/HUDI-5158

[jira] [Created] (HUDI-5158) Add column pruning support to any payload

2022-11-03 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5158: - Summary: Add column pruning support to any payload Key: HUDI-5158 URL: https://issues.apache.org/jira/browse/HUDI-5158 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-5064) Improve docs around concurrency control and deployment models

2022-11-03 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5064: Fix Version/s: 0.13.0 > Improve docs around concurrency control and deployment models >

[jira] [Updated] (HUDI-5064) Improve docs around concurrency control and deployment models

2022-11-03 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5064: Description: Currently, the concurrency control-related configurations for different deployment models are

[jira] [Updated] (HUDI-5064) Improve docs around concurrency control and deployment models

2022-11-03 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5064: Component/s: docs > Improve docs around concurrency control and deployment models >

[GitHub] [hudi] nsivabalan commented on issue #7060: Error when upgrading to hudi 0.12.0 from 0.9.0

2022-11-03 Thread GitBox
nsivabalan commented on issue #7060: URL: https://github.com/apache/hudi/issues/7060#issuecomment-1302808470 @navbalaraman : gentle ping.

[GitHub] [hudi] nsivabalan commented on issue #7062: [SUPPORT] Appeding to files during UPSERT causes executors to die due to memory issues.

2022-11-03 Thread GitBox
nsivabalan commented on issue #7062: URL: https://github.com/apache/hudi/issues/7062#issuecomment-1302808274 Also, I don't get this statement of yours: "We noticed that the class HoodieMergeHandle is not being used due to PARQUET_SMALL_FILE_LIMIT = 0 and the job passes successfully." Can

[GitHub] [hudi] nsivabalan commented on issue #7062: [SUPPORT] Appeding to files during UPSERT causes executors to die due to memory issues.

2022-11-03 Thread GitBox
nsivabalan commented on issue #7062: URL: https://github.com/apache/hudi/issues/7062#issuecomment-1302807726 hey @HEPBO3AH : do you mean to say that, even after our fix https://github.com/apache/hudi/pull/6864, your avg record size estimate is wrong in some cases? And as a result you are

[GitHub] [hudi] nsivabalan commented on issue #7102: [SUPPORT] FileNotFoundException when read from mor table

2022-11-03 Thread GitBox
nsivabalan commented on issue #7102: URL: https://github.com/apache/hudi/issues/7102#issuecomment-1302804138 Do you still have the ".hoodie" with your old state from when you ran into the exception? We can inspect the timeline (".hoodie") to see what the issue was. If not, we can't do much now.

[GitHub] [hudi] nsivabalan commented on issue #7102: [SUPPORT] FileNotFoundException when read from mor table

2022-11-03 Thread GitBox
nsivabalan commented on issue #7102: URL: https://github.com/apache/hudi/issues/7102#issuecomment-1302803751 Guess there is some misunderstanding. As of now, you ran into issues while writing to hudi table by enabling metadata table. Still, on the read path (hive), your read should

[GitHub] [hudi] slfan1989 commented on pull request #7127: [HUDI-5154] Improve hudi-spark-client Lambada writing

2022-11-03 Thread GitBox
slfan1989 commented on PR #7127: URL: https://github.com/apache/hudi/pull/7127#issuecomment-1302781439 > LGTM. @slfan1989 could you check the CI failures? @yanghua Thanks a lot for your help reviewing the code, I will check the CI failures. -- This is an automated message from the

[GitHub] [hudi] lewyh commented on issue #7130: [SUPPORT] `HoodieMetadataException` started occurring when writing to COW table

2022-11-03 Thread GitBox
lewyh commented on issue #7130: URL: https://github.com/apache/hudi/issues/7130#issuecomment-1302781431 It seems that setting the config value `.set("spark.hadoop.fs.s3.maxConnections", "1000")` fixes the problem. There is no longer any server 500 error, or timeout waiting for connection
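
For reference, the setting quoted above can also be supplied on the command line instead of in code. The sketch below only builds the `--conf` arguments that `spark-submit` accepts; the helper name `to_submit_args` is hypothetical, and the key and value are the ones quoted in the comment, not a recommendation for every deployment.

```python
def to_submit_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments.

    Hypothetical helper for illustration; it does not call Spark itself.
    """
    args = []
    for key, value in sorted(conf.items()):
        # spark-submit takes each setting as: --conf key=value
        args.extend(["--conf", f"{key}={value}"])
    return args


# Key and value copied from the comment above; nothing else is assumed.
s3_conf = {"spark.hadoop.fs.s3.maxConnections": "1000"}
submit_args = to_submit_args(s3_conf)
print(" ".join(submit_args))
```

Whether a limit of 1000 is appropriate depends on the cluster and workload; the comment reports only that it removed the errors in that one case.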

[GitHub] [hudi] hudi-bot commented on pull request #7132: [HUDI-51577] Adding capability to remove all meta fields from source hudi table with Hudi incr source

2022-11-03 Thread GitBox
hudi-bot commented on PR #7132: URL: https://github.com/apache/hudi/pull/7132#issuecomment-1302779770 ## CI report: * 23edfddd3ba7aff627930cf60fbca8255c3b40d4 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7039: [HUDI-5080] Fixing unpersist to consider only rdds pertaining to current write operation

2022-11-03 Thread GitBox
hudi-bot commented on PR #7039: URL: https://github.com/apache/hudi/pull/7039#issuecomment-1302779619 ## CI report: * 5ff96812e74f348af76c942f58e67445afbb765e Azure:

[hudi] branch master updated: [HUDI-5126] Delete duplicate configuration items PAYLOAD_CLASS_NAME (#7103)

2022-11-03 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d36cc05ed5 [HUDI-5126] Delete duplicate

[GitHub] [hudi] nsivabalan merged pull request #7103: [HUDI-5126] Delete duplicate configuration items PAYLOAD_CLASS_NAME

2022-11-03 Thread GitBox
nsivabalan merged PR #7103: URL: https://github.com/apache/hudi/pull/7103 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] nsivabalan commented on issue #7106: [PROPOSE] Add column prune support for other payload class

2022-11-03 Thread GitBox
nsivabalan commented on issue #7106: URL: https://github.com/apache/hudi/issues/7106#issuecomment-1302755348 This sounds interesting. We have RFC-46 nearing landing. So, we might have to replay this on top of RFC-46. But can you start a dev email thread, and we can go from there? Def

[GitHub] [hudi] nsivabalan closed issue #7106: [PROPOSE] Add column prune support for other payload class

2022-11-03 Thread GitBox
nsivabalan closed issue #7106: [PROPOSE] Add column prune support for other payload class URL: https://github.com/apache/hudi/issues/7106 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] nsivabalan commented on issue #7116: [SUPPORT]The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

2022-11-03 Thread GitBox
nsivabalan commented on issue #7116: URL: https://github.com/apache/hudi/issues/7116#issuecomment-1302753334 @yuzhaojing @danny0405 : Can you folks follow up when you get a chance? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] nsivabalan commented on issue #7122: [SUPPORT] Archiving the commits failing for Hudi job

2022-11-03 Thread GitBox
nsivabalan commented on issue #7122: URL: https://github.com/apache/hudi/issues/7122#issuecomment-1302751446 Looks like this is an older version of hudi. With 0.12.0, I am not seeing the class named HoodieTimelineArchiveLog. Or are you using an internal hudi version that you maintain

[GitHub] [hudi] nsivabalan commented on issue #7064: [SUPPORT] Data ingestion from csv file i.e. CsvDFSSource is working for FilebasedSchemaProvider but not working if schema is provided with SchemaRe

2022-11-03 Thread GitBox
nsivabalan commented on issue #7064: URL: https://github.com/apache/hudi/issues/7064#issuecomment-1302739800 I don't see any issue just from gleaning the code. Can you post the info logs you see in both cases for the statement below? I am expecting it's the same for both (schema from file

[GitHub] [hudi] lewyh commented on issue #7130: [SUPPORT] `HoodieMetadataException` started occurring when writing to COW table

2022-11-03 Thread GitBox
lewyh commented on issue #7130: URL: https://github.com/apache/hudi/issues/7130#issuecomment-1302738763 Thanks for the quick response. I've tried setting the following when initializing the spark session: ``` conf = ( SparkConf() .setAppName(app_name)

[GitHub] [hudi] hudi-bot commented on pull request #7132: [HUDI-51577] Adding capability to remove all meta fields from source hudi table with Hudi incr source

2022-11-03 Thread GitBox
hudi-bot commented on PR #7132: URL: https://github.com/apache/hudi/pull/7132#issuecomment-1302735029 ## CI report: * 23edfddd3ba7aff627930cf60fbca8255c3b40d4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-5157) Duplicate partition path for chained hudi tables.

2022-11-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5157: -- Story Points: 2 > Duplicate partition path for chained hudi tables. >

[jira] [Updated] (HUDI-5157) Duplicate partition path for chained hudi tables.

2022-11-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5157: -- Sprint: 2022/11/01 > Duplicate partition path for chained hudi tables. >

[GitHub] [hudi] nsivabalan commented on issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-11-03 Thread GitBox
nsivabalan commented on issue #5189: URL: https://github.com/apache/hudi/issues/5189#issuecomment-1302727444 https://github.com/apache/hudi/pull/7132 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] nsivabalan opened a new pull request, #7132: [HUDI-51577] Adding capability to remove all meta fields from source hudi table with Hudi incr source

2022-11-03 Thread GitBox
nsivabalan opened a new pull request, #7132: URL: https://github.com/apache/hudi/pull/7132 ### Change Logs HoodieIncrSource was dropping every meta field from source except partition path. This was resulting in duplicate meta field (_hoodie_partition_path) when reading the 2nd table
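
The behaviour described in the change log can be illustrated with a small, self-contained sketch. This is not the code from the pull request: `drop_meta_columns` and the sample column names `id` and `ts` are hypothetical, and the tuple lists the five standard Hudi meta columns. The idea shown is that every meta column, including `_hoodie_partition_path`, is removed from the source schema before it is written to the next table, so the downstream table can add its own copy without a name clash.

```python
# The five standard Hudi meta columns. All of them are dropped here,
# including the partition path that was previously kept.
HOODIE_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def drop_meta_columns(columns):
    """Return the column names with every Hudi meta column removed.

    Hypothetical helper for illustration only; order of the remaining
    columns is preserved.
    """
    return [c for c in columns if c not in HOODIE_META_COLUMNS]


# Columns as they might appear when reading the first table incrementally.
source_columns = list(HOODIE_META_COLUMNS) + ["id", "ts"]
data_columns = drop_meta_columns(source_columns)
print(data_columns)
```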

[GitHub] [hudi] nsivabalan closed issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-11-03 Thread GitBox
nsivabalan closed issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column URL: https://github.com/apache/hudi/issues/5189 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [hudi] nsivabalan commented on issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-11-03 Thread GitBox
nsivabalan commented on issue #5189: URL: https://github.com/apache/hudi/issues/5189#issuecomment-1302725013 https://issues.apache.org/jira/browse/HUDI-5157 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and
