[GitHub] [hudi] xushiyan commented on issue #4055: [SUPPORT] Hudi with SqlQueryBasedTransformer fails-> spark error exit 134 or exit 143 in "isEmpty at DeltaSync.java:344" : Container from a bad node

2021-11-21 Thread GitBox
xushiyan commented on issue #4055: URL: https://github.com/apache/hudi/issues/4055#issuecomment-974699184 @JB-data > --hoodie-conf hoodie.deltastreamer.transformer.sql='SELECT data[0].id as id FROM ' Is this actually valid SQL? The error you showed is not really the root c

[GitHub] [hudi] yapnel commented on issue #4058: Support for record level point lookup

2021-11-21 Thread GitBox
yapnel commented on issue #4058: URL: https://github.com/apache/hudi/issues/4058#issuecomment-974826710 > it should work, but may not be performant. We are looking to add record level index and some work is in progress on this end. You can follow the work [here](https://github.com/apache/h

[GitHub] [hudi] hudi-bot commented on pull request #4057: [HUDI-2392] Make flink parquet reader compatible with decimal BINARY …

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4057: URL: https://github.com/apache/hudi/pull/4057#issuecomment-974680874 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [hudi] xushiyan edited a comment on issue #3831: Deltastreamer through Pyspark/livy

2021-11-21 Thread GitBox
xushiyan edited a comment on issue #3831: URL: https://github.com/apache/hudi/issues/3831#issuecomment-974689954 @Kavin88 so basically it is a feature request where you wish to pass these deltastreamer-specific arguments via options. It is indeed possible to create a set of configs say `ho

[GitHub] [hudi] hudi-bot commented on pull request #4059: [HUDI-2813] Claim RFC number for RFC for spark datasource V2 Integration

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4059: URL: https://github.com/apache/hudi/pull/4059#issuecomment-974743082

[GitHub] [hudi] leesf commented on pull request #4012: [HUDI-2777] Data import performance deteriorates because multiple Spark jobs are started when data is written to disks.

2021-11-21 Thread GitBox
leesf commented on pull request #4012: URL: https://github.com/apache/hudi/pull/4012#issuecomment-974744663

[GitHub] [hudi] hudi-bot commented on pull request #4060: [HUDI-2814] Addressing issues w/ Z-order Layout Optimization

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4060: URL: https://github.com/apache/hudi/pull/4060#issuecomment-974745161

[GitHub] [hudi] hudi-bot commented on pull request #4056: [HUDI-2808] Supports deduplication for streaming write

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4056: URL: https://github.com/apache/hudi/pull/4056#issuecomment-974662521

[GitHub] [hudi] danny0405 commented on issue #4030: [SUPPORT] Flink uses updated fields to update data

2021-11-21 Thread GitBox
danny0405 commented on issue #4030: URL: https://github.com/apache/hudi/issues/4030#issuecomment-974749801

[GitHub] [hudi] leesf commented on pull request #4061: [MINOR] Fix RocketMQ logo in landing page

2021-11-21 Thread GitBox
leesf commented on pull request #4061: URL: https://github.com/apache/hudi/pull/4061#issuecomment-974791691

[GitHub] [hudi] xushiyan closed issue #4001: [SUPPORT] Hudi 0.9.0 fails when used with Spark 3.2.0

2021-11-21 Thread GitBox
xushiyan closed issue #4001: URL: https://github.com/apache/hudi/issues/4001

[GitHub] [hudi] xushiyan closed issue #4008: [SUPPORT] Hudi failed to sync new partition table to glue data catalog

2021-11-21 Thread GitBox
xushiyan closed issue #4008: URL: https://github.com/apache/hudi/issues/4008

[GitHub] [hudi] hudi-bot removed a comment on pull request #4012: [HUDI-2777] Data import performance deteriorates because multiple Spark jobs are started when data is written to disks.

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4012: URL: https://github.com/apache/hudi/pull/4012#issuecomment-973663414

[GitHub] [hudi] hudi-bot commented on pull request #4053: [MINOR] Fix typos

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4053: URL: https://github.com/apache/hudi/pull/4053#issuecomment-974615211

[GitHub] [hudi] hudi-bot removed a comment on pull request #3053: [HUDI-1932] Update Hive sync timestamp when change detected

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #3053: URL: https://github.com/apache/hudi/pull/3053#issuecomment-962706537

[GitHub] [hudi] vinothchandar commented on pull request #4061: [MINOR] Fix RocketMQ logo in landing page

2021-11-21 Thread GitBox
vinothchandar commented on pull request #4061: URL: https://github.com/apache/hudi/pull/4061#issuecomment-974858404 lgtm!

[GitHub] [hudi] Limess commented on issue #4043: [SUPPORT] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row error when writing particular source data after col

2021-11-21 Thread GitBox
Limess commented on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974848916 `_hoodie_is_deleted` is the added column, sorry I misquoted it above.

[GitHub] [hudi] hudi-bot removed a comment on pull request #4054: [MINOR] Optimize imports and delete useless or duplicate imports

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4054: URL: https://github.com/apache/hudi/pull/4054#issuecomment-974617278

[GitHub] [hudi] leesf merged pull request #4053: [MINOR] Fix typos

2021-11-21 Thread GitBox
leesf merged pull request #4053: URL: https://github.com/apache/hudi/pull/4053

[GitHub] [hudi] mtami edited a comment on issue #4008: [SUPPORT] Hudi failed to sync new partition table to glue data catalog

2021-11-21 Thread GitBox
mtami edited a comment on issue #4008: URL: https://github.com/apache/hudi/issues/4008#issuecomment-974716726

[GitHub] [hudi] nsivabalan commented on issue #4031: [SUPPORT] _hoodie_is_deleted should work with any truthy value

2021-11-21 Thread GitBox
nsivabalan commented on issue #4031: URL: https://github.com/apache/hudi/issues/4031#issuecomment-974820517 Let me know if I understand your question correctly. - You are seeing a behavior where when "_hoodie_is_deleted" is set to null or false, hudi persist this column on storage. And

[GitHub] [hudi] xushiyan commented on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
xushiyan commented on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974608947 @nikita-sheremet-clearscale could you share the hudi configs you used for this job? e.g. bulk insert or upsert? COW or MOR table? any config you set manually can be helpful. Fro

[GitHub] [hudi] hudi-bot commented on pull request #4012: [HUDI-2777] Data import performance deteriorates because multiple Spark jobs are started when data is written to disks.

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4012: URL: https://github.com/apache/hudi/pull/4012#issuecomment-974744735

[GitHub] [hudi] nsivabalan commented on a change in pull request #4046: [HUDI-2527] Multi writer test with conflicting async table services

2021-11-21 Thread GitBox
nsivabalan commented on a change in pull request #4046: URL: https://github.com/apache/hudi/pull/4046#discussion_r753804804 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestHoodieClientMultiWriter.java ## @@ -228,78 +231,88 @@ private void

[GitHub] [hudi] xushiyan commented on issue #4001: [SUPPORT] Hudi 0.9.0 fails when used with Spark 3.2.0

2021-11-21 Thread GitBox
xushiyan commented on issue #4001: URL: https://github.com/apache/hudi/issues/4001#issuecomment-974691214

[GitHub] [hudi] xushiyan commented on issue #3831: Deltastreamer through Pyspark/livy

2021-11-21 Thread GitBox
xushiyan commented on issue #3831: URL: https://github.com/apache/hudi/issues/3831#issuecomment-974689954

[GitHub] [hudi] xushiyan commented on a change in pull request #3289: [HUDI-2187] Add a shim layer to support multiple hive version

2021-11-21 Thread GitBox
xushiyan commented on a change in pull request #3289: URL: https://github.com/apache/hudi/pull/3289#discussion_r753740519 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -120,6 +120,9 @@ @Parameter(names = {"--spark-schema

[GitHub] [hudi] leesf commented on a change in pull request #4018: Add the GooseFS integration document

2021-11-21 Thread GitBox
leesf commented on a change in pull request #4018: URL: https://github.com/apache/hudi/pull/4018#discussion_r753736855 ## File path: website/docs/goosefs_hoodie.md ## @@ -0,0 +1,46 @@ +--- +title: GooseFS Filesystem +keywords: [ hudi, hive, tencent, goosefs, spark, presto] +sum

[GitHub] [hudi] leesf commented on pull request #3967: [HUDI-2767] Enabling timeline server based marker as default

2021-11-21 Thread GitBox
leesf commented on pull request #3967: URL: https://github.com/apache/hudi/pull/3967#issuecomment-974777352 > @vinothchandar : I am thinking, for users who explicitly disable timeline server, should we fallback to using direct style markers? For backward compatibility, I think we should

[GitHub] [hudi] hudi-bot commented on pull request #4048: [HUDI-1290] [RFC-39] Deltastreamer avro source for Debezium CDC

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4048: URL: https://github.com/apache/hudi/pull/4048#issuecomment-974594750

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-21 Thread GitBox
xiarixiaoyao commented on a change in pull request #4013: URL: https://github.com/apache/hudi/pull/4013#discussion_r753757069 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieColumnRangeMetadata.java ## @@ -30,16 +28,21 @@ private final String colu

[GitHub] [hudi] hudi-bot removed a comment on pull request #4060: [HUDI-2814] Addressing issues w/ Z-order Layout Optimization

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4060: URL: https://github.com/apache/hudi/pull/4060#issuecomment-974745161

[GitHub] [hudi] xiarixiaoyao commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-21 Thread GitBox
xiarixiaoyao commented on pull request #4013: URL: https://github.com/apache/hudi/pull/4013#issuecomment-974766598

[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4013: URL: https://github.com/apache/hudi/pull/4013#issuecomment-974767804

[GitHub] [hudi] hudi-bot commented on pull request #3053: [HUDI-1932] Update Hive sync timestamp when change detected

2021-11-21 Thread GitBox
hudi-bot commented on pull request #3053: URL: https://github.com/apache/hudi/pull/3053#issuecomment-974726950

[GitHub] [hudi] hudi-bot removed a comment on pull request #4056: [HUDI-2808] Supports deduplication for streaming write

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4056: URL: https://github.com/apache/hudi/pull/4056#issuecomment-974662521

[GitHub] [hudi] hudi-bot removed a comment on pull request #4057: [HUDI-2392] Make flink parquet reader compatible with decimal BINARY …

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4057: URL: https://github.com/apache/hudi/pull/4057#issuecomment-974680874

[GitHub] [hudi] vinothchandar commented on a change in pull request #4038: [HUDI-2795] Add mechanism to safely update,delete and recover table properties

2021-11-21 Thread GitBox
vinothchandar commented on a change in pull request #4038: URL: https://github.com/apache/hudi/pull/4038#discussion_r753688167 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -191,16 +194,103 @@ public HoodieTableConfig(FileSys

[GitHub] [hudi] hudi-bot removed a comment on pull request #4048: [HUDI-1290] [RFC-39] Deltastreamer avro source for Debezium CDC

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4048: URL: https://github.com/apache/hudi/pull/4048#issuecomment-974524321

[GitHub] [hudi] hudi-bot removed a comment on pull request #4051: [HUDI-2804] Add option to skip compaction instants for streaming read

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4051: URL: https://github.com/apache/hudi/pull/4051#issuecomment-974584778

[GitHub] [hudi] vinothchandar commented on a change in pull request #4046: [HUDI-2527] Multi writer test with conflicting async table services

2021-11-21 Thread GitBox
vinothchandar commented on a change in pull request #4046: URL: https://github.com/apache/hudi/pull/4046#discussion_r753816359 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestHoodieClientMultiWriter.java ## @@ -228,78 +231,88 @@ private voi

[GitHub] [hudi] hudi-bot commented on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot commented on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974763042

[GitHub] [hudi] leesf merged pull request #4047: Claim RFC number for RFC for debezium source for deltastreamer

2021-11-21 Thread GitBox
leesf merged pull request #4047: URL: https://github.com/apache/hudi/pull/4047

[GitHub] [hudi] YannByron commented on a change in pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
YannByron commented on a change in pull request #3998: URL: https://github.com/apache/hudi/pull/3998#discussion_r753751992 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala ## @@ -0,0 +1,291 @@ +/* + *

[GitHub] [hudi] danny0405 merged pull request #4057: [HUDI-2392] Make flink parquet reader compatible with decimal BINARY …

2021-11-21 Thread GitBox
danny0405 merged pull request #4057: URL: https://github.com/apache/hudi/pull/4057

[GitHub] [hudi] kywe665 commented on pull request #4052: [WIP] - [HUDI-2805] - Docs for HoodieCleaner

2021-11-21 Thread GitBox
kywe665 commented on pull request #4052: URL: https://github.com/apache/hudi/pull/4052#issuecomment-974608013 I will review this doc and create implementation examples with @bhasudha next week.

[GitHub] [hudi] vinothchandar merged pull request #4038: [HUDI-2795] Add mechanism to safely update,delete and recover table properties

2021-11-21 Thread GitBox
vinothchandar merged pull request #4038: URL: https://github.com/apache/hudi/pull/4038

[GitHub] [hudi] dongkelun commented on a change in pull request #4053: [MINOR] Fix typos

2021-11-21 Thread GitBox
dongkelun commented on a change in pull request #4053: URL: https://github.com/apache/hudi/pull/4053#discussion_r753745898 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HiveIncrementalPuller.java ## @@ -106,14 +106,14 @@ private Connection connectio

[GitHub] [hudi] leesf merged pull request #4040: [MINOR] optimize in constructor of inputbatch class

2021-11-21 Thread GitBox
leesf merged pull request #4040: URL: https://github.com/apache/hudi/pull/4040

[GitHub] [hudi] xushiyan commented on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-11-21 Thread GitBox
xushiyan commented on issue #4027: URL: https://github.com/apache/hudi/issues/4027#issuecomment-974701743 @liujinhui1994 > Starting clustering for a group, parallelism:0 commit:20211102011441. This comes from Clustering plan which was created from the replace commit metadata.

[GitHub] [hudi] nikita-sheremet-clearscale commented on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
nikita-sheremet-clearscale commented on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849011

[GitHub] [hudi] codope merged pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-21 Thread GitBox
codope merged pull request #4025: URL: https://github.com/apache/hudi/pull/4025

[GitHub] [hudi] hudi-bot removed a comment on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-968575437

[GitHub] [hudi] codope merged pull request #3053: [HUDI-1932] Update Hive sync timestamp when change detected

2021-11-21 Thread GitBox
codope merged pull request #3053: URL: https://github.com/apache/hudi/pull/3053

[GitHub] [hudi] manojpec commented on pull request #4045: [HUDI-2472] Enabling metadata table for TestHoodieIndex

2021-11-21 Thread GitBox
manojpec commented on pull request #4045: URL: https://github.com/apache/hudi/pull/4045#issuecomment-974719909 @nsivabalan CI passed for the failed job in the re-run - https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=3528&view=results

[GitHub] [hudi] xushiyan commented on issue #4008: [SUPPORT] Hudi failed to sync new partition table to glue data catalog

2021-11-21 Thread GitBox
xushiyan commented on issue #4008: URL: https://github.com/apache/hudi/issues/4008#issuecomment-974699893

[GitHub] [hudi] codecov-commenter removed a comment on pull request #3289: [HUDI-2187] Add a shim layer to support multiple hive version

2021-11-21 Thread GitBox
codecov-commenter removed a comment on pull request #3289: URL: https://github.com/apache/hudi/pull/3289#issuecomment-881904382 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3289?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Th

[GitHub] [hudi] hudi-bot commented on pull request #4045: [HUDI-2472] Enabling metadata table for TestHoodieIndex

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4045: URL: https://github.com/apache/hudi/pull/4045#issuecomment-974599914

[GitHub] [hudi] hudi-bot removed a comment on pull request #4053: [MINOR] Fix typos

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4053: URL: https://github.com/apache/hudi/pull/4053#issuecomment-974615211

[GitHub] [hudi] xiarixiaoyao commented on pull request #4026: [HUDI-2788] Fixing issues w/ Z-order Layout Optimization

2021-11-21 Thread GitBox
xiarixiaoyao commented on pull request #4026: URL: https://github.com/apache/hudi/pull/4026#issuecomment-974796357 @alexeykudinkin great work. thanks very much. I'll take a closer look tomorrow

[GitHub] [hudi] xushiyan commented on issue #3933: [SUPPORT] Large amount of disk spill on initial upsert/bulk insert

2021-11-21 Thread GitBox
xushiyan commented on issue #3933: URL: https://github.com/apache/hudi/issues/3933#issuecomment-974875070 @Limess a few questions - for this dataset do you want to run bulkinsert or upsert? if it's append only dataset, then bulkinsert should be the mode - does smaller parquet file

[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4013: URL: https://github.com/apache/hudi/pull/4013#issuecomment-972637957

[GitHub] [hudi] Limess edited a comment on issue #4043: [SUPPORT] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row error when writing particular source data af

2021-11-21 Thread GitBox
Limess edited a comment on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974848916

[GitHub] [hudi] alexeykudinkin commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-21 Thread GitBox
alexeykudinkin commented on pull request #4013: URL: https://github.com/apache/hudi/pull/4013#issuecomment-974745454 @xiarixiaoyao thanks for addressing the issues! After our testing we've also tried to squash some bugs in https://github.com/apache/hudi/pull/4026 and https://github

[GitHub] [hudi] hudi-bot removed a comment on pull request #4059: [HUDI-2813] Claim RFC number for RFC for spark datasource V2 Integration

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4059: URL: https://github.com/apache/hudi/pull/4059#issuecomment-974743082

[GitHub] [hudi] nsivabalan commented on issue #4058: Support for record level point lookup

2021-11-21 Thread GitBox
nsivabalan commented on issue #4058: URL: https://github.com/apache/hudi/issues/4058#issuecomment-974821671 it should work, but may not be performant. We are looking to add record level index and some work is in progress on this end. You can follow the work [here](https://github.com/apache

[GitHub] [hudi] mtami commented on issue #4008: [SUPPORT] Hudi failed to sync new partition table to glue data catalog

2021-11-21 Thread GitBox
mtami commented on issue #4008: URL: https://github.com/apache/hudi/issues/4008#issuecomment-974700415

[GitHub] [hudi] xushiyan merged pull request #4059: [HUDI-2813] Claim RFC number for RFC for spark datasource V2 Integration

2021-11-21 Thread GitBox
xushiyan merged pull request #4059: URL: https://github.com/apache/hudi/pull/4059

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4060: [HUDI-2814] Addressing issues w/ Z-order Layout Optimization

2021-11-21 Thread GitBox
xiarixiaoyao commented on a change in pull request #4060: URL: https://github.com/apache/hudi/pull/4060#discussion_r753778621 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java ## @@ -284,55 +288,102 @@ public Boolean apply(String recordKey)

[GitHub] [hudi] hudi-bot removed a comment on pull request #4045: [HUDI-2472] Enabling metadata table for TestHoodieIndex

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #4045: URL: https://github.com/apache/hudi/pull/4045#issuecomment-974365981

[GitHub] [hudi] xushiyan commented on a change in pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
xushiyan commented on a change in pull request #3998: URL: https://github.com/apache/hudi/pull/3998#discussion_r753747739 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala ## @@ -0,0 +1,291 @@ +/* + * L

[GitHub] [hudi] leesf edited a comment on pull request #3967: [HUDI-2767] Enabling timeline server based marker as default

2021-11-21 Thread GitBox
leesf edited a comment on pull request #3967: URL: https://github.com/apache/hudi/pull/3967#issuecomment-974777352

[GitHub] [hudi] danny0405 merged pull request #4051: [HUDI-2804] Add option to skip compaction instants for streaming read

2021-11-21 Thread GitBox
danny0405 merged pull request #4051: URL: https://github.com/apache/hudi/pull/4051

[GitHub] [hudi] hudi-bot commented on pull request #4051: [HUDI-2804] Add option to skip compaction instants for streaming read

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4051: URL: https://github.com/apache/hudi/pull/4051#issuecomment-974595064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [hudi] xushiyan commented on issue #3841: Schema evolution improvement in 0.9.0 brakes existing applications

2021-11-21 Thread GitBox
xushiyan commented on issue #3841: URL: https://github.com/apache/hudi/issues/3841#issuecomment-974749469 @umehrot2 have you filed the jira? I created this https://issues.apache.org/jira/browse/HUDI-2811 to track Spark/Parquet upgrade related issues and tasks. cc @nsivabalan --

[GitHub] [hudi] xushiyan commented on issue #4017: [SUPPORT] ETL failure , Caused by: java.io.FileNotFoundException: No such file or directory

2021-11-21 Thread GitBox
xushiyan commented on issue #4017: URL: https://github.com/apache/hudi/issues/4017#issuecomment-974702484 @veenaypatil which Spark 2.x version exactly did you use? Hudi supports 2.4+ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] hudi-bot commented on pull request #4054: [MINOR] Optimize imports and delete useless or duplicate imports

2021-11-21 Thread GitBox
hudi-bot commented on pull request #4054: URL: https://github.com/apache/hudi/pull/4054#issuecomment-974617278 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [hudi] xushiyan commented on issue #3933: [SUPPORT] Large amount of disk spill on initial upsert/bulk insert

2021-11-21 Thread GitBox
xushiyan commented on issue #3933: URL: https://github.com/apache/hudi/issues/3933#issuecomment-974875070 @Limess a few questions: - For this dataset, do you want to run bulkinsert or upsert? If it's an append-only dataset, then bulkinsert should be the mode. - does smaller parquet file

[GitHub] [hudi] nsivabalan commented on issue #3854: [SUPPORT] Lower performance using 0.9.0 vs 0.8.0

2021-11-21 Thread GitBox
nsivabalan commented on issue #3854: URL: https://github.com/apache/hudi/issues/3854#issuecomment-974870498 Hi. Can you give it a try with open source across the two versions? @umehrot2: Can you chime in regarding EMR Spark versions? Are there any performance patches expected for Hudi 0.9.0 a

[GitHub] [hudi] nikita-sheremet-clearscale commented on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
nikita-sheremet-clearscale commented on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974868366 @xushiyan I have run `hudi-cli.sh show rollbacks`; the output is: ``` hudi:truedata_detections->show rollbacks 21/11/21 18:13:53 INFO timeline.HoodieActi

[GitHub] [hudi] vinothchandar commented on pull request #4061: [MINOR] Fix RocketMQ logo in landing page

2021-11-21 Thread GitBox
vinothchandar commented on pull request #4061: URL: https://github.com/apache/hudi/pull/4061#issuecomment-974858404 lgtm! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [hudi] Limess edited a comment on issue #4043: [SUPPORT] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row error when writing particular source data af

2021-11-21 Thread GitBox
Limess edited a comment on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974848916 `_hoodie_is_deleted` is the added column, sorry I misquoted it above. I believe this should have been added to the end of the schema, although I'm not sure how the ordering

[GitHub] [hudi] nikita-sheremet-clearscale edited a comment on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
nikita-sheremet-clearscale edited a comment on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849011 @xushiyan Many thanks for the quick reply!!! Hudi config is: ``` hoodie.datasource.hive_sync.database -> hudi hoodie.datasource.write.row.wri

[GitHub] [hudi] nikita-sheremet-clearscale commented on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
nikita-sheremet-clearscale commented on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849399 Btw, is there documentation on how Hudi handles interrupted data? For example, there was an EMR job that failed/stopped in the middle. Then data created by this job were deleted man

[GitHub] [hudi] nikita-sheremet-clearscale commented on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

2021-11-21 Thread GitBox
nikita-sheremet-clearscale commented on issue #4044: URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849011 Many thanks for the quick reply!!! Hudi config is: ``` hoodie.datasource.hive_sync.database -> hudi hoodie.datasource.write.row.writer.enable -> false

[GitHub] [hudi] Limess commented on issue #4043: [SUPPORT] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row error when writing particular source data after col

2021-11-21 Thread GitBox
Limess commented on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974848916 `_hoodie_is_deleted` is the added column, sorry I misquoted it above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [hudi] vinothchandar commented on a change in pull request #4046: [HUDI-2527] Multi writer test with conflicting async table services

2021-11-21 Thread GitBox
vinothchandar commented on a change in pull request #4046: URL: https://github.com/apache/hudi/pull/4046#discussion_r753816359 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestHoodieClientMultiWriter.java ## @@ -228,78 +231,88 @@ private voi

[GitHub] [hudi] yapnel commented on issue #4058: Support for record level point lookup

2021-11-21 Thread GitBox
yapnel commented on issue #4058: URL: https://github.com/apache/hudi/issues/4058#issuecomment-974826710 > it should work, but may not be performant. We are looking to add record level index and some work is in progress on this end. You can follow the work [here](https://github.com/apache/h

[GitHub] [hudi] nsivabalan commented on issue #4043: [SUPPORT] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.sql.Row error when writing particular source data after

2021-11-21 Thread GitBox
nsivabalan commented on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974824144 May I know what the new column is that you are adding just in writer2? In the description you describe it as `_hoodie_deleted_date`, but I don't see any such field in your target table schema. may

[GitHub] [hudi] nsivabalan commented on a change in pull request #4046: [HUDI-2527] Multi writer test with conflicting async table services

2021-11-21 Thread GitBox
nsivabalan commented on a change in pull request #4046: URL: https://github.com/apache/hudi/pull/4046#discussion_r753804804 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestHoodieClientMultiWriter.java ## @@ -228,78 +231,88 @@ private void

[GitHub] [hudi] nsivabalan commented on issue #4058: Support for record level point lookup

2021-11-21 Thread GitBox
nsivabalan commented on issue #4058: URL: https://github.com/apache/hudi/issues/4058#issuecomment-974821671 it should work, but may not be performant. We are looking to add record level index and some work is in progress on this end. You can follow the work [here](https://github.com/apache

[GitHub] [hudi] nsivabalan commented on issue #4031: [SUPPORT] _hoodie_is_deleted should work with any truthy value

2021-11-21 Thread GitBox
nsivabalan commented on issue #4031: URL: https://github.com/apache/hudi/issues/4031#issuecomment-974820517 Let me know if I understand your question correctly. - You are seeing a behavior where, when "_hoodie_is_deleted" is set to null or false, Hudi persists this column on storage. And

[GitHub] [hudi] nsivabalan commented on issue #3394: [SUPPORT] Question on hudi's default behaviour for UPSERT

2021-11-21 Thread GitBox
nsivabalan commented on issue #3394: URL: https://github.com/apache/hudi/issues/3394#issuecomment-974819625 Can you try setting `hoodie.datasource.write.precombine.field`? It should get applied to `hoodie.payload.ordering.field`. -- This is an automated message from the Apache Git Servi

[GitHub] [hudi] hudi-bot commented on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot commented on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974809647 ## CI report: * 256c4c8c909ae78b6c9fbfa9e58e008f5906de8c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?bu

[GitHub] [hudi] hudi-bot removed a comment on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974800507 ## CI report: * b451b3b4544ef112a8573d1040dfcc23e19a610d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/re

[GitHub] [hudi] hudi-bot commented on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot commented on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974800507 ## CI report: * b451b3b4544ef112a8573d1040dfcc23e19a610d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?bu

[GitHub] [hudi] hudi-bot removed a comment on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot removed a comment on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974799691 ## CI report: * b451b3b4544ef112a8573d1040dfcc23e19a610d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/re

[GitHub] [hudi] hudi-bot commented on pull request #3998: [HUDI-2759] extract HoodieCatalogTable as a bridge between spark cata…

2021-11-21 Thread GitBox
hudi-bot commented on pull request #3998: URL: https://github.com/apache/hudi/pull/3998#issuecomment-974799691 ## CI report: * b451b3b4544ef112a8573d1040dfcc23e19a610d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?bu
