[GitHub] [hudi] zherenyu831 commented on issue #2083: Kafka readStream performance slow [SUPPORT]

2020-09-13 Thread GitBox
zherenyu831 commented on issue #2083: URL: https://github.com/apache/hudi/issues/2083#issuecomment-691855992 Hi rafaelhbarros We are running similar solution as you, just a suggestion, isn't the parallelism too small for you? ``` hoodie.insert.shuffle.parallelism=10 hoodie.up

[jira] [Assigned] (HUDI-1280) Add tool to capture earliest or latest offsets in kafka topics

2020-09-13 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang reassigned HUDI-1280: Assignee: Trevorzhang > Add tool to capture earliest or latest offsets in kafka topics

[GitHub] [hudi] zherenyu831 commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

2020-09-13 Thread GitBox
zherenyu831 commented on issue #2020: URL: https://github.com/apache/hudi/issues/2020#issuecomment-691815280 @bvaradar Thank you so much, will keep using hoodie.filesystem.view.incr.timeline.sync.enable=false This is

[GitHub] [hudi] bvaradar commented on issue #2067: [SUPPORT][0.5.0-incubating] : HoodieUpsertException : Error upserting bucketType Update for partition :0

2020-09-13 Thread GitBox
bvaradar commented on issue #2067: URL: https://github.com/apache/hudi/issues/2067#issuecomment-691780797 Closing this issue. This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] bvaradar closed issue #2067: [SUPPORT][0.5.0-incubating] : HoodieUpsertException : Error upserting bucketType Update for partition :0

2020-09-13 Thread GitBox
bvaradar closed issue #2067: URL: https://github.com/apache/hudi/issues/2067

[GitHub] [hudi] cadl commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2020-09-13 Thread GitBox
cadl commented on issue #2063: URL: https://github.com/apache/hudi/issues/2063#issuecomment-691779849 > @cadl : Did setting the config help ? @bvaradar sorry, I'm a little busy these days. I'll check it this week, the setting looks very helpful. Thanks

[GitHub] [hudi] bvaradar commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time

2020-09-13 Thread GitBox
bvaradar commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-691779127 @modi95 : Can you look at this when you get a chance.

[GitHub] [hudi] bvaradar commented on issue #2065: [SUPPORT] Intermittent IllegalArgumentException while saving to Hudi dataset from Spark streaming job

2020-09-13 Thread GitBox
bvaradar commented on issue #2065: URL: https://github.com/apache/hudi/issues/2065#issuecomment-691778939 @prashanthvg89 : Please reopen if you still run into problems.

[GitHub] [hudi] bvaradar closed issue #2065: [SUPPORT] Intermittent IllegalArgumentException while saving to Hudi dataset from Spark streaming job

2020-09-13 Thread GitBox
bvaradar closed issue #2065: URL: https://github.com/apache/hudi/issues/2065

[GitHub] [hudi] bvaradar commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2020-09-13 Thread GitBox
bvaradar commented on issue #2063: URL: https://github.com/apache/hudi/issues/2063#issuecomment-691778778 @cadl : Did setting the config help?

[GitHub] [hudi] bvaradar commented on issue #2057: [SUPPORT] AWSDmsAvroPayload not processing Deletes correctly + IOException when reading log file

2020-09-13 Thread GitBox
bvaradar commented on issue #2057: URL: https://github.com/apache/hudi/issues/2057#issuecomment-691778577 @umehrot2 : Assigning this to you as this is specific to EMR.

[jira] [Updated] (HUDI-1270) NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5

2020-09-13 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1270: Status: Open (was: New) > NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5

[jira] [Commented] (HUDI-1270) NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5

2020-09-13 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195158#comment-17195158 ] Balaji Varadarajan commented on HUDI-1270: [~uditme] : Pinging > NoSuchMethod

[GitHub] [hudi] bvaradar commented on issue #2051: [SUPPORT] insert operation didn't insert a new record, instead it updated existing records in my no-primary-key table

2020-09-13 Thread GitBox
bvaradar commented on issue #2051: URL: https://github.com/apache/hudi/issues/2051#issuecomment-69149 Closing this as we have a jira.

[GitHub] [hudi] bvaradar closed issue #2051: [SUPPORT] insert operation didn't insert a new record, instead it updated existing records in my no-primary-key table

2020-09-13 Thread GitBox
bvaradar closed issue #2051: URL: https://github.com/apache/hudi/issues/2051

[GitHub] [hudi] bvaradar commented on issue #1985: [SUPPORT]Error while running deltastreamer on top of backfilled data using Hudi

2020-09-13 Thread GitBox
bvaradar commented on issue #1985: URL: https://github.com/apache/hudi/issues/1985#issuecomment-691777089 Added https://issues.apache.org/jira/browse/HUDI-1280 for smooth transition to deltastreamer.

[GitHub] [hudi] bvaradar closed issue #1985: [SUPPORT]Error while running deltastreamer on top of backfilled data using Hudi

2020-09-13 Thread GitBox
bvaradar closed issue #1985: URL: https://github.com/apache/hudi/issues/1985

[jira] [Created] (HUDI-1280) Add tool to capture earliest or latest offsets in kafka topics

2020-09-13 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1280: Summary: Add tool to capture earliest or latest offsets in kafka topics Key: HUDI-1280 URL: https://issues.apache.org/jira/browse/HUDI-1280 Project: Apache H

[jira] [Updated] (HUDI-1280) Add tool to capture earliest or latest offsets in kafka topics

2020-09-13 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1280: Status: Open (was: New) > Add tool to capture earliest or latest offsets in kafka topics

[GitHub] [hudi] bvaradar commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2020-09-13 Thread GitBox
bvaradar commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-691774166 @Ac-Rush : Are you still blocked by this?

[GitHub] [hudi] bvaradar commented on issue #1972: Deltasteamer with Transformation has schema issue

2020-09-13 Thread GitBox
bvaradar commented on issue #1972: URL: https://github.com/apache/hudi/issues/1972#issuecomment-691773018 Closing this as dupe.

[GitHub] [hudi] bvaradar closed issue #1972: Deltasteamer with Transformation has schema issue

2020-09-13 Thread GitBox
bvaradar closed issue #1972: URL: https://github.com/apache/hudi/issues/1972

[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-09-13 Thread GitBox
bvaradar commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-691772192 Closing this due to inactivity.

[GitHub] [hudi] bvaradar closed issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-09-13 Thread GitBox
bvaradar closed issue #1962: URL: https://github.com/apache/hudi/issues/1962

[GitHub] [hudi] bvaradar closed issue #1956: [SUPPORT] DMS for table without PK

2020-09-13 Thread GitBox
bvaradar closed issue #1956: URL: https://github.com/apache/hudi/issues/1956

[GitHub] [hudi] bvaradar commented on issue #1955: [SUPPORT] DMS partition treated as part of pk

2020-09-13 Thread GitBox
bvaradar commented on issue #1955: URL: https://github.com/apache/hudi/issues/1955#issuecomment-691771582 Added Jira for doc update: https://issues.apache.org/jira/browse/HUDI-1279

[jira] [Updated] (HUDI-1279) Update Apache Hudi website docs to clarify the property of record_keys

2020-09-13 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1279: Status: Open (was: New) > Update Apache Hudi website docs to clarify the property of reco

[GitHub] [hudi] bvaradar closed issue #1955: [SUPPORT] DMS partition treated as part of pk

2020-09-13 Thread GitBox
bvaradar closed issue #1955: URL: https://github.com/apache/hudi/issues/1955

[jira] [Created] (HUDI-1279) Update Apache Hudi website docs to clarify the property of record_keys

2020-09-13 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1279: Summary: Update Apache Hudi website docs to clarify the property of record_keys Key: HUDI-1279 URL: https://issues.apache.org/jira/browse/HUDI-1279 Project: A

[GitHub] [hudi] yanghua commented on a change in pull request #2079: [HUDI-995] Use HoodieTestTable in more classes

2020-09-13 Thread GitBox
yanghua commented on a change in pull request #2079: URL: https://github.com/apache/hudi/pull/2079#discussion_r487611421 ## File path: hudi-client/src/test/java/org/apache/hudi/table/TestCleaner.java ## @@ -1058,10 +1069,21 @@ public void testCleanPreviousCorruptedCleanFiles()

[GitHub] [hudi] bvaradar commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

2020-09-13 Thread GitBox
bvaradar commented on issue #1835: URL: https://github.com/apache/hudi/issues/1835#issuecomment-691745661 @rajgowtham24 : Closing this issue. Please reopen if you are still having issues.

[GitHub] [hudi] bvaradar closed issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

2020-09-13 Thread GitBox
bvaradar closed issue #1835: URL: https://github.com/apache/hudi/issues/1835

[GitHub] [hudi] bvaradar closed issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-09-13 Thread GitBox
bvaradar closed issue #1954: URL: https://github.com/apache/hudi/issues/1954

[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-09-13 Thread GitBox
bvaradar commented on issue #1954: URL: https://github.com/apache/hudi/issues/1954#issuecomment-691745743 Closing this issue.

[GitHub] [hudi] vinothchandar commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-13 Thread GitBox
vinothchandar commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-691745384 I can help address the remaining feedback. I will push a small diff today/tmrw. Overall, looks like a reasonable start. The major feedback I still have is the follo

[GitHub] [hudi] bvaradar closed issue #1943: [SUPPORT] Gradle fails with dependency on org.apache.hudi:hudi-spark_2.12:0.5.3

2020-09-13 Thread GitBox
bvaradar closed issue #1943: URL: https://github.com/apache/hudi/issues/1943

[GitHub] [hudi] bvaradar commented on issue #1943: [SUPPORT] Gradle fails with dependency on org.apache.hudi:hudi-spark_2.12:0.5.3

2020-09-13 Thread GitBox
bvaradar commented on issue #1943: URL: https://github.com/apache/hudi/issues/1943#issuecomment-691740550 This is a valid issue and is currently assigned for next release with a tracking Jira. Closing the Github issue as we will track this in Jira.

[jira] [Updated] (HUDI-1202) Fix Gradle dependency issue when pulling in hudi-spark_2.12

2020-09-13 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1202: Status: Open (was: New) > Fix Gradle dependency issue when pulling in hudi-spark_2.12

[GitHub] [hudi] bvaradar closed issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-09-13 Thread GitBox
bvaradar closed issue #1939: URL: https://github.com/apache/hudi/issues/1939

[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-09-13 Thread GitBox
bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-691739964 @RajasekarSribalan : Please reopen if you still have any questions. Thanks, Balaji.V

[GitHub] [hudi] bvaradar closed issue #1936: Hudi Query Error

2020-09-13 Thread GitBox
bvaradar closed issue #1936: URL: https://github.com/apache/hudi/issues/1936

[GitHub] [hudi] bvaradar commented on issue #1936: Hudi Query Error

2020-09-13 Thread GitBox
bvaradar commented on issue #1936: URL: https://github.com/apache/hudi/issues/1936#issuecomment-691739803 @umehrot2 : when you get a chance, please let @harishchanderramesh know the EMR release date. Closing this ticket.

[GitHub] [hudi] bvaradar closed issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-09-13 Thread GitBox
bvaradar closed issue #1925: URL: https://github.com/apache/hudi/issues/1925

[GitHub] [hudi] bvaradar commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-09-13 Thread GitBox
bvaradar commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-691739504 There is a WIP PR (https://github.com/apache/hudi/pull/2069) which addresses early cleaning of local files that are opened. This is the only close thing that I can suspect anything rel

[GitHub] [hudi] bvaradar closed issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-09-13 Thread GitBox
bvaradar closed issue #1913: URL: https://github.com/apache/hudi/issues/1913

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-13 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r484832824 ## File path: hudi-client/pom.xml ## @@ -68,6 +107,12 @@ + + + org.scala-lang Review comment: > should we limit scala to j

[GitHub] [hudi] vinothchandar commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-13 Thread GitBox
vinothchandar commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r487575888 ## File path: hudi-client/pom.xml ## @@ -68,6 +107,12 @@ + + + org.scala-lang Review comment: fair. let me take a closer

[GitHub] [hudi] wangxianghu edited a comment on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-13 Thread GitBox
wangxianghu edited a comment on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-688918271 > One more pass. > > @wangxianghu do the tests pass locally? 50 min is the travis limit, if it's consistently exceeding that limit, we need to understand why and fi

[GitHub] [hudi] vinothchandar commented on pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-13 Thread GitBox
vinothchandar commented on pull request #2048: URL: https://github.com/apache/hudi/pull/2048#issuecomment-691710534 @satishkotha can you please resolve all the comments that you have already addressed? That way we can track what's pending.

[GitHub] [hudi] vinothchandar commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-13 Thread GitBox
vinothchandar commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r487541865 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java ## @@ -95,6 +93,13 @@ public HoodieWriteMeta

[jira] [Updated] (HUDI-465) Make Hive Sync via Spark painless

2020-09-13 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei updated HUDI-465: Status: In Progress (was: Open) > Make Hive Sync via Spark painless

[GitHub] [hudi] shenh062326 opened a new pull request #2088: [HUDI-1208] Ordering Field should be optional when precombine is turned off

2020-09-13 Thread GitBox
shenh062326 opened a new pull request #2088: URL: https://github.com/apache/hudi/pull/2088 ## What is the purpose of the pull request * Spark Data Source Write and DeltaStreamer should allow the ordering field to be optional when precombine is turned off. ## Brief change log

[jira] [Updated] (HUDI-1208) Ordering Field should be optional when precombine is turned off

2020-09-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1208: Labels: pull-request-available (was: ) > Ordering Field should be optional when precombine is tur