Re: [I] [SUPPORT] Hudi offline compaction ignores old data [hudi]

2024-03-13 Thread via GitHub
nrlm1 commented on issue #10863: URL: https://github.com/apache/hudi/issues/10863#issuecomment-1996220054 Thank you for looking into this issue. We are not specifying any. Expectation is to use the default "upsert". -- This is an automated message from the Apache Git Service. To respond

Re: [I] [SUPPORT] Hudi offline compaction ignores old data [hudi]

2024-03-13 Thread via GitHub
danny0405 commented on issue #10863: URL: https://github.com/apache/hudi/issues/10863#issuecomment-1996190248 Did you use `upsert` as the operation name or just `insert`?

[I] [SUPPORT] Hudi offline compaction ignores old data [hudi]

2024-03-13 Thread via GitHub
ennox108 opened a new issue, #10863: URL: https://github.com/apache/hudi/issues/10863 I am trying to run a Flink job to get data from SQL Server to S3. I am doing offline compaction but whenever it is triggered I end up having fewer records than before the compaction. Based on the com

Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-07 Thread via GitHub
joshhamann commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1983870364 You can see the timestamps in the above screenshots from the Spark UI if that works. For instance, the test job, which is processing more data, goes from around 23:18 to 23:23 (an

Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-07 Thread via GitHub
ad1happy2go commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1983745782 @joshhamann That's the correct understanding. If we are not using global bloom, then if your incremental dataset only had data from very few partitions, then index lookup stage w

Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-06 Thread via GitHub
joshhamann commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1981391484 Here is my configuration: {'hoodie.table.name': 'analytics_events', 'hoodie.datasource.write.recordkey.field': 'event_uuid', 'hoodie.datasource.write.partitionpath.field':

Re: [I] [SUPPORT] Hudi 0.14.0 - deletion from table failing for org.apache.hudi.keygen.TimestampBasedKeyGenerator [hudi]

2024-03-06 Thread via GitHub
ad1happy2go commented on issue #10823: URL: https://github.com/apache/hudi/issues/10823#issuecomment-1980552281 @ShrutiBansal309 Able to reproduce this issue. Issue comes even when we just try to read this table. JIRA - https://issues.apache.org/jira/browse/HUDI-7485 Reproducible C

Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-05 Thread via GitHub
ad1happy2go commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1980135128 @joshhamann Can you please provide the writer configuration to look into this more? If you are using upsert operation type, the load to a new Hudi Table will be expected to

[I] [SUPPORT] Hudi 0.14.0 - deletion from table failing for org.apache.hudi.keygen.TimestampBasedKeyGenerator [hudi]

2024-03-05 Thread via GitHub
ShrutiBansal309 opened a new issue, #10823: URL: https://github.com/apache/hudi/issues/10823 **Issue** I am using Hudi 0.14.0 and Spark 3.4.0 on EMR cluster 6.15.0. I have a service that writes a Dataset to a table in Hudi located on S3. I am facing issues when trying to delete data fr

[I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-05 Thread via GitHub
joshhamann opened a new issue, #10822: URL: https://github.com/apache/hudi/issues/10822 **Describe the problem you faced** We have a production transform job using AWS Glue version 4.0, Hudi version 0.12.1 that loads data into a hudi table on s3. At some point, this job started taking

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-03-04 Thread via GitHub
CTTY commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1977660648 This looks similar to this issue: https://github.com/apache/hudi/issues/7487 where user ran into S3 throttling issue due to too many S3 calls. Was wondering if you can check if the

Re: [I] [SUPPORT] Hudi table has duplicate data. [hudi]

2024-02-29 Thread via GitHub
chenbodeng719 commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1970657108 > @chenbodeng719 Can you please create a new issue with all the details about hudi/spark versions and steps to reproduce. Thanks. ok

Re: [I] [SUPPORT] Hudi table has duplicate data. [hudi]

2024-02-29 Thread via GitHub
ad1happy2go commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1970644646 @chenbodeng719 Can you please create a new issue with all the details about hudi/spark versions and steps to reproduce. Thanks.

Re: [I] [SUPPORT] Hudi table has duplicate data. [hudi]

2024-02-28 Thread via GitHub
chenbodeng719 commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1970368618 @nsivabalan I have the same issue. The below is my flink hudi config. ``` CREATE TABLE hudi_sink( new_uid STRING PRIMARY KEY NOT ENFORCED,

[I] [SUPPORT] [hudi]

2024-02-28 Thread via GitHub
Toroidals opened a new issue, #10779: URL: https://github.com/apache/hudi/issues/10779 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? yes - Join the mailing list to engage in conversations and get faster support at dev-

Re: [I] [SUPPORT] [hudi]

2024-02-25 Thread via GitHub
danny0405 commented on issue #10754: URL: https://github.com/apache/hudi/issues/10754#issuecomment-1963209068 You are right, we should encode the partition path for these special characters.

[I] [SUPPORT] [hudi]

2024-02-25 Thread via GitHub
eshu opened a new issue, #10754: URL: https://github.com/apache/hudi/issues/10754 When the partition column contains the slash character ("/"), Hudi may write the data incorrectly or fail to read it back. Test (I use some helpers to write and read Hudi data; they write data t

Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-02-24 Thread via GitHub
yihua commented on issue #10566: URL: https://github.com/apache/hudi/issues/10566#issuecomment-1962825044 test

Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-02-24 Thread via GitHub
yihua commented on issue #10566: URL: https://github.com/apache/hudi/issues/10566#issuecomment-1962806613 test

Re: [I] [SUPPORT] Hudi wants to write the database in s3://datalake [hudi]

2024-02-18 Thread via GitHub
alberttwong closed issue #10695: [SUPPORT] Hudi wants to write the database in s3://datalake URL: https://github.com/apache/hudi/issues/10695

Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-01-31 Thread via GitHub
ad1happy2go commented on issue #10566: URL: https://github.com/apache/hudi/issues/10566#issuecomment-1919591261 @CTTY I was trying to reproduce this issue, but got into some other setup issue. Will get back to you soon on this.

Re: [I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2024-01-31 Thread via GitHub
ad1happy2go commented on issue #10112: URL: https://github.com/apache/hudi/issues/10112#issuecomment-1919292489 @zyclove Did you get a chance to try this? Did this PR fix your issue? Please share the insights here. Thanks in advance.

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-31 Thread via GitHub
ad1happy2go commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1919021038 Thanks for trying @ergophobiac. @CTTY any insights here?

Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-31 Thread via GitHub
ad1happy2go commented on issue #10458: URL: https://github.com/apache/hudi/issues/10458#issuecomment-1918953387 I will work on updating the docs. Thanks @stayrascal

Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-25 Thread via GitHub
stayrascal commented on issue #10458: URL: https://github.com/apache/hudi/issues/10458#issuecomment-1911319106 The official document should change the value of 'cdc.enabled' to 'false', or change the value of 'table.type' to 'COPY_ON_WRITE', because only COW tables support CDC mode for Flin

[I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-01-25 Thread via GitHub
CTTY opened a new issue, #10566: URL: https://github.com/apache/hudi/issues/10566 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-24 Thread via GitHub
soumilshah1995 closed issue #10499: [SUPPORT] Hudi DeltaStreamer with Flattening Transformer URL: https://github.com/apache/hudi/issues/10499

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-24 Thread via GitHub
soumilshah1995 commented on issue #10499: URL: https://github.com/apache/hudi/issues/10499#issuecomment-1909122022 I would need some time to play with the flattening transformer and need to set up a test project to see if it works. Let me close this and reopen it later, as I would be doing th

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-23 Thread via GitHub
ad1happy2go commented on issue #10499: URL: https://github.com/apache/hudi/issues/10499#issuecomment-1907473634 @soumilshah1995 Let us know your findings and in case you need any help. Thanks.

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-22 Thread via GitHub
ad1happy2go commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1904404899 @zeeshan-media If I understand you correctly, with column stats you got properly sized files but with RLI you are getting small files. Can you message me on Slack when you see this, we c

[I] [SUPPORT] [hudi]

2024-01-21 Thread via GitHub
LIKE-HUB opened a new issue, #10543: URL: https://github.com/apache/hudi/issues/10543 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subsc

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-18 Thread via GitHub
zeeshan-media commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1897991450 yes, It was 7.2 MB's each. I was using COW mode, I had not used faker data for that purpose, it was authentic data having 3 million records, the configurations were same, only d

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-17 Thread via GitHub
soumilshah1995 commented on issue #10499: URL: https://github.com/apache/hudi/issues/10499#issuecomment-1897068762 Let me get back to this issue after some more tries; I want to try out a few things.

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-17 Thread via GitHub
ad1happy2go commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1896142856 @zeeshan-media Just to be sure, data files are 7.2 MB each? Number of record keys will affect the record_index size not the data size. Ideally small files should merge and c

Re: [I] [SUPPORT] Hudi Delete Partition on AWS Glue [hudi]

2024-01-16 Thread via GitHub
soumilshah1995 commented on issue #8894: URL: https://github.com/apache/hudi/issues/8894#issuecomment-1894700142 Hey buddy, it depends on how you have partitioned your tables. If you have partitioned tables with Hive style, state='Connecticut' should work. Let's connect on Slack for

Re: [I] [SUPPORT] Hudi Delete Partition on AWS Glue [hudi]

2024-01-16 Thread via GitHub
007vedant commented on issue #8894: URL: https://github.com/apache/hudi/issues/8894#issuecomment-1894364961 Hi @soumilshah1995 I've a use case of deleting specific partitions from my Hudi table. I verified that the deletion works when I just specify the list of partitions to be deleted

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
zeeshan-media commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1894111848 @ad1happy2go each file is 7 MB. I used Amazon EMR pyspark version 3.4.1.

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
ad1happy2go commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1893959075 @zeeshan-media In the first write, What is the size of those 30 files? The number of files should not depend on RLI anyway. Ideally small file handling should take place in upser

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
zeeshan-media commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1893874309 @ad1happy2go by parquet table I meant the hudi output directory as it is in parquet format. My output hudi directory is of 210 mb's of data which contains 30 small files. Record

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
ad1happy2go commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1893862799 @zeeshan-media Hudi upsert path involves tagging of existing data to find out which records are updated, which will not use RECORD_INDEX (doesn't matter as there is no existing da

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
zeeshan-media commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1893778705 @ad1happy2go does it mean that for the first time when we run the job, record index will not be used because it is creating 30 files(7 mb each) for approximately 210 mb's (3 mil

Re: [I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [h

2024-01-16 Thread via GitHub
ad1happy2go commented on issue #10507: URL: https://github.com/apache/hudi/issues/10507#issuecomment-1893742984 @zeeshan-media Thanks for raising this. I tried the code and realised that for the first time while writing data to an empty table, it gives this warning as record_index is not

[I] [SUPPORT] Hudi Record Index not working as Expected: gives warning as "WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records" [hudi]

2024-01-16 Thread via GitHub
zeeshan-media opened a new issue, #10507: URL: https://github.com/apache/hudi/issues/10507 ### Problem Detail: I am trying hudi record index on my machine, although my pyspark job runs smoothly and data is written along with creation of record_index file in the hudi's metadata table, it

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-15 Thread via GitHub
soumilshah1995 commented on issue #10499: URL: https://github.com/apache/hudi/issues/10499#issuecomment-1892837784 Following works ``` spark-submit \ --class org.apache.hudi.utilities.streamer.HoodieStreamer \ --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.

Re: [I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-15 Thread via GitHub
soumilshah1995 commented on issue #10499: URL: https://github.com/apache/hudi/issues/10499#issuecomment-1892807362 Also tried ``` spark-submit \ --class org.apache.hudi.utilities.streamer.HoodieStreamer \ --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0

[I] [SUPPORT] Hudi DeltaStreamer with Flattening Transformer [hudi]

2024-01-15 Thread via GitHub
soumilshah1995 opened a new issue, #10499: URL: https://github.com/apache/hudi/issues/10499 Hello, I'm currently experimenting with the Hudi delta streamer and working on creating part 12 of the delta streamer playlist. For the next video, my goal is to cover the Hudi SQL-based transformer

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-10 Thread via GitHub
ergophobiac commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1886257556 Hello @ad1happy2go , We ran a test with the same configurations, just one addition: spark.hadoop.fs.s3a.connection.maximum=2000. (We found a resource saying the default on EMR

Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-10 Thread via GitHub
ad1happy2go commented on issue #10458: URL: https://github.com/apache/hudi/issues/10458#issuecomment-1884412880 Created a JIRA to track - https://issues.apache.org/jira/browse/HUDI-7287

Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-08 Thread via GitHub
ad1happy2go commented on issue #10458: URL: https://github.com/apache/hudi/issues/10458#issuecomment-1880868263 @nicholasxu Thanks for raising this. I am also getting this error while querying when 'read.streaming.enabled' and 'cdc.enabled' are true. Normal reads are running fine. We will

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-02 Thread via GitHub
ergophobiac commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1874167209 Hey @ad1happy2go, we have a test case running, we'll observe till we're sure it's stable and let you know how it turns out.

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-02 Thread via GitHub
ad1happy2go commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1874092759 @ergophobiac Did you get a chance to try this out?

Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2023-12-27 Thread via GitHub
ad1happy2go commented on issue #10415: URL: https://github.com/apache/hudi/issues/10415#issuecomment-1870277355 @ergophobiac Are you setting fs.s3a.connection.maximum to a higher value? Can you try increasing its value?

[I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2023-12-26 Thread via GitHub
ergophobiac opened a new issue, #10415: URL: https://github.com/apache/hudi/issues/10415 **Describe the problem you faced** Stack: Hudi 0.13.1, EMR 6.13.0, Spark 3.4.1 We are writing to an MOR table in S3, using Spark Structured Streaming job on EMR. Once this job has run for a

Re: [I] [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 [hudi]

2023-12-22 Thread via GitHub
parisni closed issue #10402: [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 URL: https://github.com/apache/hudi/issues/10402

Re: [I] [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 [hudi]

2023-12-22 Thread via GitHub
parisni commented on issue #10402: URL: https://github.com/apache/hudi/issues/10402#issuecomment-1867821807 > It worked fine for me. good to know, sorry for inconvenience > Can you confirm if scala version is same for your spark installation and hudi is same. Yes it's sc

Re: [I] [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 [hudi]

2023-12-22 Thread via GitHub
ad1happy2go commented on issue #10402: URL: https://github.com/apache/hudi/issues/10402#issuecomment-1867801673 @parisni It worked fine for me. Can you confirm that the Scala version is the same for your Spark installation and Hudi? https://github.com/apache/hudi/assets/63430370/93cea61

Re: [I] [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 [hudi]

2023-12-22 Thread via GitHub
parisni commented on issue #10402: URL: https://github.com/apache/hudi/issues/10402#issuecomment-1867662880 @nsivabalan as the release manager for 0.14.1 maybe ?

[I] [SUPPORT] Hudi 0.14.1-rc1 has trouble with spark 3.2 [hudi]

2023-12-22 Thread via GitHub
parisni opened a new issue, #10402: URL: https://github.com/apache/hudi/issues/10402 ```python # spark-3.2.4-bin-hadoop3.2/bin/pyspark --jars /projects/hudi/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.14.1-rc1.jar --conf 'spark.serializer=org.apache.spark.serializer.Kr

[I] [SUPPORT] [hudi]

2023-12-21 Thread via GitHub
LIKE-HUB opened a new issue, #10400: URL: https://github.com/apache/hudi/issues/10400 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? yes - Join the mailing list to engage in conversations and get faster support at dev-su

Re: [I] [SUPPORT] hudi 0.14.1-RC1 task be stuck sometime. [hudi]

2023-12-19 Thread via GitHub
zyclove commented on issue #10359: URL: https://github.com/apache/hudi/issues/10359#issuecomment-1863737221 > @zyclove Have you started seeing this issue after hudi upgrade only? Not sure if it's related to hudi. Yes. Hudi 0.12.3 works fine, but the issue appears after upgrading to 0.14.x.

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-12-19 Thread via GitHub
ad1happy2go commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1863070904 Sorry for the delay here. @abhisheksahani91 Check out the nice blog from @nsivabalan on timeline server https://medium.com/@simpsons/timeline-server-in-apache-hudi-b5be25f85e47

Re: [I] [SUPPORT] hudi 0.14.1-RC1 task be stuck sometime. [hudi]

2023-12-19 Thread via GitHub
ad1happy2go commented on issue #10359: URL: https://github.com/apache/hudi/issues/10359#issuecomment-1862289560 @zyclove Have you started seeing this issue after hudi upgrade only? Not sure if it's related to hudi.

Re: [I] [SUPPORT] hudi 0.14.1-RC1 task be stuck sometime. [hudi]

2023-12-18 Thread via GitHub
zyclove commented on issue #10359: URL: https://github.com/apache/hudi/issues/10359#issuecomment-1861986296 https://issues.apache.org/jira/browse/HDFS-8429 https://issues.apache.org/jira/browse/HADOOP-11333 It seems to be an HDFS bug. Does the config net.core.wmem_default work?

Re: [I] [SUPPORT] hudi 0.14.1-RC1 task be stuck sometime. [hudi]

2023-12-18 Thread via GitHub
zyclove commented on issue #10359: URL: https://github.com/apache/hudi/issues/10359#issuecomment-1861977466 ``` Thread 11806: (state = IN_NATIVE_TRANS) - org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, org.apache.hadoop.net.unix.DomainSocketWatcher$FdSet) @bci=0 (Interpret

[I] [SUPPORT] hudi 0.14.1-RC1 task be stuck sometime. [hudi]

2023-12-18 Thread via GitHub
zyclove opened a new issue, #10359: URL: https://github.com/apache/hudi/issues/10359 After upgrading the community to version 0.14.1, there is still a probability that the task will be stuck. The yarn task has been completed and the task cannot be viewed through yarn app -list. It requires

[I] [SUPPORT] [hudi]

2023-12-18 Thread via GitHub
khajaasmath786 opened a new issue, #10356: URL: https://github.com/apache/hudi/issues/10356 Here is the intermittent error we face in our current hudi data pipeline with apache spark. Traceback (most recent call last): File "/mnt/tmp/spark-d5dc3d59-8086-4598-b0f8-345b495e8dd1/baxaws-e

[I] [SUPPORT] [hudi]

2023-12-12 Thread via GitHub
young138120 opened a new issue, #10320: URL: https://github.com/apache/hudi/issues/10320 **Describe the problem you faced** I run spark job to write data to hudi, and init spark session like this: ![image](https://github.com/apache/hudi/assets/11519151/37f69790-5cbd-44b4-94be-f2613e71f

Re: [I] [SUPPORT] [hudi]

2023-12-11 Thread via GitHub
Amar1404 closed issue #10311: [SUPPORT] URL: https://github.com/apache/hudi/issues/10311

[I] [SUPPORT] [hudi]

2023-12-11 Thread via GitHub
Amar1404 opened a new issue, #10311: URL: https://github.com/apache/hudi/issues/10311 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subsc

Re: [I] [SUPPORT] Hudi Bootstrap with METADATA_ONLY with Hive Sync Fails on EMR Serverlkess 6.10 [hudi]

2023-12-11 Thread via GitHub
soumilshah1995 closed issue #8565: [SUPPORT] Hudi Bootstrap with METADATA_ONLY with Hive Sync Fails on EMR Serverlkess 6.10 URL: https://github.com/apache/hudi/issues/8565

Re: [I] [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video [hudi]

2023-12-11 Thread via GitHub
soumilshah1995 closed issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video URL: https://github.com/apache/hudi/issues/8400

Re: [I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-10 Thread via GitHub
abhisheksahani91 commented on issue #10270: URL: https://github.com/apache/hudi/issues/10270#issuecomment-1849503205 @ad1happy2go I have scaled the infra and compaction execution time has been reduced from 20 minutes to 10. But I have one doubt, on every compaction, the number of

Re: [I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-08 Thread via GitHub
abhisheksahani91 commented on issue #10270: URL: https://github.com/apache/hudi/issues/10270#issuecomment-1847465601 @ad1happy2go The way we conducted the performance test for Hudi in our pre-production environment is as follows: 1. Bootstrapping the table: We ingested data over K

Re: [I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-07 Thread via GitHub
abhisheksahani91 commented on issue #10270: URL: https://github.com/apache/hudi/issues/10270#issuecomment-1845776273 @ad1happy2go 1. Are you setting any additional spark configurations: No, I am not setting any additional spark configuration. 2. Total size of the table

Re: [I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-07 Thread via GitHub
ad1happy2go commented on issue #10270: URL: https://github.com/apache/hudi/issues/10270#issuecomment-1845540298 @abhisheksahani91 Are you setting any additional spark configurations? How much data you have in the table? Can you check the compaction timeline file and see how many file groups

Re: [I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-07 Thread via GitHub
abhisheksahani91 commented on issue #10270: URL: https://github.com/apache/hudi/issues/10270#issuecomment-1844997033 Please help with this as this is impacting our production pipeline.

[I] [SUPPORT] HUDI GLUE Async compaction for MOR table is taking long time and it is also blocking the ingestion [hudi]

2023-12-07 Thread via GitHub
abhisheksahani91 opened a new issue, #10270: URL: https://github.com/apache/hudi/issues/10270 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at d

Re: [I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-04 Thread via GitHub
zyclove commented on issue #10235: URL: https://github.com/apache/hudi/issues/10235#issuecomment-1838182121 SparkMetadataTableRecordIndex fileGroupSize = hoodieTable.getMetadataTable().getNumFileGroupsForPartition(MetadataPartitionType.RECORD_INDEX); Why not 512 fileGroupSi

Re: [I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-04 Thread via GitHub
zyclove commented on issue #10235: URL: https://github.com/apache/hudi/issues/10235#issuecomment-1838138200 @danny0405 With set hoodie.metadata.enable=true, now is RECORD_INDEX. But the follow stage is very very slow too. ![image](https://github.com/apache/hudi/assets/15028279/fa2

Re: [I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-03 Thread via GitHub
danny0405 commented on issue #10235: URL: https://github.com/apache/hudi/issues/10235#issuecomment-1837959640 hoodie.metadata.table -> hoodie.metadata.enable

Re: [I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-03 Thread via GitHub
zyclove commented on issue #10235: URL: https://github.com/apache/hudi/issues/10235#issuecomment-1837949851 @danny0405 why is back to GLOBAL_SIMPLE? ![image](https://github.com/apache/hudi/assets/15028279/20107e0d-46eb-4e28-9a5a-0fc8750cbc34) 23/12/04 14:39:29 WARN SparkMetadataTa

Re: [I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-03 Thread via GitHub
zyclove commented on issue #10235: URL: https://github.com/apache/hudi/issues/10235#issuecomment-1837946117 @danny0405 why is back to GLOBAL_SIMPLE? https://github.com/apache/hudi/assets/15028279/9cddf011-e25c-4c0f-9b40-c2d7fdd17cf9 23/12/04 14:39:29 WARN SparkMetadataTableRe

[I] [SUPPORT] hudi RECORD_INDEX is too slow in "Building workload profile" stage . why is HoodieGlobalSimpleIndex ? [hudi]

2023-12-03 Thread via GitHub
zyclove opened a new issue, #10235: URL: https://github.com/apache/hudi/issues/10235 **Describe the problem you faced** The spark job is too slow in follow stage. Adjusting CPU, memory, and concurrency has no effect. Which stage can be optimized or skipped? ![image](ht

[I] [SUPPORT] [hudi]

2023-11-28 Thread via GitHub
XenosK opened a new issue, #10204: URL: https://github.com/apache/hudi/issues/10204 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr.

Re: [I] [SUPPORT] hudi-examples-dbt not running with spark thrift server [hudi]

2023-11-27 Thread via GitHub
xushiyan closed issue #6125: [SUPPORT] hudi-examples-dbt not running with spark thrift server URL: https://github.com/apache/hudi/issues/6125

Re: [I] [SUPPORT] hudi-examples-dbt not running with spark thrift server [hudi]

2023-11-27 Thread via GitHub
xushiyan commented on issue #6125: URL: https://github.com/apache/hudi/issues/6125#issuecomment-1828246693 closing as solution provided

Re: [I] [SUPPORT]hudi upsert data Caused by: org.apache.hadoop.fs.PathIsNotEmptyDirectoryException [hudi]

2023-11-26 Thread via GitHub
blackcheckren commented on issue #9029: URL: https://github.com/apache/hudi/issues/9029#issuecomment-1827003764 I also encountered this problem, but the corresponding directory on S3 could not be deleted after dozens of manual attempts, and the log showed that a folder with the same name wa

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-23 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1825219956 @ad1happy2go Testing is concluded and with recommended changes, I am not observing the connection issue and also not observing any performance issue. But can you pleas

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-23 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1824003259 @ad1happy2go Can you please explain if there can be a disadvantage if I disable the timeline server.

Re: [I] [SUPPORT] hudi-examples-dbt not running with spark thrift server [hudi]

2023-11-21 Thread via GitHub
xushiyan commented on issue #6125: URL: https://github.com/apache/hudi/issues/6125#issuecomment-181787 @sambhav13 I'm updating the instructions in the dbt example (using spark 3.2 and hudi 0.14.0). Please check this out and let us know if it helps. https://github.com/apache/hudi/

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub
ad1happy2go commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1822164822 @abhisheksahani91 There looks like related to this which is yet to be fixed. https://github.com/apache/hudi/pull/5269 To unblock you can disable the timeline server for n

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1821565443 @ad1happy2go I also want to add the point the connection refused error is observed when I am generating the high load on hudi ingestion https://github.com/apache/hudi/a

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1821325376 @ad1happy2go Schema evolution is working now. I Did not change anything further and added the same properties you mentioned. The only issue is connection was r

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1820976149 @ad1happy2go Today also I tried from scratch. At first, I inserting the records and later I changed the schema to add new field and send the update with new column This t

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub
ad1happy2go commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1820968721 @abhisheksahani91 I somehow tried a lot to reproduce the issue in my local setup with 0.12.1 Hudi version but unable to reproduce. Can you try to reproduce once like below -

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-20 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1819095816 @ad1happy2go thanks for all the support. Actually I am blocked from taking Hudi live in production. Can you please help me with ETA for this?

Re: [I] [SUPPORT]hudi insert is too slow [hudi]

2023-11-20 Thread via GitHub
zyclove commented on issue #10131: URL: https://github.com/apache/hudi/issues/10131#issuecomment-1818731721 @ad1happy2go Can bulk mode not generate small files? Directly output the 128M result file and merge it later. If hoodie.clustering is turned on, can small files be automatically

Re: [I] [SUPPORT]hudi insert is too slow [hudi]

2023-11-20 Thread via GitHub
ad1happy2go commented on issue #10131: URL: https://github.com/apache/hudi/issues/10131#issuecomment-1818465534 @zyclove Bulk_insert mode doesn't merge small files during ingestion, so you have to run clustering after bulk_insert to optimise file size.

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-19 Thread via GitHub
ad1happy2go commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1818393135 @abhisheksahani91 Thanks @abhisheksahani91. I will work on reproducing this.

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-19 Thread via GitHub
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1818379062 @ad1happy2go I have added the config "--hoodie-conf", "hoodie.schema.on.read.enable=true" and schema is also { "name": "newCol", "type": [ "null",
