[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

2021-05-25 Thread Vinay (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351542#comment-17351542 ] Vinay commented on HUDI-1910: - [~nishith29] I think first approach is good, like Hudi will con

[GitHub] [hudi] am-cpp edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
am-cpp edited a comment on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-848497460 @nsivabalan 1. Yes the configuration is https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY which is set to **true**. 2. Yes the incoming records

[GitHub] [hudi] am-cpp commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
am-cpp commented on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-848497460 @nsivabalan 1. Yes the configuration is https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY 2. Yes the incoming records in the dataframe have multiple rec

[GitHub] [hudi] danny0405 closed pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
danny0405 closed pull request #2899: URL: https://github.com/apache/hudi/pull/2899 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-05-25 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351505#comment-17351505 ] Ethan Guo commented on HUDI-1138: - Here is my plan for improving the marker file mechanism

[GitHub] [hudi] wangxianghu commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
wangxianghu commented on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848482449 @yanghua please take a look when free -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] veenaypatil commented on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
veenaypatil commented on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848469043 Unit tests for hudi-spark-client failed but this change should not be the cause of it. @wangxianghu @vinothchandar can you pls merge -- This is an automated message from th

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848443240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848443240 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2996?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848443240 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2996?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848443240 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2996?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter commented on pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
codecov-commenter commented on pull request #2996: URL: https://github.com/apache/hudi/pull/2996#issuecomment-848443240 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2996?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache

[jira] [Created] (HUDI-1936) Introduce a optional property for conditional upsert

2021-05-25 Thread Biswajit mohapatra (Jira)
Biswajit mohapatra created HUDI-1936: Summary: Introduce a optional property for conditional upsert Key: HUDI-1936 URL: https://issues.apache.org/jira/browse/HUDI-1936 Project: Apache Hudi

[jira] [Updated] (HUDI-1935) Update Logger for FlatteningTransformer

2021-05-25 Thread Vinay (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-1935: Status: In Progress (was: Open) > Update Logger for FlatteningTransformer > --

[jira] [Updated] (HUDI-1935) Update Logger for FlatteningTransformer

2021-05-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1935: - Labels: pull-request-available (was: ) > Update Logger for FlatteningTransformer > -

[GitHub] [hudi] veenaypatil opened a new pull request #2996: [HUDI-1935] Update Logger statement of FlatteningTransformer

2021-05-25 Thread GitBox
veenaypatil opened a new pull request #2996: URL: https://github.com/apache/hudi/pull/2996 ## What is the purpose of the pull request This PR just updates the Logger statement as it was pointing to different class ## Brief change log Modify Logger statement of Flattenin

[jira] [Created] (HUDI-1935) Update Logger for FlatteningTransformer

2021-05-25 Thread Vinay (Jira)
Vinay created HUDI-1935: --- Summary: Update Logger for FlatteningTransformer Key: HUDI-1935 URL: https://issues.apache.org/jira/browse/HUDI-1935 Project: Apache Hudi Issue Type: Task Component

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2926: URL: https://github.com/apache/hudi/pull/2926#issuecomment-835303925 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2926: URL: https://github.com/apache/hudi/pull/2926#issuecomment-835303925 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2926?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2925: [HUDI-1879] Fix RO Tables Returning Snapshot Result

2021-05-25 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2925: URL: https://github.com/apache/hudi/pull/2925#discussion_r639374959 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -194,11 +197,17 @@ private void syncSchema(String t

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2926: URL: https://github.com/apache/hudi/pull/2926#issuecomment-835303925 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2926?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2926: URL: https://github.com/apache/hudi/pull/2926#discussion_r639372337 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala ## @@ -131,15 +133,28 @@ class MergeO

[GitHub] [hudi] danny0405 commented on a change in pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
danny0405 commented on a change in pull request #2899: URL: https://github.com/apache/hudi/pull/2899#discussion_r639363503 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineServerHelper.java ## @@ -35,16 +35,22 @@ p

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2926: URL: https://github.com/apache/hudi/pull/2926#discussion_r639362769 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala ## @@ -182,4 +197,98 @@ object MergeO

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] nmahmood630 commented on issue #2987: [SUPPORT] Only able to retrieve last _hoodie_commit_time

2021-05-25 Thread GitBox
nmahmood630 commented on issue #2987: URL: https://github.com/apache/hudi/issues/2987#issuecomment-848397902 I think the issue I am seeing is similar to: https://github.com/apache/hudi/issues/2002 where all the records commit time are getting updated since this table holds aggregation data

[jira] [Created] (HUDI-1934) Update keyGenerator configuration docs

2021-05-25 Thread Xianghu Wang (Jira)
Xianghu Wang created HUDI-1934: -- Summary: Update keyGenerator configuration docs Key: HUDI-1934 URL: https://issues.apache.org/jira/browse/HUDI-1934 Project: Apache Hudi Issue Type: Sub-task

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[jira] [Created] (HUDI-1933) Improvement clustering function, need support aysnc clustering

2021-05-25 Thread taylor liao (Jira)
taylor liao created HUDI-1933: - Summary: Improvement clustering function, need support aysnc clustering Key: HUDI-1933 URL: https://issues.apache.org/jira/browse/HUDI-1933 Project: Apache Hudi I

[GitHub] [hudi] codecov-commenter commented on pull request #2993: [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type

2021-05-25 Thread GitBox
codecov-commenter commented on pull request #2993: URL: https://github.com/apache/hudi/pull/2993#issuecomment-848384059 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache

[GitHub] [hudi] nmahmood630 commented on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source

2021-05-25 Thread GitBox
nmahmood630 commented on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-848381823 Can you also please provide additional information regarding how we 'could achieve this using existing recordPayload with one column to determine source ordering.'? -- This is an

[GitHub] [hudi] jtmzheng opened a new issue #2995: [SUPPORT] Upserts creating duplicates after enabling metadata table in Hudi 0.7 indexing pipeline

2021-05-25 Thread GitBox
jtmzheng opened a new issue #2995: URL: https://github.com/apache/hudi/issues/2995 **Describe the problem you faced** **Background**: We run a Spark Streaming application that ingests messages from Kinesis and upserts/deletes objects from a date-partitioned Hudi 0.6 MOR dataset. Thi

[GitHub] [hudi] nmahmood630 commented on issue #2987: [SUPPORT] Only able to retrieve last _hoodie_commit_time

2021-05-25 Thread GitBox
nmahmood630 commented on issue #2987: URL: https://github.com/apache/hudi/issues/2987#issuecomment-84833 Any update on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [hudi] jtmzheng commented on issue #2983: [SUPPORT] Is hoodie.consistency.check.enabled still relevant?

2021-05-25 Thread GitBox
jtmzheng commented on issue #2983: URL: https://github.com/apache/hudi/issues/2983#issuecomment-848289555 We enabled for our Hudi 0.6 dataset since this pre-dated strong consistency on S3. Wanted to confirm we can safely disable this now since I think this adds overhead to ingestion. --

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-829931431 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2902?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-829931431 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2902?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-829931431 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-829931431 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2902?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[jira] [Commented] (HUDI-376) AWS Glue dependency issue for EMR 5.28.0

2021-05-25 Thread Purushotham Pushpavanthar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351319#comment-17351319 ] Purushotham Pushpavanthar commented on HUDI-376: [~XingXPan] this is good c

[GitHub] [hudi] umehrot2 commented on a change in pull request #2925: [HUDI-1879] Fix RO Tables Returning Snapshot Result

2021-05-25 Thread GitBox
umehrot2 commented on a change in pull request #2925: URL: https://github.com/apache/hudi/pull/2925#discussion_r639097765 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -194,11 +197,17 @@ private void syncSchema(String tableNa

[GitHub] [hudi] KarthickAN opened a new issue #2066: [SUPPORT] Hudi is increasing the storage size big time

2021-05-25 Thread GitBox
KarthickAN opened a new issue #2066: URL: https://github.com/apache/hudi/issues/2066 Hi, I wanted to understand how much storage overhead hudi is going to add because of its metadata. So I ran a spike with 14GB of raw data and processed it to produce a parquet files. I already have

[GitHub] [hudi] andaag opened a new issue #2135: [SUPPORT] GDPR safe deletes is complex

2021-05-25 Thread GitBox
andaag opened a new issue #2135: URL: https://github.com/apache/hudi/issues/2135 **Describe the problem you faced** I'm trying to come up with a consistent and understandable way to deal with gdpr deletes. What I'd like to do: 1. Stream realtime data into bucket A 2. Col

[GitHub] [hudi] GintokiYs opened a new issue #2513: [SUPPORT]Hive-Cli set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat and query error

2021-05-25 Thread GitBox
GintokiYs opened a new issue #2513: URL: https://github.com/apache/hudi/issues/2513 **Describe the problem you faced** When I insert data through Hudi-Spark and synchronize the data to Hive, I can use Hive-Cli query this cow table and get the data (hudi-hadoop-mr-bundle-0.6.0 has been p

[GitHub] [hudi] duanyongvictory opened a new issue #2482: [SUPPORT]

2021-05-25 Thread GitBox
duanyongvictory opened a new issue #2482: URL: https://github.com/apache/hudi/issues/2482 hudi version: hudi-spark-bundle_2.11-0.5.2-incubating.jar data sample: {"data_version":"123", "p_sn":"3456e", "gender":"女","pix":[{"p_sn":"161"}]} where i use spark to write this dat

[GitHub] [hudi] rshanmugam1 opened a new issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-05-25 Thread GitBox
rshanmugam1 opened a new issue #2609: URL: https://github.com/apache/hudi/issues/2609 **Describe the problem you faced** Presto query performance with a Hudi table takes ~2x extra time compared to Parquet for a simple query. Data is stored in S3. Hudi metadata store enabled. note, s

[GitHub] [hudi] n3nash commented on issue #2924: [SUPPORT]presto query:could not initialize class org.apache.hudi.common.util.HoodieAvroUtils

2021-05-25 Thread GitBox
n3nash commented on issue #2924: URL: https://github.com/apache/hudi/issues/2924#issuecomment-848111233 @root18039532923 I'm not sure why you are getting this exception. Have you added the hudi-hadoop-mr bundle to the classpath of the Hive deployment? -- This is an automated message fro

[GitHub] [hudi] n3nash edited a comment on issue #2955: [SUPPORT]Log system conflict in Hudi-Cli after run temp_* command

2021-05-25 Thread GitBox
n3nash edited a comment on issue #2955: URL: https://github.com/apache/hudi/issues/2955#issuecomment-848104530 @peanut-chenzhong Can you please send a PR to the master branch for us to review? Please create a corresponding JIRA issue for this as well. -- This is an automated message fr

[GitHub] [hudi] n3nash commented on issue #2955: [SUPPORT]Log system conflict in Hudi-Cli after run temp_* command

2021-05-25 Thread GitBox
n3nash commented on issue #2955: URL: https://github.com/apache/hudi/issues/2955#issuecomment-848104530 @peanut-chenzhong Can you please send a PR to the master branch for us to review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] vaibhav-sinha commented on pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-25 Thread GitBox
vaibhav-sinha commented on pull request #2923: URL: https://github.com/apache/hudi/pull/2923#issuecomment-848098918 The tests were clean except for one test case failing before which I had fixed. But after merging the latest changes from master, I see a lot of tests failing and the errors

[GitHub] [hudi] pratyakshsharma commented on pull request #2967: Added blog for Hudi cleaner service

2021-05-25 Thread GitBox
pratyakshsharma commented on pull request #2967: URL: https://github.com/apache/hudi/pull/2967#issuecomment-848097336 @n3nash Ack. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [hudi] nmahmood630 edited a comment on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source

2021-05-25 Thread GitBox
nmahmood630 edited a comment on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-848077504 Just seeing the interface definition isn't that helpful. My project is written in Python/PySpark and run on AWS Glue, where I'm uploading the Hudi JAR to S3 for the Glue job t

[GitHub] [hudi] vaibhav-sinha commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-25 Thread GitBox
vaibhav-sinha commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r639042519 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java ## @@ -62,7 +62,7 @@ } else if (kvArray[1]

[GitHub] [hudi] nmahmood630 commented on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source

2021-05-25 Thread GitBox
nmahmood630 commented on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-848077504 Just seeing the interface definition isn't that helpful. My project is written in Python/PySpark and run on AWS Glue. How would I include this file to provide my own implementation?

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2899: URL: https://github.com/apache/hudi/pull/2899#issuecomment-829195458 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2899?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2899: URL: https://github.com/apache/hudi/pull/2899#issuecomment-829195458 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2899?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] vinothchandar commented on pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
vinothchandar commented on pull request #2899: URL: https://github.com/apache/hudi/pull/2899#issuecomment-848056191 @danny0405 please take a look. Once CI passes, we can land. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [hudi] vinothchandar commented on a change in pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
vinothchandar commented on a change in pull request #2899: URL: https://github.com/apache/hudi/pull/2899#discussion_r638986316 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineServerHelper.java ## @@ -35,16 +35,22 @@

[GitHub] [hudi] vinothchandar commented on pull request #2899: [HUDI-1865] Make embedded time line service singleton

2021-05-25 Thread GitBox
vinothchandar commented on pull request #2899: URL: https://github.com/apache/hudi/pull/2899#issuecomment-848040019 @danny0405 we left this hanging a bit. Let me re-review this and get it landing in some form. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] vinothchandar commented on pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-05-25 Thread GitBox
vinothchandar commented on pull request #2496: URL: https://github.com/apache/hudi/pull/2496#issuecomment-848039122 I have not been able to test this on S3. Let me pick it up later next week. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[hudi] branch master updated (afa6bc0 -> 112732d)

2021-05-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from afa6bc0 [HUDI-1723] Fix path selector listing files with the same mod date (#2845) add 112732d [HUDI-1922] Bulk

[GitHub] [hudi] vinothchandar merged pull request #2981: [HUDI-1922] Bulk insert with row writer supports mor table

2021-05-25 Thread GitBox
vinothchandar merged pull request #2981: URL: https://github.com/apache/hudi/pull/2981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, ple

[jira] [Created] (HUDI-1932) Hive Sync should not always update last_commit_time_sync

2021-05-25 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-1932: Summary: Hive Sync should not always update last_commit_time_sync Key: HUDI-1932 URL: https://issues.apache.org/jira/browse/HUDI-1932 Project: Apache Hudi Issue Type

[GitHub] [hudi] codecov-commenter commented on pull request #2994: Hudi 1931

2021-05-25 Thread GitBox
codecov-commenter commented on pull request #2994: URL: https://github.com/apache/hudi/pull/2994#issuecomment-848024173 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2994?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache

[GitHub] [hudi] vinothchandar commented on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops

2021-05-25 Thread GitBox
vinothchandar commented on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-848022526 @n3nash @satishkotha Any updates on this? generally love to get these follow ups from clustering over the fence if we can -- This is an automated message from the Apache Gi

[GitHub] [hudi] vinothchandar commented on pull request #2378: [HUDI-1491] Support partition pruning for MOR snapshot query

2021-05-25 Thread GitBox
vinothchandar commented on pull request #2378: URL: https://github.com/apache/hudi/pull/2378#issuecomment-848021692 #2926 overlaps with this? @yui2010 , @pengzhiwei2018 any thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [hudi] vinothchandar commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write

2021-05-25 Thread GitBox
vinothchandar commented on a change in pull request #2903: URL: https://github.com/apache/hudi/pull/2903#discussion_r638949244 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -105,7 +105,9 @@ class DefaultSource extends R

[GitHub] [hudi] vinothchandar commented on a change in pull request #2926: [HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table

2021-05-25 Thread GitBox
vinothchandar commented on a change in pull request #2926: URL: https://github.com/apache/hudi/pull/2926#discussion_r638929143 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala ## @@ -131,15 +133,28 @@ class MergeOn

[GitHub] [hudi] loukey-lj opened a new pull request #2994: Hudi 1931

2021-05-25 Thread GitBox
loukey-lj opened a new pull request #2994: URL: https://github.com/apache/hudi/pull/2994 org.apache.hudi.sink.partitioner.BucketAssignFunction#partitionLoadState and org.apache.hudi.sink.partitioner.BucketAssignFunction#indexState use wrong state, RowDataToHoodieFunction was keyby r

[GitHub] [hudi] n3nash commented on issue #2975: [SUPPORT] Read record using index

2021-05-25 Thread GitBox
n3nash commented on issue #2975: URL: https://github.com/apache/hudi/issues/2975#issuecomment-847968028 @fanaticjo Can you help @calleo since you recently implemented a custom recordpayload while using pyspark ?

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2977: [HUDI-1763] Fixing honoring of Ordering val in DefaultHoodieRecordPayload.preCombine

2021-05-25 Thread GitBox
codecov-commenter edited a comment on pull request #2977: URL: https://github.com/apache/hudi/pull/2977#issuecomment-846016572 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2977?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The

[GitHub] [hudi] nsivabalan edited a comment on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
nsivabalan edited a comment on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-847918561 Can you check why CI is failing. we can land once fixed.

[jira] [Reopened] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-05-25 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reopened HUDI-1723: --- > DFSPathSelector skips files with the same modify date when read up to source > limit >

[jira] [Updated] (HUDI-1763) DefaultHoodieRecordPayload does not honor ordering value when records within multiple log files are merged

2021-05-25 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1763: -- Status: Patch Available (was: In Progress) > DefaultHoodieRecordPayload does not honor

[jira] [Updated] (HUDI-1763) DefaultHoodieRecordPayload does not honor ordering value when records within multiple log files are merged

2021-05-25 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1763: -- Status: In Progress (was: Open) > DefaultHoodieRecordPayload does not honor ordering va

[jira] [Resolved] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-05-25 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1723. --- Resolution: Fixed > DFSPathSelector skips files with the same modify date when read up

[jira] [Updated] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-05-25 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1723: -- Status: Closed (was: Patch Available) > DFSPathSelector skips files with the same modif

[GitHub] [hudi] nsivabalan commented on pull request #2310: [HUDI-1444] fix rollback for emtpy partition table

2021-05-25 Thread GitBox
nsivabalan commented on pull request #2310: URL: https://github.com/apache/hudi/pull/2310#issuecomment-847950023 yeah, looks like it. closing it for now.

[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-25 Thread GitBox
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r638867446 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java ## @@ -62,7 +62,7 @@ } else if (kvArray[1].eq

[GitHub] [hudi] sbernauer edited a comment on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2021-05-25 Thread GitBox
sbernauer edited a comment on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-847940734 Hi @nsivabalan, we have multiple schema versions of the events we consume. We use kafka and Confluent Schema Registry. I think all the events in kafka are written wi

[GitHub] [hudi] sbernauer commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2021-05-25 Thread GitBox
sbernauer commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-847940734 Hi @nsivabalan, we have multiple schema versions of the events we consume. We use kafka and Confluent Schema Registry. I think all the events in kafka are written with sch

[GitHub] [hudi] nsivabalan commented on pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-25 Thread GitBox
nsivabalan commented on pull request #2923: URL: https://github.com/apache/hudi/pull/2923#issuecomment-847933890 Can you check CI failure please.

[GitHub] [hudi] nsivabalan commented on pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

2021-05-25 Thread GitBox
nsivabalan commented on pull request #2902: URL: https://github.com/apache/hudi/pull/2902#issuecomment-847918561 Can you check why CI is failing.

[jira] [Commented] (HUDI-1668) GlobalSortPartitioner is getting called twice during bulk_insert.

2021-05-25 Thread Sugamber (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351095#comment-17351095 ] Sugamber commented on HUDI-1668: [~nishith29] Yes, We can close this. Thank you!!! > Glo

[hudi] branch master updated: [HUDI-1723] Fix path selector listing files with the same mod date (#2845)

2021-05-25 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new afa6bc0 [HUDI-1723] Fix path selector listing

[GitHub] [hudi] nsivabalan merged pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-05-25 Thread GitBox
nsivabalan merged pull request #2845: URL: https://github.com/apache/hudi/pull/2845

[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
nsivabalan edited a comment on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578 @ayush71994 : 1. May I know which config you are referring to here "delete.duplicates"? Can you point me to full config from here https://hudi.apache.org/docs/configuratio

[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
nsivabalan edited a comment on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578 @ayush71994 : 1. May I know which config you are referring to here "delete.duplicates"? Can you point me to full config from here https://hudi.apache.org/docs/configuratio

[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
nsivabalan edited a comment on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578 @ayush71994 : May I know which config you are referring to here "delete.duplicates"? Can you point me to full config from here https://hudi.apache.org/docs/configurations.html

[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

2021-05-25 Thread GitBox
nsivabalan commented on issue #2992: URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578 @ayush71994 : May I know which config you are referring to here "delete.duplicates"? Can you point me to full config from here https://hudi.apache.org/docs/configurations.html And

[jira] [Created] (HUDI-1931) BucketAssignFunction use wrong state

2021-05-25 Thread loukey_j (Jira)
loukey_j created HUDI-1931: -- Summary: BucketAssignFunction use wrong state Key: HUDI-1931 URL: https://issues.apache.org/jira/browse/HUDI-1931 Project: Apache Hudi Issue Type: Improvement
