[GitHub] [hudi] codope commented on pull request #4591: Revert "[HUDI-3233] Make metadata commit synchronous for flink batch"

2022-01-14 Thread GitBox
codope commented on pull request #4591: URL: https://github.com/apache/hudi/pull/4591#issuecomment-1013631866 > @codope : was this an attempt to fix the flakiness ? Yeah, closing this PR. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] codope closed pull request #4591: Revert "[HUDI-3233] Make metadata commit synchronous for flink batch"

2022-01-14 Thread GitBox
codope closed pull request #4591: URL: https://github.com/apache/hudi/pull/4591

[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013621545 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013628091 ## CI report: * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure:

[jira] [Updated] (HUDI-2597) Improve code quality around Generics with Java 8

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2597: Priority: Blocker (was: Major) > Improve code quality around Generics with Java 8 >

[jira] [Updated] (HUDI-2596) Make class names consistent in hudi-client

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2596: Priority: Blocker (was: Major) > Make class names consistent in hudi-client >

[jira] [Updated] (HUDI-2598) Redesign record payload class to decouple HoodieRecordPayload from Avro

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2598: Priority: Blocker (was: Critical) > Redesign record payload class to decouple HoodieRecordPayload from

[jira] [Updated] (HUDI-3042) Refactor clustering action in hudi-client module to use HoodieData abstraction

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3042: Priority: Blocker (was: Major) > Refactor clustering action in hudi-client module to use HoodieData

[jira] [Updated] (HUDI-2638) Rewrite tests around Hudi index

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2638: Priority: Blocker (was: Major) > Rewrite tests around Hudi index

[jira] [Updated] (HUDI-2439) Refactor table.action.commit package (CommitActionExecutors) in hudi-client module

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2439: Priority: Blocker (was: Major) > Refactor table.action.commit package (CommitActionExecutors) in

[jira] [Updated] (HUDI-2656) Generalize HoodieIndex for flexible record data type

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2656: Priority: Blocker (was: Major) > Generalize HoodieIndex for flexible record data type >

[jira] [Resolved] (HUDI-752) Make CompactionAdminClient spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-752. > Make CompactionAdminClient spark-free

[jira] [Commented] (HUDI-752) Make CompactionAdminClient spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476551#comment-17476551 ] Ethan Guo commented on HUDI-752: This is resolved by using HoodieEngineContext. > Make

[jira] [Commented] (HUDI-750) Make AbstractHoodieClient spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476550#comment-17476550 ] Ethan Guo commented on HUDI-750: This is resolved in https://github.com/apache/hudi/pull/1827. > Make

[jira] [Resolved] (HUDI-750) Make AbstractHoodieClient spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-750. > Make AbstractHoodieClient spark-free > Key: HUDI-750

[jira] [Resolved] (HUDI-729) Replace JavaSparkContext/SQLContext with SparkSession

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-729. > Replace JavaSparkContext/SQLContext with SparkSession

[jira] [Commented] (HUDI-729) Replace JavaSparkContext/SQLContext with SparkSession

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476549#comment-17476549 ] Ethan Guo commented on HUDI-729: It looks like this is resolved on latest master. [~lamber-ken] Please

[jira] [Resolved] (HUDI-682) Move HoodieReadClient into hudi-spark module

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-682. > Move HoodieReadClient into hudi-spark module

[jira] [Commented] (HUDI-682) Move HoodieReadClient into hudi-spark module

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476548#comment-17476548 ] Ethan Guo commented on HUDI-682: This is done in https://github.com/apache/hudi/pull/1827. > Move

[jira] [Commented] (HUDI-661) Make EmbeddedTimelineService spark free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476547#comment-17476547 ] Ethan Guo commented on HUDI-661: EmbeddedTimelineService is engine-agnostic after the refactoring. Closing

[jira] [Resolved] (HUDI-661) Make EmbeddedTimelineService spark free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-661. > Make EmbeddedTimelineService spark free

[jira] [Resolved] (HUDI-659) Make HoodieCommitArchiveLog spark free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-659. > Make HoodieCommitArchiveLog spark free

[jira] [Commented] (HUDI-659) Make HoodieCommitArchiveLog spark free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476546#comment-17476546 ] Ethan Guo commented on HUDI-659: HoodieTimelineArchiveLog (new class name for HoodieCommitArchiveLog) is

[jira] [Resolved] (HUDI-658) Make ClientUtils spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo resolved HUDI-658. > Make ClientUtils spark-free > Key: HUDI-658

[jira] [Commented] (HUDI-658) Make ClientUtils spark-free

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476545#comment-17476545 ] Ethan Guo commented on HUDI-658: This has already been resolved on latest master (ClientUtils no longer

[GitHub] [hudi] xushiyan commented on issue #4597: [SUPPORT] - Hudi Upserts Not working

2022-01-14 Thread GitBox
xushiyan commented on issue #4597: URL: https://github.com/apache/hudi/issues/4597#issuecomment-1013623902 @harishraju-govindaraju as suggested above, this looks like unintended use of indexing type. Can you use GLOBAL_BLOOM setting instead? We should be good to close this as soon as you

[jira] [Updated] (HUDI-538) [UMBRELLA] Restructuring hudi client module for multi engine support

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-538: Priority: Blocker (was: Major) > [UMBRELLA] Restructuring hudi client module for multi engine support

[jira] [Updated] (HUDI-538) [UMBRELLA] Restructuring hudi client module for multi engine support

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-538: Fix Version/s: 0.11.0 > [UMBRELLA] Restructuring hudi client module for multi engine support

[GitHub] [hudi] xushiyan commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong

2022-01-14 Thread GitBox
xushiyan commented on issue #4600: URL: https://github.com/apache/hudi/issues/4600#issuecomment-1013622766 @gubinjie we need more info about your environment setup to analyze and reproduce. Can you add info like hudi version, environment, other software versions, how table was prepared,

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013621545 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013614454 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] XuQianJin-Stars commented on issue #3984: [SUPPORT] Upgrade from 0.8.0 to 0.9.0 removes functionality and decreases performance

2022-01-14 Thread GitBox
XuQianJin-Stars commented on issue #3984: URL: https://github.com/apache/hudi/issues/3984#issuecomment-1013615169 > > hi @cb149 @nsivabalan @xushiyan I have found this problem, just need to `set hoodie.file.index.enable=false` to work > > ``` > > val tripsSnapshotDF =

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013614454 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013614115 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013614115 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013613746 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013613746 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013613381 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot

[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013613381 ## CI report: * 067b1741e59b23260e711e8e7275430c59552459 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure`

[GitHub] [hudi] XuQianJin-Stars opened a new pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-14 Thread GitBox
XuQianJin-Stars opened a new pull request #4607: URL: https://github.com/apache/hudi/pull/4607 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is

[jira] [Updated] (HUDI-3242) Checkpoint 0 is ignored -Partial parquet file discovery after the first commit

2022-01-14 Thread Harsha Teja Kanna (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsha Teja Kanna updated HUDI-3242: Priority: Critical (was: Blocker) > Checkpoint 0 is ignored -Partial parquet file

[GitHub] [hudi] hudi-bot commented on pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4606: URL: https://github.com/apache/hudi/pull/4606#issuecomment-1013607661 ## CI report: * 80899c440c8c1b0d14b8d80a4f3de9ea87d0b8d4 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4606: URL: https://github.com/apache/hudi/pull/4606#issuecomment-1013598133 ## CI report: * 80899c440c8c1b0d14b8d80a4f3de9ea87d0b8d4 Azure:

[GitHub] [hudi] alexeykudinkin edited a comment on pull request #4551: [HUDI-3010] Unbundle parquet-avro and shade hbase in presto-bundle

2022-01-14 Thread GitBox
alexeykudinkin edited a comment on pull request #4551: URL: https://github.com/apache/hudi/pull/4551#issuecomment-1013598939 That makes total sense to me. But for that we have to update the Docker images we're using in ITs, right? If I'd revert those changes my PR would have ITs failing b/c

[jira] [Created] (HUDI-3250) Upgrade Presto version in docker setup and integ test

2022-01-14 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-3250: Summary: Upgrade Presto version in docker setup and integ test Key: HUDI-3250 URL: https://issues.apache.org/jira/browse/HUDI-3250 Project: Apache Hudi Issue

[GitHub] [hudi] alexeykudinkin commented on pull request #4551: [HUDI-3010] Unbundle parquet-avro and shade hbase in presto-bundle

2022-01-14 Thread GitBox
alexeykudinkin commented on pull request #4551: URL: https://github.com/apache/hudi/pull/4551#issuecomment-1013598939 That makes total sense to me. But for that we have to update the Docker images we're using in ITs, right? If I'd revert those changes my PR would have ITs failing b/c of

[jira] [Updated] (HUDI-2872) Enable data skipping index even for sort based clustering

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2872: Reviewers: Y Ethan Guo > Enable data skipping index even for sort based clustering

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Reviewers: Y Ethan Guo > Unify configurations for clustering execution strategy and layout

[GitHub] [hudi] hudi-bot commented on pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4606: URL: https://github.com/apache/hudi/pull/4606#issuecomment-1013598133 ## CI report: * 80899c440c8c1b0d14b8d80a4f3de9ea87d0b8d4 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4606: URL: https://github.com/apache/hudi/pull/4606#issuecomment-1013597581 ## CI report: * 80899c440c8c1b0d14b8d80a4f3de9ea87d0b8d4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Sprint: Hudi-Sprint-Jan-10 > Unify configurations for clustering execution strategy and layout

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Fix Version/s: 0.11.0 > Unify configurations for clustering execution strategy and layout

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Priority: Blocker (was: Major) > Unify configurations for clustering execution strategy and

[jira] [Updated] (HUDI-2872) Enable data skipping index even for sort based clustering

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2872: Sprint: Hudi-Sprint-Jan-10 > Enable data skipping index even for sort based clustering

[GitHub] [hudi] hudi-bot commented on pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4606: URL: https://github.com/apache/hudi/pull/4606#issuecomment-1013597581 ## CI report: * 80899c440c8c1b0d14b8d80a4f3de9ea87d0b8d4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure`

[jira] [Updated] (HUDI-2872) Enable data skipping index even for sort based clustering

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2872: Status: Patch Available (was: In Progress) > Enable data skipping index even for sort based

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Status: In Progress (was: Open) > Unify configurations for clustering execution strategy and

[jira] [Assigned] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-2646: Assignee: Alexey Kudinkin > Unify configurations for clustering execution strategy and

[jira] [Updated] (HUDI-2646) Unify configurations for clustering execution strategy and layout optimization

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2646: Status: Patch Available (was: In Progress) > Unify configurations for clustering execution

[jira] [Updated] (HUDI-2872) Enable data skipping index even for sort based clustering

2022-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2872: Labels: pull-request-available (was: ) > Enable data skipping index even for sort based

[GitHub] [hudi] alexeykudinkin opened a new pull request #4606: [HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) flow to support linear ordering

2022-01-14 Thread GitBox
alexeykudinkin opened a new pull request #4606: URL: https://github.com/apache/hudi/pull/4606

[GitHub] [hudi] codope commented on pull request #4551: [HUDI-3010] Unbundle parquet-avro and shade hbase in presto-bundle

2022-01-14 Thread GitBox
codope commented on pull request #4551: URL: https://github.com/apache/hudi/pull/4551#issuecomment-1013594138 > @codope i had to revert these changes in my PR, since Presto queries are failing after rebase: > > ``` > 2022-01-14T20:45:04.265Z WARN hive-hive-0

[GitHub] [hudi] scxwhite commented on a change in pull request #4400: [HUDI-3069] compact improve

2022-01-14 Thread GitBox
scxwhite commented on a change in pull request #4400: URL: https://github.com/apache/hudi/pull/4400#discussion_r785259770 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java ## @@ -264,8 +264,11 @@

[GitHub] [hudi] scxwhite commented on a change in pull request #4400: [HUDI-3069] compact improve

2022-01-14 Thread GitBox
scxwhite commented on a change in pull request #4400: URL: https://github.com/apache/hudi/pull/4400#discussion_r785259289 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java ## @@ -264,8 +264,11 @@

[GitHub] [hudi] scxwhite commented on a change in pull request #4400: [HUDI-3069] compact improve

2022-01-14 Thread GitBox
scxwhite commented on a change in pull request #4400: URL: https://github.com/apache/hudi/pull/4400#discussion_r785259224 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java ## @@ -264,8 +264,11 @@

[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013578806 ## CI report: * 3d6e3e70a7c3c1bb0b1b9d9e2945bc1dcdc1da5a Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013560358 ## CI report: * ce8a8d9547819b23368115ba640caed1cb385213 Azure:

[jira] [Commented] (HUDI-1576) Add ability to perform archival synchronously

2022-01-14 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476506#comment-17476506 ] Nishith Agarwal commented on HUDI-1576: [~guoyihua] Yes, the idea was to detach archiving from

[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013576099 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * 7798caf61854f0789bfdae4fa542ef1b0b6008a6 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013557149 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * a70ea22d06f2b91b0c7e005e3db3c4d3faaf1d75 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013554187 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013567395 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c

[GitHub] [hudi] manojpec commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
manojpec commented on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013560479 @prashantwason @vinothchandar After discussions, made HoodieHFileReader the single source of truth for all HFile schema related fields. HFileReader already tracks other

[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013560358 ## CI report: * ce8a8d9547819b23368115ba640caed1cb385213 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013559504 ## CI report: * ce8a8d9547819b23368115ba640caed1cb385213 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1013559504 ## CI report: * ce8a8d9547819b23368115ba640caed1cb385213 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008518397 ## CI report: * ce8a8d9547819b23368115ba640caed1cb385213 Azure:

[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
manojpec commented on a change in pull request #4449: URL: https://github.com/apache/hudi/pull/4449#discussion_r785240782 ## File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java ## @@ -62,6 +64,7 @@ // Scanner used to read individual keys.

[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
manojpec commented on a change in pull request #4449: URL: https://github.com/apache/hudi/pull/4449#discussion_r785240751 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java ## @@ -122,9 +118,9 @@ public HoodieLogBlockType

[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
manojpec commented on a change in pull request #4449: URL: https://github.com/apache/hudi/pull/4449#discussion_r785240599 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java ## @@ -83,6 +84,11 @@

[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-14 Thread GitBox
manojpec commented on a change in pull request #4449: URL: https://github.com/apache/hudi/pull/4449#discussion_r785240580 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java ## @@ -83,6 +84,11 @@

[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013557149 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * a70ea22d06f2b91b0c7e005e3db3c4d3faaf1d75 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013537449 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * a70ea22d06f2b91b0c7e005e3db3c4d3faaf1d75 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013537426 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013554187 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c

[jira] [Updated] (HUDI-3179) Extract common Hudi Table File Index implementation

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3179: -- Reviewers: Vinoth Chandar, Y Ethan Guo > Extract common Hudi Table File Index implementation >

[jira] [Updated] (HUDI-3206) Unify Hive's MOR `InputFormat` implementations (Parquet, HFile)

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3206: -- Reviewers: Vinoth Chandar, Y Ethan Guo > Unify Hive's MOR `InputFormat` implementations

[jira] [Updated] (HUDI-3191) Rebase Hive's FileInputFormat onto AbstractHoodieTableFileIndex

2022-01-14 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3191: -- Reviewers: Vinoth Chandar, Y Ethan Guo > Rebase Hive's FileInputFormat onto

[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013471849 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * a70ea22d06f2b91b0c7e005e3db3c4d3faaf1d75 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013537426 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c

[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1013537449 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * a70ea22d06f2b91b0c7e005e3db3c4d3faaf1d75 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4531] Removing duplicating file-listing process w/in Hive's MOR `FIleInputFormat`s

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1013451270 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #4531: [HUDI-3191][Stacked on 4520] Rebasing Hive's FileInputFormat onto `AbstractHoodieTableFileIndex`

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4531: URL: https://github.com/apache/hudi/pull/4531#issuecomment-1013535231 ## CI report: * d8313ea2d2d4e98e35214573022aa7562936a166 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4531: [HUDI-3191][Stacked on 4520] Rebasing Hive's FileInputFormat onto `AbstractHoodieTableFileIndex`

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4531: URL: https://github.com/apache/hudi/pull/4531#issuecomment-1013510325 ## CI report: * 29076ce4ae979c5452de3af38d4535fb3768471d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1013531885 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 9743a6f98e62888c5e9cb575dd8cf0d38ade0319 Azure:

[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1013505801 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 46d45878e535f9517421a49958ebf13c54c6cca3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4516: [HUDI-3181][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-14 Thread GitBox
hudi-bot commented on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1013529426 ## CI report: * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * a35de627f6cfdd75200371d41960901d7bbfefb1 UNKNOWN * 1f4165928c3204c00f69ac90617c92184d991cdf

[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [HUDI-3181][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-14 Thread GitBox
hudi-bot removed a comment on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1009494630 ## CI report: * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * a35de627f6cfdd75200371d41960901d7bbfefb1 UNKNOWN *

[GitHub] [hudi] umehrot2 commented on pull request #1946: [HUDI-1176]Upgrade tp log4j2

2022-01-14 Thread GitBox
umehrot2 commented on pull request #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-1013527978 @hddong are you still going to work on this? If you don't have the bandwidth, I would be happy to drive this to completion and upgrade this further to Log4j 2.17.1 which

[jira] [Updated] (HUDI-1629) Change partitioner abstraction to implement multiple strategies

2022-01-14 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-1629: Status: In Progress (was: Open) > Change partitioner abstraction to implement multiple strategies >
