[jira] [Updated] (HUDI-3881) Implement index syntax for spark sql

2022-04-16 Thread Forward Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forward Xu updated HUDI-3881: - Description: {code:java} 1.create index CREATE INDEX [IF NOT EXISTS] index_name ON TABLE [db_name.]table_n

[jira] [Updated] (HUDI-3892) Add HoodieReadClient with java

2022-04-16 Thread Forward Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forward Xu updated HUDI-3892: - Description: We might need a hoodie read client in java similar to the one we have for spark.  [Apache P

[jira] [Assigned] (HUDI-3877) Support Java reader for hudi

2022-04-16 Thread Forward Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forward Xu reassigned HUDI-3877: Assignee: (was: Forward Xu) > Support Java reader for hudi > > >

[jira] [Assigned] (HUDI-3877) Support Java reader for hudi

2022-04-16 Thread Forward Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forward Xu reassigned HUDI-3877: Assignee: Forward Xu > Support Java reader for hudi > > >

[jira] [Updated] (HUDI-3897) Drop scala 2.11 artifacts

2022-04-16 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3897: - Description: To reduce complexity in the artifacts. Use scala 12 for all spark bundles Use scala-free fl

[jira] [Created] (HUDI-3897) Drop scala 2.11 artifacts

2022-04-16 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3897: Summary: Drop scala 2.11 artifacts Key: HUDI-3897 URL: https://issues.apache.org/jira/browse/HUDI-3897 Project: Apache Hudi Issue Type: Task Components: de

[GitHub] [hudi] simonsssu commented on issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi?

2022-04-16 Thread GitBox
simonsssu commented on issue #5313: URL: https://github.com/apache/hudi/issues/5313#issuecomment-1100803165 @nsivabalan hi nsivabalan, my jira id is HUDI-3877, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [hudi] hudi-bot commented on pull request #5338: [WIP][HUDI-3894] Fix datahub and gcp bundles to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100757479 ## CI report: * 5cd21a598d455492412ea525feddaa53325b5c9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8087

[jira] [Commented] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523216#comment-17523216 ] Alexey Kudinkin commented on HUDI-3891: --- So the root-cause of this discrepancy in th

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Description: While benchmarking querying raw Parquet tables against Hudi tables, I've run the t

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Attachment: image-2022-04-16-13-50-43-916.png > Investigate Hudi vs Raw Parquet table discrepanc

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Attachment: image-2022-04-16-13-50-43-956.png > Investigate Hudi vs Raw Parquet table discrepanc

[jira] [Updated] (HUDI-3896) Support Spark optimizations for `HadoopFsRelation`

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3896: -- Description: After migrating to Hudi's own Relation impls, we unfortunately broke off some of t

[jira] [Updated] (HUDI-3896) Support Spark optimizations for `HadoopFsRelation`

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3896: -- Attachment: Screen Shot 2022-04-16 at 1.46.50 PM.png > Support Spark optimizations for `HadoopFs

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Issue Type: Task (was: Bug) > Investigate Hudi vs Raw Parquet table discrepancy > -

[jira] [Created] (HUDI-3896) Support Spark optimizations for `HadoopFsRelation`

2022-04-16 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-3896: - Summary: Support Spark optimizations for `HadoopFsRelation` Key: HUDI-3896 URL: https://issues.apache.org/jira/browse/HUDI-3896 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-3895) Make sure Hudi relations do proper file-split packing (on par w/ Spark)

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3895: -- Status: Patch Available (was: In Progress) > Make sure Hudi relations do proper file-split pack

[jira] [Updated] (HUDI-3895) Make sure Hudi relations do proper file-split packing (on par w/ Spark)

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3895: -- Status: In Progress (was: Open) > Make sure Hudi relations do proper file-split packing (on par

[jira] [Updated] (HUDI-3895) Make sure Hudi relations do proper file-split packing (on par w/ Spark)

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3895: -- Sprint: Hudi-Sprint-Apr-12 > Make sure Hudi relations do proper file-split packing (on par w/ Sp

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Priority: Blocker (was: Critical) > Investigate Hudi vs Raw Parquet table discrepancy > ---

[jira] [Closed] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-3738. - Resolution: Fixed > Perf comparison between parquet and hudi for COW snapshot and MOR read > opti

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Fix Version/s: 0.11.0 > Investigate Hudi vs Raw Parquet table discrepancy >

[jira] [Resolved] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin resolved HUDI-3738. --- > Perf comparison between parquet and hudi for COW snapshot and MOR read > optimized > --

[jira] [Closed] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-3891. - Resolution: Fixed > Investigate Hudi vs Raw Parquet table discrepancy > --

[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3891: -- Sprint: Hudi-Sprint-Apr-12 > Investigate Hudi vs Raw Parquet table discrepancy > ---

[jira] [Updated] (HUDI-3895) Make sure Hudi relations do proper file-split packing (on par w/ Spark)

2022-04-16 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3895: -- Epic Link: HUDI-1297 > Make sure Hudi relations do proper file-split packing (on par w/ Spark) >

[jira] [Created] (HUDI-3895) Make sure Hudi relations do proper file-split packing (on par w/ Spark)

2022-04-16 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-3895: - Summary: Make sure Hudi relations do proper file-split packing (on par w/ Spark) Key: HUDI-3895 URL: https://issues.apache.org/jira/browse/HUDI-3895 Project: Apache

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-16 Thread GitBox
alexeykudinkin commented on code in PR #5337: URL: https://github.com/apache/hudi/pull/5337#discussion_r851667284 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala: ## @@ -84,21 +84,24 @@ class BaseFileOnlyRelation(sqlContext: S

[GitHub] [hudi] hudi-bot commented on pull request #5338: [WIP][HUDI-3894] Fix datahub and gcp bundles to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100745630 ## CI report: * d619e9656c39f88c5e5c07f4d2b01baaf3e8c64a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=808

[GitHub] [hudi] hudi-bot commented on pull request #5338: [HUDI-3894] Fix datahub and gcp bundles to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100735491 ## CI report: * d619e9656c39f88c5e5c07f4d2b01baaf3e8c64a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=808

[GitHub] [hudi] hudi-bot commented on pull request #5338: [HUDI-3894] Fix datahub and gcp bundles to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100735060 ## CI report: * d619e9656c39f88c5e5c07f4d2b01baaf3e8c64a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8086

[GitHub] [hudi] hudi-bot commented on pull request #5338: [HUDI-3894] Fix hudi-datahub-sync-bundle to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100734574 ## CI report: * d619e9656c39f88c5e5c07f4d2b01baaf3e8c64a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8086

[GitHub] [hudi] hudi-bot commented on pull request #5338: [HUDI-3894] Fix hudi-datahub-sync-bundle to include HBase dependencies and shading

2022-04-16 Thread GitBox
hudi-bot commented on PR #5338: URL: https://github.com/apache/hudi/pull/5338#issuecomment-1100733740 ## CI report: * d619e9656c39f88c5e5c07f4d2b01baaf3e8c64a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-3894) Add HBase dependencies and shading in datahub and gcp bundles

2022-04-16 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3894: Summary: Add HBase dependencies and shading in datahub and gcp bundles (was: Add HBase dependencies and sha

[jira] [Updated] (HUDI-3894) Add HBase dependencies and shading in datahub-sync-bundle

2022-04-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3894: - Labels: pull-request-available (was: ) > Add HBase dependencies and shading in datahub-sync-bundl

[GitHub] [hudi] yihua opened a new pull request, #5338: [HUDI-3894] Fix hudi-datahub-sync-bundle to include HBase dependencies and shading

2022-04-16 Thread GitBox
yihua opened a new pull request, #5338: URL: https://github.com/apache/hudi/pull/5338 ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle

[GitHub] [hudi] babumahesh-koo commented on issue #5198: [SUPPORT] Querying data genereated by TimestampBasedKeyGenerator failed to parse timestamp in EPOCHMILLISECONDS column to date format

2022-04-16 Thread GitBox
babumahesh-koo commented on issue #5198: URL: https://github.com/apache/hudi/issues/5198#issuecomment-1100731242 @nsivabalan Without Timestamp based key gen, it works. The observation is that, as long as the data types of the extracted values match the original column's data type, it w

[jira] [Created] (HUDI-3894) Add HBase dependencies and shading in datahub-sync-bundle

2022-04-16 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-3894: --- Summary: Add HBase dependencies and shading in datahub-sync-bundle Key: HUDI-3894 URL: https://issues.apache.org/jira/browse/HUDI-3894 Project: Apache Hudi Issue Type:

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-16 Thread GitBox
nsivabalan commented on code in PR #5337: URL: https://github.com/apache/hudi/pull/5337#discussion_r851655246 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala: ## @@ -84,21 +84,24 @@ class BaseFileOnlyRelation(sqlContext: SQLCo

[GitHub] [hudi] kasured commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-16 Thread GitBox
kasured commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1100717281 @nsivabalan Sure, let me provide more details. There is a StreamingQuery entity which is started by Spark to consume the stream. This is basically what we use and described here https:/

[GitHub] [hudi] yihua commented on a diff in pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-16 Thread GitBox
yihua commented on code in PR #5337: URL: https://github.com/apache/hudi/pull/5337#discussion_r851649669 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala: ## @@ -84,21 +84,24 @@ class BaseFileOnlyRelation(sqlContext: SQLContext

[GitHub] [hudi] wxplovecc commented on issue #5330: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2022-04-16 Thread GitBox
wxplovecc commented on issue #5330: URL: https://github.com/apache/hudi/issues/5330#issuecomment-1100693250 see https://github.com/apache/hudi/pull/5185

[GitHub] [hudi] nsivabalan commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-16 Thread GitBox
nsivabalan commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1100687324 btw, we did fix an issue wrt how spark lazy initialization and caching of results could result in wrong files in commit metadata https://github.com/apache/hudi/pull/4753. looks like

[GitHub] [hudi] nsivabalan commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-16 Thread GitBox
nsivabalan commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1100686361 yes, I really appreciate your digging in deeper. let me try to understand the concurrency here. what do you mean by multiple concurrent streaming writes? there are 3 streams r

[GitHub] [hudi] nsivabalan commented on issue #5253: Hudi execution plan not generated properly [SUPPORT]

2022-04-16 Thread GitBox
nsivabalan commented on issue #5253: URL: https://github.com/apache/hudi/issues/5253#issuecomment-1100684177 @YannByron @XuQianJin-Stars: can either of you folks please chime in here when you get a chance.

[jira] [Commented] (HUDI-3893) Add support to refresh hoodie.properties at regular intervals

2022-04-16 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523120#comment-17523120 ] sivabalan narayanan commented on HUDI-3893: --- suggested to user: do you think

[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

2022-04-16 Thread GitBox
nsivabalan commented on issue #5281: URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100683830 do you think you can add a lambda or something for the following ``` touch ${HUDI_TABLE_PATH}/.hoodie/hoodie.properties ``` and that should solve the problem, right?

[jira] [Updated] (HUDI-3893) Add support to refresh hoodie.properties at regular intervals

2022-04-16 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3893: -- Fix Version/s: 0.12.0 > Add support to refresh hoodie.properties at regular intervals >

[jira] [Created] (HUDI-3893) Add support to refresh hoodie.properties at regular intervals

2022-04-16 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3893: - Summary: Add support to refresh hoodie.properties at regular intervals Key: HUDI-3893 URL: https://issues.apache.org/jira/browse/HUDI-3893 Project: Apache H

[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

2022-04-16 Thread GitBox
nsivabalan commented on issue #5281: URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100682574 have filed a tracking ticket https://issues.apache.org/jira/browse/HUDI-3893

[jira] [Updated] (HUDI-3835) Add UT for delete in java client

2022-04-16 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3835: - Fix Version/s: 0.12.0 > Add UT for delete in java client > > >

[GitHub] [hudi] kasured commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-16 Thread GitBox
kasured commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1100619414 @nsivabalan Thank you for looking into that. I have updated the configuration in the description as it was a little out of date. Since the creation of the ticket you can see that I have