[GitHub] [hudi] yanghua merged pull request #2572: [DOCS] UPSERT = no duplicates

2021-02-12 Thread GitBox
yanghua merged pull request #2572: URL: https://github.com/apache/hudi/pull/2572 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[hudi] branch asf-site updated: [MINOR][DOCS] Add more description for UPSERT operation(no duplicates) (#2572)

2021-02-12 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 75edfa7 [MINOR][DOCS] Add more description

[GitHub] [hudi] yanghua commented on pull request #2568: [MINOR] Add clustering to feature list

2021-02-12 Thread GitBox
yanghua commented on pull request #2568: URL: https://github.com/apache/hudi/pull/2568#issuecomment-778579563 @vinothchandar There is another PR that works for the `README` file of the project.

[GitHub] [hudi] yanghua merged pull request #2574: [MINOR] Default to empty list for unset datadog tags property

2021-02-12 Thread GitBox
yanghua merged pull request #2574: URL: https://github.com/apache/hudi/pull/2574

[hudi] branch master updated: [MINOR] Default to empty list for unset datadog tags property (#2574)

2021-02-12 Thread vinoyang
vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 527175a [MINOR] Default to empty list for unset

[GitHub] [hudi] codecov-io edited a comment on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-02-12 Thread GitBox
codecov-io edited a comment on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-751367927 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2382?src=pr&el=h1) Report > Merging [#2382](https://codecov.io/gh/apache/hudi/pull/2382?src=pr&el=desc) (2d4fe47) in

[jira] [Commented] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-12 Thread Volodymyr Burenin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284028#comment-17284028 ] Volodymyr Burenin commented on HUDI-1602: It appears that the issue is in parque

[GitHub] [hudi] codecov-io edited a comment on pull request #2574: [MINOR] Default to empty list for unset datadog tags property

2021-02-12 Thread GitBox
codecov-io edited a comment on pull request #2574: URL: https://github.com/apache/hudi/pull/2574#issuecomment-778477581 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=h1) Report > Merging [#2574](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=desc) (9419cb5) in

[GitHub] [hudi] t0il3ts0ap commented on issue #2515: [HUDI-1615] [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-12 Thread GitBox
t0il3ts0ap commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-778491246 @vinothchandar I checked in most recent deltacommit file. `schema` key is present. ``` "extraMetadata" : { "schema" : "{\"type\":\"record\",\"name\":\"hoodie_sourc

[jira] [Updated] (HUDI-1615) GH Issue 2515/ Failure to archive commits on row writer/delete paths

2021-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1615: - Labels: pull-request-available sev:critical (was: sev:critical) > GH Issue 2515/ Failure to archi

[GitHub] [hudi] vinothchandar commented on issue #2515: [HUDI-1615] [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-12 Thread GitBox
vinothchandar commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-778483406 @t0il3ts0ap I fixed the row writing part locally, but wanted to check your situation as well before I send a PR. For delta streamer/upsert (can you confirm that the operation is

[GitHub] [hudi] codecov-io edited a comment on pull request #2574: [MINOR] Default to empty list for unset datadog tags property

2021-02-12 Thread GitBox
codecov-io edited a comment on pull request #2574: URL: https://github.com/apache/hudi/pull/2574#issuecomment-778477581 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=h1) Report > Merging [#2574](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=desc) (1a21aac) in

[GitHub] [hudi] codecov-io commented on pull request #2574: [MINOR] Default to empty list for unset datadog tags property

2021-02-12 Thread GitBox
codecov-io commented on pull request #2574: URL: https://github.com/apache/hudi/pull/2574#issuecomment-778477581 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=h1) Report > Merging [#2574](https://codecov.io/gh/apache/hudi/pull/2574?src=pr&el=desc) (b41372b) into [ma

[GitHub] [hudi] xushiyan opened a new pull request #2574: [MINOR] Default to empty list for unset datadog tags property

2021-02-12 Thread GitBox
xushiyan opened a new pull request #2574: URL: https://github.com/apache/hudi/pull/2574 ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc

[GitHub] [hudi] nsivabalan commented on pull request #2530: [HUDI-1579] Trying out github auto labelling for issues

2021-02-12 Thread GitBox
nsivabalan commented on pull request #2530: URL: https://github.com/apache/hudi/pull/2530#issuecomment-778473518 sorry, will test it out locally and will update here.

[GitHub] [hudi] nsivabalan commented on pull request #2400: [HUDI-1594] Some fixes and enhancements to test suite framework

2021-02-12 Thread GitBox
nsivabalan commented on pull request #2400: URL: https://github.com/apache/hudi/pull/2400#issuecomment-778473048 @n3nash : https://issues.apache.org/jira/browse/HUDI-1616 @vinothchandar : sure. I will sync up with nishith on this and will take it up.

[jira] [Created] (HUDI-1616) Abstract out one off operations within dag

2021-02-12 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1616: - Summary: Abstract out one off operations within dag Key: HUDI-1616 URL: https://issues.apache.org/jira/browse/HUDI-1616 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-1616) Abstract out one off operations within dag

2021-02-12 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1616: -- Fix Version/s: 0.8.0 > Abstract out one off operations within dag >

[GitHub] [hudi] kimberlyamandalu commented on issue #2123: Timestamp not parsed correctly on Athena

2021-02-12 Thread GitBox
kimberlyamandalu commented on issue #2123: URL: https://github.com/apache/hudi/issues/2123#issuecomment-778464849 For reference, this is their response for the ticket. They have suggested a workaround. Greetings Kim, Thank you for contacting AWS Premium Support. This is Mrunmaye

[GitHub] [hudi] codecov-io edited a comment on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
codecov-io edited a comment on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-776932935 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2500?src=pr&el=h1) Report > Merging [#2500](https://codecov.io/gh/apache/hudi/pull/2500?src=pr&el=desc) (1d5fb46) in

[GitHub] [hudi] vinothchandar commented on pull request #2400: [HUDI-1594] Some fixes and enhancements to test suite framework

2021-02-12 Thread GitBox
vinothchandar commented on pull request #2400: URL: https://github.com/apache/hudi/pull/2400#issuecomment-778344811 @n3nash @nsivabalan I would like some of this to run on every commit, at least one test each for COW and MOR. Could one of you add that to the CI?

[hudi] branch master updated: Adding fixes to test suite framework. Adding clustering node and validate async operations node. (#2400)

2021-02-12 Thread nagarwal
nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d5f2028 Adding fixes to test suite framework. A

[GitHub] [hudi] n3nash merged pull request #2400: [HUDI-1594] Some fixes and enhancements to test suite framework

2021-02-12 Thread GitBox
n3nash merged pull request #2400: URL: https://github.com/apache/hudi/pull/2400

[jira] [Updated] (HUDI-1594) Add support for clustering node and validating async operations

2021-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1594: - Labels: pull-request-available (was: ) > Add support for clustering node and validating async ope

[GitHub] [hudi] n3nash commented on a change in pull request #2400: [HUDI-1594] Some fixes and enhancements to test suite framework

2021-02-12 Thread GitBox
n3nash commented on a change in pull request #2400: URL: https://github.com/apache/hudi/pull/2400#discussion_r575384984 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ClusteringNode.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] nsivabalan commented on issue #2123: Timestamp not parsed correctly on Athena

2021-02-12 Thread GitBox
nsivabalan commented on issue #2123: URL: https://github.com/apache/hudi/issues/2123#issuecomment-778324814 Thanks @kimberlyamandalu. Will close this out. Please do reach out to us if you need any more assistance. Thanks for helping to better the Hudi community :)

[GitHub] [hudi] nsivabalan closed issue #2123: Timestamp not parsed correctly on Athena

2021-02-12 Thread GitBox
nsivabalan closed issue #2123: URL: https://github.com/apache/hudi/issues/2123

[GitHub] [hudi] mauropelucchi commented on issue #2564: Hoodie clean is not deleting old files

2021-02-12 Thread GitBox
mauropelucchi commented on issue #2564: URL: https://github.com/apache/hudi/issues/2564#issuecomment-778321055 @bvaradar We are running this conf for 2 separate locations: hudi_options = { 'hoodie.table.name': table_name, 'hoodie.datasource.write.recordkey.field': 'key', '

[GitHub] [hudi] codecov-io edited a comment on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
codecov-io edited a comment on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-776932935 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2500?src=pr&el=h1) Report > Merging [#2500](https://codecov.io/gh/apache/hudi/pull/2500?src=pr&el=desc) (463fc92) in

[GitHub] [hudi] nsivabalan commented on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
nsivabalan commented on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-778314249 @bvaradar : this patch might be of interest to you.

[hudi] branch asf-site updated: [DOCS] Add clustering feature to the home page (#2569)

2021-02-12 Thread vinoth
vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new d17c8fc [DOCS] Add clustering feature to the

[GitHub] [hudi] vinothchandar merged pull request #2569: [MINOR] Add clustering feature to the home page

2021-02-12 Thread GitBox
vinothchandar merged pull request #2569: URL: https://github.com/apache/hudi/pull/2569

[GitHub] [hudi] vburenin commented on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
vburenin commented on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-778309343 > yeah, that's what I initially thought, but wasn't sure if we need to do two checks > > ``` > if(fsDataInputStream.getWrappedStream() instanceof FSDataInputStream && (

[GitHub] [hudi] nsivabalan commented on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
nsivabalan commented on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-778307082 yeah, that's what I initially thought, but wasn't sure if we need to do two checks ``` if(fsDataInputStream.getWrappedStream() instanceof FSDataInputStream && ((FSDataIn

[GitHub] [hudi] rswagatika removed a comment on issue #2564: Hoodie clean is not deleting old files

2021-02-12 Thread GitBox
rswagatika removed a comment on issue #2564: URL: https://github.com/apache/hudi/issues/2564#issuecomment-778300130 @bvaradar I ran the command clean show and cleans run from my hudi client

[GitHub] [hudi] rswagatika commented on issue #2564: Hoodie clean is not deleting old files

2021-02-12 Thread GitBox
rswagatika commented on issue #2564: URL: https://github.com/apache/hudi/issues/2564#issuecomment-778300130 @bvaradar I ran the command clean show and cleans run from my hudi client

[GitHub] [hudi] rswagatika closed issue #2564: Hoodie clean is not deleting old files

2021-02-12 Thread GitBox
rswagatika closed issue #2564: URL: https://github.com/apache/hudi/issues/2564

[GitHub] [hudi] vinothchandar commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-12 Thread GitBox
vinothchandar commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-778287412 For now, if you try the one line fix in `CommitUtils`, we will be out of the woods. I have raised a sev:critical JIRA here https://issues.apache.org/jira/browse/HUDI-1615 for the

[GitHub] [hudi] vinothchandar edited a comment on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-12 Thread GitBox
vinothchandar edited a comment on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-778287412 For now, if you try the one line fix in `CommitUtils`, we will be out of the woods. I have raised a sev:critical JIRA here https://issues.apache.org/jira/browse/HUDI-1615

[jira] [Created] (HUDI-1615) GH Issue 2515/ Failure to archive commits on row writer/delete paths

2021-02-12 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1615: Summary: GH Issue 2515/ Failure to archive commits on row writer/delete paths Key: HUDI-1615 URL: https://issues.apache.org/jira/browse/HUDI-1615 Project: Apache Hudi

[GitHub] [hudi] vburenin commented on a change in pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
vburenin commented on a change in pull request #2500: URL: https://github.com/apache/hudi/pull/2500#discussion_r575332449 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/SchemeAwareFSDataInputStream.java ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Soft

[GitHub] [hudi] vinothchandar commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-12 Thread GitBox
vinothchandar commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-778283971 @t0il3ts0ap I am onto this. The issue seems to be related to how the schema is set: the schema value seems to be null in `extraMetadata

[GitHub] [hudi] nsivabalan commented on a change in pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-02-12 Thread GitBox
nsivabalan commented on a change in pull request #2500: URL: https://github.com/apache/hudi/pull/2500#discussion_r575323321 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/SchemeAwareFSDataInputStream.java ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache So

[GitHub] [hudi] nsivabalan commented on pull request #2562: [HUDI-1540] Fixing commons codec depedency in bundle jars

2021-02-12 Thread GitBox
nsivabalan commented on pull request #2562: URL: https://github.com/apache/hudi/pull/2562#issuecomment-778267262 mvn dependency:tree -Dincludes=commons-codec:commons-codec ``` . . [INFO] ---< org.apache.hudi:hudi-spark-bundle_2.11 >--- [INFO] Build

[GitHub] [hudi] manijndl7 edited a comment on pull request #2320: [HUDI-57] Added Orc Writer to Support Orc in Hudi

2021-02-12 Thread GitBox
manijndl7 edited a comment on pull request #2320: URL: https://github.com/apache/hudi/pull/2320#issuecomment-778192878 @vinothchandar Can I close this PR, since other people will work on this?

[GitHub] [hudi] manijndl7 commented on pull request #2320: [HUDI-57] Added Orc Writer to Support Orc in Hudi

2021-02-12 Thread GitBox
manijndl7 commented on pull request #2320: URL: https://github.com/apache/hudi/pull/2320#issuecomment-778192878 @vinothchandar Can I close this PR, since other people will work on this?

[GitHub] [hudi] shenh062326 commented on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-02-12 Thread GitBox
shenh062326 commented on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-778149262 > @shenh062326 Thanks for your contribution, the PR looks good to me. I think we would do some refactor between different engine implementation since most of the code would be

[jira] [Created] (HUDI-1614) Do some refactor between different engine implementation

2021-02-12 Thread shenh062326 (Jira)
shenh062326 created HUDI-1614: - Summary: Do some refactor between different engine implementation Key: HUDI-1614 URL: https://issues.apache.org/jira/browse/HUDI-1614 Project: Apache Hudi Issue Ty

[GitHub] [hudi] shenh062326 commented on a change in pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-02-12 Thread GitBox
shenh062326 commented on a change in pull request #2382: URL: https://github.com/apache/hudi/pull/2382#discussion_r575164713 ## File path: hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/restore/JavaCopyOnWriteRestoreActionExecutor.java ## @@ -0,0 +1,66

[GitHub] [hudi] bvaradar commented on issue #2573: Rebuild a HUDI table using the Snapshot of HUDI table with its commit timeline metadata

2021-02-12 Thread GitBox
bvaradar commented on issue #2573: URL: https://github.com/apache/hudi/issues/2573#issuecomment-778041438 You can simply copy the whole folder at the dataset level to the new location and it should work. All metadata is relative to the dataset path, so it should be OK to copy it along with the data.
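
As a rough illustration of the reply above, here is a minimal sketch of such a copy for a table that lives on a local filesystem. The helper name `copy_dataset` and the sanity check on the `.hoodie` folder are assumptions made for this example, not part of Hudi; for HDFS or object storage the equivalent recursive copy tool of that store would be needed instead.

```python
import shutil
from pathlib import Path


def copy_dataset(src: str, dst: str) -> Path:
    """Copy a whole table folder, including its .hoodie metadata directory.

    Hypothetical helper for illustration only; assumes a local filesystem
    and that `dst` does not exist yet.
    """
    src_path, dst_path = Path(src), Path(dst)
    # Assumed sanity check: the timeline metadata lives under <base path>/.hoodie,
    # so its absence suggests `src` is not the dataset-level folder.
    if not (src_path / ".hoodie").is_dir():
        raise ValueError(f"{src_path} has no .hoodie folder; is it the dataset base path?")
    # copytree also copies hidden directories, so the commit timeline
    # travels together with the data files.
    shutil.copytree(src_path, dst_path)
    return dst_path
```

Copying at the dataset level, rather than partition by partition, is what keeps the relative metadata paths valid at the new location.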