Re: CDC and MOR tables

2024-09-15 Thread Danny Chan
yeah, the InstantRange would include exactly the commits that need to be read. Best, Danny Jack Vanlightly wrote on Thu, Sep 5, 2024 at 01:27: > > Ok, so the InstantRange would be an exact match one, with only the instants > of compactions? > > On Mon, Aug 26, 2024, 02:46 Danny Chan wrote:

Re: CDC and MOR tables

2024-08-25 Thread Danny Chan
m CDC > change log. This looks like a bug to me. > > Thanks > Jack > > > Thanks > Jack > > On Tue, Aug 20, 2024 at 6:29 AM Danny Chan wrote: > > > yeah, you are right, for mor table, when the cdc log is enabled, which > > are generated during compaction,

Re: CDC and MOR tables

2024-08-19 Thread Danny Chan
yeah, you are right, for MOR tables, when the CDC log is enabled, the change logs are generated during compaction, so there are two choices for the reader: 1. read the changes from the change log, which has a high TTL delay because these logs are only generated during compaction; 2. or it can infer the change

Re: [VOTE] Release 1.0.0-beta2, release candidate #2

2024-07-12 Thread Danny Chan
+1 (binding) - Flink NB-CC local e2e test - Run a Flink lookup join use case - Run a MOR streaming ingestion pipeline with async compaction enabled Best, Danny Vinoth Chandar wrote on Fri, Jul 12, 2024 at 11:58: > > +1 (binding) > > Please call out any limitations around updates and secondary indexes in the

Re: [VOTE] Release 0.15.0, release candidate #3

2024-06-03 Thread Danny Chan
+1 (binding) - run Flink quick start [OK] - run a MOR ingestion pipeline locally with async compaction [OK] - run a streaming read pipeline for MOR table [OK] Best, Danny Vinoth Chandar wrote on Tue, Jun 4, 2024 at 04:55: > > +1 (binding) > > On Mon, Jun 3, 2024 at 12:22 Shiyan Xu wrote: > > > +1 (binding)

Re: Getting the error (Could not find any factories that implement 'org.apache.flink.table.delegation.ExecutorFactory' in the classpath.)

2024-01-24 Thread Danny Chan
It looks like a Flink data stream error, did you reference the Flink website doc or did you reach out to the Flink guys? Best, Danny Varagini Karthik wrote on Wed, Nov 22, 2023 at 21:58: > > Hi All, > > I started newly with Hudi. Playing around to get a better understanding. > I am trying with the flink data

Re: [VOTE] Release 0.14.1, release candidate #1

2023-12-25 Thread Danny Chan
+1 (binding) - ran some Flink e2e use cases - checked that the necessary commits are included - ran the Flink SQL quick start - checked the bundle jar Best, Danny Nicolas Paris wrote on Tue, Dec 26, 2023 at 01:53: > > -1 (non binding) > > ran our internal test suite on 0.14.1-rc1 and found 2 issues on hudi > third parties:

Re: [External] Current state of parquet zstd OOM with hudi

2023-12-03 Thread Danny Chan
> I would say an entry in the hudi FAQ on this issue would be great, since hard > to spot, and marked as fixed on spark side. Makes sense, welcome to fire a fix to Hudi website. Best, Danny Nicolas Paris 于2023年11月22日周三 15:55写道: > > We fixed the hudi memory leak by patching parquet 1.12 and rel

Re: [Discussion] Support EventTimeBasedCompactionStrategy based on merging some log files

2023-12-03 Thread Danny Chan
The general direction looks good, for functionality that only compacts partial log files, does the existing log compaction match your needs? https://github.com/apache/hudi/blob/master/rfc/rfc-48/rfc-48.md Best, Danny 孔维 <18701146...@163.com> wrote on Mon, Nov 27, 2023 at 23:43: > Background: > > 1. The data

Re: Tuning guide question about off-heap

2023-12-03 Thread Danny Chan
I'm not a parquet expert but I can confirm Hudi does not maintain a specific memory strategy for parquet writers. Best, Danny nicolas paris wrote on Mon, Nov 20, 2023 at 17:54: > > hi everyone, > > from the tuning guide: > > > Off-heap memory : Hudi writes parquet files and that needs good > amount of off-heap

Re: Calling for 0.12.4 release

2023-09-21 Thread Danny Chan
Thanks Yue Zhang for the contribution ~ Best, Danny Y Ethan Guo wrote on Sat, Sep 2, 2023 at 00:24: > > Thanks, Yue Zhang, for volunteering to be the RM! > > On Thu, Aug 31, 2023 at 4:38 PM Yue Zhang wrote: > > > Hi Hudiers, > > I volunteer to be the RM for the next 0.12.4 if u don’t mind > > YueZhang >

Re: [VOTE] Release 0.14.0, release candidate #1

2023-08-24 Thread Danny Chan
-1 for some critical fixes: I saw some critical fixes on the master: 1. https://github.com/apache/hudi/pull/9483 2. https://github.com/apache/hudi/pull/9499 (very critical) 3. https://github.com/apache/hudi/pull/9467 4. https://github.com/apache/hudi/pull/9511 The 2 is very critical, it fixes th

Re: [VOTE] Release 0.12.3, release candidate #2

2023-04-17 Thread Danny Chan
+1(binding) 1. build the code in tag release-0.12.3-rc2 manually and it passed. 2. local e2e test for flink: append mode with/without clustering, upsert with compaction, all work well 3. check the KEYS Best, Danny Chan Sivabalan wrote on Fri, Apr 14, 2023 at 07:22: > > > Hi everyone, > > Pl

Re: [VOTE] Release 0.12.3, release candidate #1

2023-04-05 Thread Danny Chan
-1(binding) for picking up https://github.com/apache/hudi/pull/8374 Best, Danny Sivabalan wrote on Sat, Apr 1, 2023 at 01:11: > > Hi everyone, > > Please review and vote on the release candidate #1 for the version 0.12.3, > as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (plea

[BIG CHANGE] Switch logger from log4j2 to slf4j

2023-02-22 Thread Danny Chan
Many popular Apache projects use slf4j now to avoid unnecessary conflicts, like Apache Spark, Apache Flink, etc. slf4j is a bridge jar/interface for log4j/log4j2 to avoid conflicts; log4j2 is also a conflict-prone jar even though it has a more stable API than log4j. As a bridge jar, slf4j relies

Re: [VOTE] Release 0.13.0, release candidate #3

2023-02-16 Thread Danny Chan
+1 (binding) - Ran e2e local tests for Flink read/write/compaction, especially for spilling - Ran some tests for CDC feature - Went through the Flink quick start with Flink 1.16 bundle jar Best, Danny Shiyan Xu wrote on Fri, Feb 17, 2023 at 01:31: > > +1 (binding) > > - Ran some quickstart spark read/write exa

Re: [VOTE] Release 0.13.0, release candidate #2

2023-02-07 Thread Danny Chan
+1 (binding) 1. check the KEYS file to include the signature 2. download the source code and validate the Flink write/read functionalities, especially for writer, force spilling for the compaction 3. check the tag release-0.13.0-rc2 already included the critical fixes of flink we mentioned before

Re: [VOTE] Release 0.12.2, release candidate #1

2022-12-20 Thread Danny Chan
Hi, there are another 2 fixes that I want to include: https://github.com/apache/hudi/commit/c288a506d4c0b7c1272538d95928df118e4d79ac https://github.com/apache/hudi/commit/211af1a4fd76ce84ce80f4d1b2befe5fc9954888 Best, Danny Satish Kotha wrote on Tue, Dec 20, 2022 at 11:50: > > small correction in the first lin

Re: [DISCUSS] Build tool upgrade

2022-10-16 Thread Danny Chan
pache Calcite. Julian Hyde, who is the creator of Calcite, may have more to say here. So I would not suggest we do that for Hudi. Best, Danny Chan Shiyan Xu wrote on Sat, Oct 1, 2022 at 13:48: > > Hi all, > > I'd like to raise a discussion around the build tool for Hudi. > > Mave

Re: [VOTE] Release 0.12.1, release candidate #2

2022-10-13 Thread Danny Chan
+1 (binding) Flink quickstart OK Long-running Flink SQL Job OK Flink Hive Sync OK Flink compaction and cleaning OK Compile the source code OK Regards, Danny Rahil C wrote on Fri, Oct 14, 2022 at 02:46: > > +1 (non-binding) > > Ran hudi-spark bundle against EMR integration tests > > > > On Thu, Oct 13, 2022

Re: [VOTE] Release 0.12.0, release candidate #1

2022-08-01 Thread Danny Chan
broken for Spark >= 3.1 > <https://issues.apache.org/jira/projects/HUDI/issues/HUDI-4496?filter=allopenissues>, > and we'd really like to make sure this makes it into 0.12. > > -1, from my end. > > On Sun, Jul 31, 2022 at 11:51 PM Danny Chan wrote: > > > Hi,

Re: [VOTE] Release 0.12.0, release candidate #1

2022-07-31 Thread Danny Chan
Hi, sorry for bothering, but I would like to include HUDI-4504 and HUDI-4505, which are critical fixes for the Flink side. So -1 from my side. Best, Danny sagar sumit wrote on Sat, Jul 30, 2022 at 18:16: > > Hi everyone, > > Please review and vote on the release candidate #1 for the version 0.12.0, > as follows:

Re: 0.12.0 Release Timeline

2022-07-21 Thread Danny Chan
Had a quick review of the remaining release blockers and +1 from my side. Best, Danny Vinoth Chandar wrote on Fri, Jul 15, 2022 at 13:29: > > +1 from me. > > On Thu, Jul 14, 2022 at 9:43 AM sagar sumit wrote: > > > Hi Folks, > > > > After some deliberation with the community and keeping the release blocker

Re: [VOTE] Release 0.11.1, release candidate #2

2022-06-14 Thread Danny Chan
Thanks Ethan, would appreciate it if https://issues.apache.org/jira/browse/HUDI-4255 can be included; the bug may cause the Flink bucket index to throw FileNotFoundException in some cases. Best, Danny Y Ethan Guo wrote on Mon, Jun 13, 2022 at 07:17: > > Hi everyone, > > Please review and vote on the release c

Re: HoodieTable removes data file right before the end of Flink job

2022-06-14 Thread Danny Chan
Thanks for the awesome analysis, you are right, after patch [2] the endinput event and metadata event may lose their execution order, which caused the problem here. Feel free to fire a JIRA ticket to fix it :) Best, Danny Александр Трушев wrote on Tue, Jun 14, 2022 at 17:43: > > Hello everyone, I found a str

Re: Updates on 0.11.1 release

2022-06-09 Thread Danny Chan
Sorry, Ethan, I just got a critical fix here: https://github.com/apache/hudi/pull/5815, hope we can get it included. The background is that Spark SQL now uses a different key gen strategy compared with data source v2 writers and with Flink, which brings in a lot of confusion/error feedback from the issues,

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Danny Chan
I have different concerns here, the Lake Manager seems like a single node service here, and there is a risk that it becomes a bottleneck for handling too many table services. And for every single node service we should consider how to achieve high availability. What is the final state of the Hudi

Re: [DISCUSS] hudi index improve

2022-04-18 Thread Danny Chan
In general, it seems that the INDEX commands mainly serve the batch scenarios, there are some cases that need to be clarified here: 1. When a user creates an index with manual refresh first then inserts a batch of data (named d1) into the table, does the index created take effect on d1? 2. If a user

[ANNOUNCE] New Apache Hudi Committer - Zhaojing Yu

2022-03-24 Thread Danny Chan
Hi everyone, On behalf of the PMC, I'm very happy to announce Zhaojing Yu as a new Hudi committer. Zhaojing is very active in Flink Hudi contributions, many cool features such as the flink streaming bootstrap, compaction service and all kinds of writing modes are contributed by him. He also fixed

Re: [DISCUSS] New RFC? Add Call Procedure Command for spark sql

2022-01-10 Thread Danny Chan
+1 for starting a new RFC. Danny Vinoth Chandar wrote on Tue, Jan 11, 2022 at 2:15 PM: > +1 please start a RFC > > On Fri, Jan 7, 2022 at 5:50 AM Forward Xu wrote: > > > Hi All, > > > > I want to add Call Procedure Command to spark sql, which will be very > > useful to meet DDL and DML functions that cannot b

Re: Regular minor/patch releases

2021-12-30 Thread Danny Chan
0 branch if needed. That would avoid > > > cherry-picking > > > > all bug fixes from master to release-0.10 at one time and cause so many > > > > conflicts. You would see the Spark[1] and Flink[2] community also > > > > maintaining a multi-master branch

Re: Preparation for 0.10.1 minor release

2021-12-24 Thread Danny Chan
I have tagged the issues that I think should be included in 0.10.1, plus this minor fix: https://github.com/apache/hudi/pull/4287/commits/45769dd17905240d5b513d304e5f9e86fe094642 Sivabalan wrote on Tue, Dec 21, 2021 at 11:42: > > sure, that makes sense. > > On Mon, Dec 20, 2021 at 11:57 AM Vinoth Chandar wro

Re: Regular minor/patch releases

2021-12-14 Thread Danny Chan
I guess we must do that for current rapid development and iteration. As for the release 0.10.0, after the announcement of only a few days we have received a bunch of bugs reported by the github issues: such as - the empty meta file: https://github.com/apache/hudi/issues/4249 - and the timeline bas

[ANNOUNCE] Apache Hudi 0.10.0 released

2021-12-13 Thread Danny Chan
e project website at: http://hudi.apache.org/ Thanks to everyone involved! Danny Chan

[RESULT] [VOTE] Release 0.10.0, release candidate #3

2021-12-07 Thread Danny Chan
Hi everyone, I'm happy to announce that we have approved this release. There are 11 approving votes, 9 of which are binding, and 0 disapproving votes. Here is the breakdown: +1 (binding) : 9 * Vinoth Chandar * Bhavani Sudha * vino yang * Balaji Varadaraj

[VOTE] Release 0.10.0, release candidate #3

2021-12-04 Thread Danny Chan
Hi everyone, Please review and vote on the release candidate #3 for the version 0.10.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1]

Re: [VOTE] Release 0.10.0, release candidate #2

2021-11-29 Thread Danny Chan
The vote has been canceled, so I will prepare RC#3 soon. Best, Danny Danny Chan wrote on Mon, Nov 29, 2021 at 4:31 PM: > Hi everyone, > > Please review and vote on the release candidate #2 for the version 0.10.0, > as follows: > > [ ] +1, Approve the release > > [ ] -1, Do not app

[VOTE] Release 0.10.0, release candidate #2

2021-11-29 Thread Danny Chan
Hi everyone, Please review and vote on the release candidate #2 for the version 0.10.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1]

Re: [VOTE] Release 0.10.0, release candidate #1

2021-11-28 Thread Danny Chan
the release jar. Will wait to > hear from other experts. > > > > On Sat, Nov 27, 2021 at 9:56 AM Manoj Govindassamy < > manoj.govindass...@gmail.com> wrote: > > > +1 > > > > On Sat, Nov 27, 2021 at 4:49 AM Danny Chan wrote: > > > > >

[VOTE] Release 0.10.0, release candidate #1

2021-11-27 Thread Danny Chan
Hi everyone, Please review and vote on the release candidate #1 for the version 0.10.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1]

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-27 Thread Danny Chan
> > >- [HUDI-2672] Avoid empty commits and rollbacks when there is > >> no > >> > > event > >> > > > >from the topic (Owner: Rajesh Mahindra) > >> > > > > > >> > > > > ** Pending > >> > > >

[DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Danny Chan
Hi Community, As we draw close to doing Hudi 0.10.0 release, I am happy to share a summary of the key features/improvements that would be going in the release and the current blockers for everyone's visibility. *Highlights* - [HUDI-1290] Implement Debezium avro source for Delta Streamer -

Re: Release 0.10.0 planning

2021-11-01 Thread Danny Chan
I can take that. Best, Danny Vinoth Chandar wrote on Sat, Oct 30, 2021 at 6:07 AM: > Hi all, > > I propose we cut the RC for 0.10.0 by Nov 19. > > Any volunteers for release manager? > > Thanks > Vinoth > > On Sun, Oct 17, 2021 at 10:45 AM Sivabalan wrote: > > > This release has a lot of exciting features l

Re: How to do apache hudi performance test?

2021-09-23 Thread Danny Chan
+1, a reproducible benchmark is important for users to test and then choose their final product. Best, Danny Chan casel.chen wrote on Tue, Sep 14, 2021 at 9:38 PM: > Hello, everyone! > > > I want to know how to do apache hudi performance test like > https://hudi.apache.org/docs/performance/?

Re: [VOTE] Release 0.9.0, release candidate #2

2021-08-21 Thread Danny Chan
process. > And if anyone has any suggestions on improving the release process in > general (if we can seal the patches that go into a release upfront, etc), I > am all ears to that as well. > > > On Sat, Aug 21, 2021 at 10:41 PM Danny Chan wrote: > > > I have fired a cher

Re: [VOTE] Release 0.9.0, release candidate #2

2021-08-21 Thread Danny Chan
I have fired a cherry-pick PR: https://github.com/apache/hudi/pull/3519 Best, Danny Danny Chan wrote on Sun, Aug 22, 2021 at 9:07 AM: > I'm sorry I would also vote -1. > > HUDI-2316 > HUDI-2340 > HUDI-2342 > > are all important improvements for Flink and we hope they can be >

Re: [VOTE] Release 0.9.0, release candidate #2

2021-08-21 Thread Danny Chan
I'm sorry I would also vote -1. HUDI-2316 HUDI-2340 HUDI-2342 are all important improvements for Flink and we hope they can be cherry picked to release 0.9. Best, Danny Udit Mehrotra wrote on Sat, Aug 21, 2021 at 7:13 AM: > Hi everyone, > > Please review and vote on the release candidate #2 for the version

Re: Long test run times

2021-07-29 Thread Danny Chan
What should we do for these long running tests? Simplify them into simpler UTs? Vinoth Chandar wrote on Fri, Jul 30, 2021 at 6:53 AM: > I am looking into > > 614.322 org.apache.hudi.client.TestHoodieClientOnCopyOnWriteStorage > 556.392 org.apache.hudi.metadata.TestHoodieBackedMetadata > > On Thu, Jul 29, 2

[DISCUSS] Disable ASF GitHub Bot comments under the JIRA issue

2021-07-26 Thread Danny Chan
I found that there are many ASF GitHub Bot comments under our issues now, they mess up the design discussions and are hard to read. The normal comments are drowned in these junk messages. So I request to disable them to make the JIRA comments clear and clean. Best, Danny Chan

Re: How to disable the ASF GitHub Bot comments under the issue ticket ?

2021-07-26 Thread Danny Chan
Thanks, I will fire a discussion ~ Best, Danny Chan Vinoth Chandar wrote on Mon, Jul 26, 2021 at 11:57 PM: > Hi Danny, > > Worth discussing. It was turned on by adding "comment" here. > > https://github.com/apache/hudi/blob/master/.asf.yaml#L41 > > The intention is that all GH

How to disable the ASF GitHub Bot comments under the issue ticket ?

2021-07-26 Thread Danny Chan
I found that there are many ASF GitHub Bot comments under our issues now, they mess up the design discussions and are hard to read. Is there a way to disable them? Best, Danny

Re: [VOTE] Move content off cWiki

2021-07-19 Thread Danny Chan
+1 - Approve the move Best, Danny Raymond Xu wrote on Tue, Jul 20, 2021 at 8:25 AM: > +1 - Approve the move > > On Mon, Jul 19, 2021 at 5:22 PM Bhavani Sudha > wrote: > > > +1 - Approve the move > > > > On Mon, Jul 19, 2021 at 5:16 PM vbal...@apache.org > > wrote: > > > > > > > > +1 - Approve the move > > >

Re: what's different between Append only and insert in Flink stream?

2021-07-11 Thread Danny Chan
Append only means the merge never happens. Best, Danny Chan Jian Feng wrote on Sat, Jul 10, 2021 at 10:41 PM: > I saw a pr here https://github.com/apache/hudi/pull/3252 > -- > > FengJian > > Data Infrastructure Team > > Mobile +65 90388153 > > Address 5 Science Park Drive >
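
To illustrate the reply above, here is a minimal sketch in plain Python of the difference between a keyed upsert and an append-only write. It does not use the Hudi or Flink API; the function names `write_upsert` and `write_append_only` and the sample records are hypothetical and chosen only for illustration.

```python
def write_upsert(table, records):
    """Upsert: records that share a key are merged, the latest one wins."""
    merged = dict(table)
    for key, value in records:
        merged[key] = value  # an existing key is overwritten (merged)
    return merged


def write_append_only(table, records):
    """Append only: records are added as-is, no merge by key ever happens."""
    return list(table) + list(records)


# Hypothetical batch in which key "id1" appears twice.
batch = [("id1", "v1"), ("id2", "v1"), ("id1", "v2")]

upserted = write_upsert({}, batch)
appended = write_append_only([], batch)

print(upserted)  # one entry per key
print(appended)  # every written record is kept
```

In this sketch the upsert path keeps one record per key, while the append-only path keeps both records for "id1", which is what "the merge never happens" implies for readers of such a table.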

Re: Website redesign

2021-06-30 Thread Danny Chan
Are all the pages assigned to volunteers, or is there someone mainly in charge of it? Best, Danny Chan Vinoth Chandar wrote on Thu, Jul 1, 2021 at 6:00 AM: > Any volunteers? Also worth asking in slack? > > On Sat, Jun 26, 2021 at 5:03 PM Raymond Xu > wrote: > > > Hi all, > > > > We

Re: [HELP] unstable tests in the travis CI

2021-06-22 Thread Danny Chan
That's cool, thanks ~ Best, Danny Chan pzwpzw wrote on Wed, Jun 23, 2021 at 1:58 PM: > Hi Danny, There is a bug in schema resolve in DefaultSource which lead > the test case 2 crash. I will submit a PR to solve this later. > > On Jun 23, 2021 at 1:49 PM, Danny Chan wrote: > > Hi, fellows, ther

[HELP] unstable tests in the travis CI

2021-06-22 Thread Danny Chan
evolution for ... [2] [1] https://travis-ci.com/github/apache/hudi/jobs/518067391 [2] https://travis-ci.com/github/apache/hudi/jobs/518067393 Best, Danny Chan

Re: [Discuss] Provide a Flag to choose between Flink or Spark

2021-06-16 Thread Danny Chan
There was actually an issue here: https://issues.apache.org/jira/browse/HUDI-1872, maybe you can take it and go on with the work ~ Best, Danny Chan Vinay Patil wrote on Fri, Jun 11, 2021 at 3:26 PM: > Thank you Danny for your response. > > Can we have a JIRA story where all the refactoring is req

Re: [Discuss] Provide a Flag to choose between Flink or Spark

2021-06-11 Thread Danny Chan
same problem. Best, Danny Chan Vinay Patil wrote on Wed, Jun 9, 2021 at 3:42 PM: > Hi Team, > > Currently, Hudi supports Flink as well Spark, there are two different > classes > 1. HoodieDeltaStreamer > 2. FlinkHoodieDeltaStreamer > > Should we have a provision to pass the flag like --

Re: [DISCUSS] Hash Index for HUDI

2021-06-06 Thread Danny Chan
be too complicated, we should avoid that. It also requires that the query engine be aware of the bucketing rules, which is not that transparent and is not a common query optimization. Best, Danny Chan 耿筱喻 wrote on Fri, Jun 4, 2021 at 6:06 PM: > Thank you for your questions. > > For the first question, the number of buck

Re: [DISCUSS] Hash Index for HUDI

2021-06-02 Thread Danny Chan
solution to solve this problem now? Best, Danny Chan 耿筱喻 wrote on Wed, Jun 2, 2021 at 10:42 PM: > Hi, > Currently, Hudi index implementation is pluggable and provides two > options: bloom filter and hbase. When a Hudi table becomes large, the > performance of bloom filter degrade drastically due to the

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-22 Thread Danny Chan
-01-01T00:00:06, par3 with streaming query "select name, sum(age) from t1 group by name" returns: change_flag | name | age_sum I, Danny, 24 I, Stephen, 34 The result is the same as a batch snapshot query. Best, Danny Chan Vinoth Chandar wrote on Wed, Apr 21, 2021 at 1:32 PM: > Keeping compatibilit
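
As a rough illustration of why the streaming result quoted above can match a batch snapshot query, here is a minimal Python sketch that folds a stream of change-flagged rows into per-name sums. It is not Hudi or Flink code. The input rows are hypothetical (the original rows are truncated in this archive), and the flag names "I", "-U", "+U" and "D" are an assumption modeled on common changelog conventions.

```python
from collections import defaultdict


def aggregate_changelog(changes):
    """Fold (flag, name, age) change rows into per-name age sums.

    "I" and "+U" add a row's value; "-U" and "D" retract it.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for flag, name, age in changes:
        if flag in ("I", "+U"):
            sums[name] += age
            counts[name] += 1
        elif flag in ("-U", "D"):
            sums[name] -= age
            counts[name] -= 1
        else:
            raise ValueError(f"unknown change flag: {flag}")
    # Names whose rows were all retracted do not appear in the result.
    return {name: total for name, total in sums.items() if counts[name] > 0}


# Hypothetical change stream: both rows are inserted, then updated once.
changes = [
    ("I", "Danny", 23),
    ("I", "Stephen", 33),
    ("-U", "Danny", 23),
    ("+U", "Danny", 24),
    ("-U", "Stephen", 33),
    ("+U", "Stephen", 34),
]

snapshot = aggregate_changelog(changes)
print(snapshot)
```

Because every update is delivered as a retraction followed by the new value, the running sums end up equal to what a batch query over the latest snapshot would compute for these sample rows.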

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-20 Thread Danny Chan
I think there is no need to add a config option which brings in unnecessary overhead. If we do not ensure backward compatibility for new column, then we should add such a config option and by default disable it. Best, Danny Chan Vinoth Chandar wrote on Wed, Apr 21, 2021 at 6:30 AM: > Hi Danny, > > Rea

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-20 Thread Danny Chan
need some unit tests for the new column in the hoodie core, but I don't know how to write them, could you give some help? Best, Danny Chan Danny Chan wrote on Mon, Apr 19, 2021 at 4:42 PM: > Thanks @Sivabalan ~ > > I agree that parquet and log files should keep sync in metadata columns in > case ther

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-19 Thread Danny Chan
"_hoodie_change_flag" and a config option to default disable this metadata column, what do you think? Best, Danny Chan Sivabalan 于2021年4月17日周六 上午9:10写道: > wrt changes if we plan to add this only to log files, compaction needs to > be fixed to omit this column to the minimum. > >

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-15 Thread Danny Chan
ink-docs-stable/dev/table/streaming/dynamic_tables.html Best, Danny Chan Vinoth Chandar wrote on Fri, Apr 16, 2021 at 9:44 AM: > Hi, > > Is the intent of the flag to convey if an insert delete or update changed > the record? If so I would imagine that we do this even for cow tables, > sin

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Danny Chan
+1 for the vision, personally I find the incremental ETL part promising, with an engine like Apache Flink we can do intermediate aggregation in streaming style. Best, Danny Chan leesf wrote on Wed, Apr 14, 2021 at 9:52 AM: > +1. Cool and promising. > > Mehrotra, Udit wrote on Wed, Apr 14, 2021 at 2:57 AM: >

Re: Apache Hudi 0.8.0 Released

2021-04-09 Thread Danny Chan
Cheers ~ Best, Danny Chan Vinoth Chandar wrote on Sat, Apr 10, 2021 at 12:43 AM: > Thanks Gary! +1 fantastic job with the release! > > Please also announce on Slack (if not done already) > > I shared some tweets at https://twitter.com/apachehudi > > On Fri, Apr 9, 2021 at 7:44 AM leesf

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-08 Thread Danny Chan
i add the "_hoodie_change_flag" metadata column, or is there any better solution for this? Best, Danny Chan Danny Chan wrote on Fri, Apr 2, 2021 at 11:08 AM: > Thanks cool, then the left questions are: > > - where we record these change, should we add a builtin meta field such as > the _change_fla

Re: Discussion for timestamp support

2021-04-01 Thread Danny Chan
I think the read depends very much on each engine because Hoodie does not define its own parquet reader yet, e.g. the Flink reader can read int96 as timestamp based on the declared precision. Best, Danny Chan lrz <369091...@qq.com> wrote on Thu, Apr 1, 2021 at 12:04 PM: > Hi, I want to discuss about th

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-01 Thread Danny Chan
"MERGE_ON_READ" table, and only for AVRO logs - we should add a config there to switch on/off the flags in system meta fields What do you think? Best, Danny Chan vino yang 于2021年4月1日周四 上午10:58写道: > >> Oops, the image crushes, for "change flags", i mean: insert, > updat

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-03-31 Thread Danny Chan
al (almost transparent to users). Best, Danny Chan vino yang wrote on Wed, Mar 31, 2021 at 11:32 PM: > Hi Danny, > > Thanks for kicking off this discussion thread. > > Yes, incremental query( or says "incremental processing") has always been > an important feature of the Hudi framework. If

[DISCUSS] Incremental computation pipeline for HUDI

2021-03-31 Thread Danny Chan
ate these change flags to its consumers, we can use HUDI as the unified format for the pipeline. I'm expecting your nice ideas here ~ Best, Danny Chan

Re: Request to join Project Committer Group

2021-03-31 Thread Danny Chan
cc @vinoth Best, Danny Chan harshit mittal wrote on Wed, Mar 31, 2021 at 3:18 PM: > Hi, > I'd like to be added to the project committer group. Could somebody help me > with this request?(jiraId: hmittal83, cwiki userId: hmittal83). > -- > Best, > Harshit >

Re: [0.8.0 RELEASE] Codebase freeze for 0.8.0 release

2021-03-22 Thread Danny Chan
Hi, everyone ~ I posted a PR today https://github.com/apache/hudi/pull/2702, and I want it to be in 0.8.0, was waiting for the CI to pass but the testing is in queue. Best, Danny Chan nishith agarwal wrote on Mon, Mar 22, 2021 at 4:25 PM: > Gary/Siva, > > All the release blockers have been la

Re: [0.8.0 RELEASE] Codebase freeze for 0.8.0 release

2021-03-16 Thread Danny Chan
we don't hold up the release due to large site redesign. > > Thanks > Vinoth > > On Mon, Mar 15, 2021 at 8:03 PM Danny Chan wrote: > > > The 20th seems more reasonable, i'm preparing for the Flink HUDI document > > (including the quick start) now, i got the impres

Re: [DISCUSSION] Improve Hudi release process

2021-03-15 Thread Danny Chan
Reasonable, how many minor releases do we plan to maintain for a major release? 7 or 8? Gary Li wrote on Sat, Mar 13, 2021 at 8:58 PM: > Hi everyone, > > Let's discuss how to improve the release process of Hudi in this thread. > The goal is to release Hudi more frequently and hardening the reliability > of t

Re: [0.8.0 RELEASE] Codebase freeze for 0.8.0 release

2021-03-15 Thread Danny Chan
The 20th seems more reasonable, I'm preparing the Flink HUDI document (including the quick start) now, I got the impression that the community wants to reorganize the web using tabs to separate the Spark, Hive, Flink docs, should I do that in my Flink doc PR? Best, Danny Chan Gary Li

Re: 0.8.0 Release discussion

2021-03-03 Thread Danny Chan
> > > Vinoth Chandar 于2021年3月2日周二 上午10:30写道: > > > > > > > +1 > > > > > > > > There are two more PRs to land for multi writers, and some bug fixes > > > around > > > > the metadata table. > > > > > > > >

Re: [DISCUSS] Support multiple ordering fields

2021-03-03 Thread Danny Chan
which the PRIMARY KEY definition can not cover. Best, Danny Chan Raymond Xu wrote on Fri, Feb 5, 2021 at 6:48 PM: > No worries Vinoth. Thank you for the feedback. I have created > https://issues.apache.org/jira/browse/HUDI-1588 > Anyone interested please feel free to pick it up. I would be happy to do

Re: 0.8.0 Release discussion

2021-03-01 Thread Danny Chan
Thanks Gary Li for firing this discussion ~ +1 for the date to be in the middle of March, before that, I will run some local integration tests and performance tests. Best, Danny Gary Li wrote on Mon, Mar 1, 2021 at 12:56 PM: > Hi All, > > I’d like to start a discussion about the 0.8.0 release planning. Recen

Re: [DISCUSS] Rethink the abstraction of current client

2021-01-19 Thread Danny Chan
> It contains three components: - Two objects: a table, a batch of records; For the Spark client, it is true because no matter Spark or Spark streaming engine, they write as batches, but things are different for pure streaming engines like Flink, Flink writes per-record, it does not accumulate

Re: [DISCUSS] Support multiple ordering fields

2021-01-19 Thread Danny Chan
> Wondering if we should just take a bunch of payload configs and deprecate these flags I have the same feeling, there are already so many config options in Hoodie, the maintain work for developers or users is hard. Vinoth Chandar 于2021年1月18日周一 下午11:40写道: > +1 as well. > > Slightly orthogonal p

Re: [DISCUSS] New Flink Writer Proposal

2021-01-10 Thread Danny Chan
the current flink writer is like an app, just like the delta > > streamer. If we want to build another Flink writer, we can still > share the > > same flink client right? Does the flink client also have to use the > new > > feature only available on Flink 1.12? &

Re: Re: [DISCUSS] New Flink Writer Proposal

2021-01-08 Thread Danny Chan
Hi, I have updated the CWIKI as a new RFC there [1], let's move the discussion there ~ [1] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal Best, Danny Chan Danny Chan wrote on Fri, Jan 8, 2021 at 10:34 AM: > > We can maintain one or two, although, we both try t

Re: Re: [DISCUSS] New Flink Writer Proposal

2021-01-07 Thread Danny Chan
share > the > > same flink client right? Does the flink client also have to use the new > > feature only available on Flink 1.12? > > > > Thanks, > > Gary Li > > > > From: Danny Chan > > Sent: Thursday, January 7, 20

Re: Re: [DISCUSS] New Flink Writer Proposal

2021-01-06 Thread Danny Chan
> Hi Danny, > > > > You should have cwiki edit permission now. > > Any problems let me know. > > > > Best, > > Vino > > > > Danny Chan wrote on Wed, Jan 6, 2021 at 12:05 PM: > > > >> Sorry ~ > >> > >> Forget to say that my Confluence

Re: Re: [DISCUSS] New Flink Writer Proposal

2021-01-05 Thread Danny Chan
Sorry ~ Forgot to say that my Confluence ID is danny0405. It would be nice if any of you can help on this. Best, Danny Chan Danny Chan wrote on Wed, Jan 6, 2021 at 12:00 PM: > Hi, can someone give me the CWIKI permission so that i can update the > design details to that (maybe as a new RFC

Re: Re: [DISCUSS] New Flink Writer Proposal

2021-01-05 Thread Danny Chan
>> [1]: https://issues.apache.org/jira/browse/FLINK-15099 > >> > >> Best, > >> Vino > >> > >> Gary Li 于2021年1月5日周二 上午10:40写道: > >> > >>> Hi Danny, > >>> > >>> Thanks for the proposal. I'd recommend s

Re: [DISCUSS] New Flink Writer Proposal

2021-01-04 Thread Danny Chan
Sure, I can update the RFC-13 cwiki if you agree with that. Vinoth Chandar wrote on Tue, Jan 5, 2021 at 2:58 AM: > Overall +1 on the idea. > > Danny, could we move this to the apache cwiki if you don't mind? > That's what we have been using for other RFC discussions. > > On Mon,

[DISCUSS] New Flink Writer Proposal

2021-01-04 Thread Danny Chan
The RFC-13 Flink writer has some bottlenecks that make it hard to adapt to production: - The InstantGeneratorOperator has parallelism 1, which is a limit for high-throughput consumption; because all the split inputs drain to a single thread, the network IO would come under pressure too - The WriteProc

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread Danny Chan
> In addition, will it also support write SQL? > > At 2020-12-19 02:10:16, "Nishi

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-17 Thread Danny Chan
Yes, Apache Flink basically reuses the DQL syntax of Apache Calcite, I will add support for SQL connectors of Hoodie Flink soon ~ Currently, I'm preparing a refactoring of the current Flink writer code. Vinoth Chandar wrote on Fri, Dec 18, 2020 at 6:39 AM: > Thanks Kabeer for the note on gmail. Did not realiz

Re: 0.7.0 Release planning

2020-12-16 Thread Danny Chan
If there are no other release managers, I can be the one, although I'm only a contributor now ~ Vinoth Chandar wrote on Wed, Dec 16, 2020 at 12:36 PM: > Hello all, > > We are hoping to cut a release candidate by Dec 31. Any volunteers for > being the Release Manager? > > Thanks > Vinoth >

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-16 Thread Danny Chan
Apache Calcite is a good candidate for parsing and executing the SQL, Apache Flink has an extension for the SQL based on the Calcite parser [1], > users will write : hudiSparkSession.sql("UPDATE ") Should the user still need to instantiate the hudiSparkSession first? My desired use case is user u

Re: Re: Congrats to our newest committers!

2020-12-07 Thread Danny Chan
Congratulations Satish and Prashant! Mani Jindal wrote on Sat, Dec 5, 2020 at 11:56 AM: > Congratulations satish and Parshant > > On Sat, 5 Dec 2020 at 9:19 AM, leesf wrote: > > > Big congrats, Satish and Prashant! > > > > Raymond Xu wrote on Sat, Dec 5, 2020 at 3:58 AM: > > > > > Big congrats, Satish and Prashant! Very