Re: [DISCUSS] Rethink the abstraction of current client

2021-02-02 Thread Vinoth Chandar
Sorry for the late reply. Standard excuse: 0.7.0 release. +1 on the need to rethink this. Some comments on issues in this thread, IMO. 1. Agree that the hierarchy has gotten much taller now, and we need to immediately pull back more code into hudi-client-common. IMO what we lack is some kind of

Re: [DISCUSS] Rethink the abstraction of current client

2021-02-02 Thread vino yang
Hi, > I think the proposed interfaces indeed look more intuitive and could simplify the code structures. My concern is mostly around the ROI of such refactoring work. Probably I lack some direct involvement in the flink client work but it looks like it's mainly about code restructuring and

Re: [DISCUSS] Rethink the abstraction of current client

2021-01-19 Thread vino yang
>> For the Spark client, it is true because no matter Spark or Spark streaming engine, they write as batches, but things are different for pure streaming engines like Flink, Flink writes per-record, it does not accumulate buffers. Yes, what I mean by "batch" is not about the behavior or

Re: [DISCUSS] Rethink the abstraction of current client

2021-01-19 Thread Danny Chan
> It contains three components: - Two objects: a table, a batch of records; For the Spark client, it is true because no matter Spark or Spark streaming engine, they write as batches, but things are different for pure streaming engines like Flink, Flink writes per-record, it does not

[DISCUSS] Rethink the abstraction of current client

2021-01-19 Thread vino yang
Hi guys, *I am opening this thread to discuss whether we can separate the attributes and behaviors of HoodieTable, and rethink the abstraction of the client.* Currently, in the hudi-client-common module, there is a HoodieTable class, which contains a set of attributes and behaviors. For different engines,