Great thoughts. Let's chat more on the HIP.
>> I am thinking something like a min/max on the row key for each file.
There could be cases where a monotonically increasing id generation service is
used when there are new entities
BloomIndex already does this today. In addition to Bloom filters, it
Btw, I am not able to comment on the JIRA. I will get this fixed and post
the comment on the JIRA as well. Cheers.
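
To make the min/max idea quoted above concrete, here is a minimal sketch of pruning files by row-key range. It is only an illustration, not Hudi's actual BloomIndex code: the `FileKeyRange` type, the `candidate_files` helper, and the file names and keys are all made up for this example, and keys are assumed to compare as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKeyRange:
    """Hypothetical per-file metadata: smallest and largest row key in the file."""
    file_name: str
    min_key: str
    max_key: str


def candidate_files(ranges, row_key):
    """Return the names of files whose [min_key, max_key] range could hold row_key."""
    return [r.file_name for r in ranges if r.min_key <= row_key <= r.max_key]


# Made-up ranges, as if keys came from a monotonically increasing id service,
# so each file covers a disjoint slice of the key space.
ranges = [
    FileKeyRange("file_1", "id_0001", "id_1000"),
    FileKeyRange("file_2", "id_1001", "id_2000"),
    FileKeyRange("file_3", "id_2001", "id_3000"),
]

print(candidate_files(ranges, "id_1500"))  # only file_2 needs a closer look
print(candidate_files(ranges, "id_9999"))  # no file can contain this key
```

A range check like this can only rule files out. Any file that survives it still needs a finer check (for example a Bloom filter or an actual key lookup) before the key is known to be present.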
On Thu, Mar 28, 2019 at 12:49 AM Prasanna wrote:
Hey Nishith,
Glad we have a concrete proposal on this.
My two cents on this.
What we are really building is an approximate indexing system which can
help us reduce the number of files to look for when a key is updated. The
problem with having something random in the key (like uuid) means
Here is the HIP:
https://docs.google.com/document/d/1RdxVqF60N9yRUH7HZ-s2Y_aYHLHb9xGrlRLK1OWtYKM/edit?usp=sharing
@Vinoth Chandar @balaji, I have added you both as approvers;
please take a look.
-Nishith
On Tue, Mar 26, 2019 at 9:47 PM nishith agarwal wrote:
JIRA: https://issues.apache.org/jira/projects/HUDI/issues/HUDI-53
-Nishith
On Tue, Mar 26, 2019 at 9:21 PM nishith agarwal wrote:
All,
Currently, Hudi supports partitioned and non-partitioned datasets. A
partitioned dataset is one which groups files (data) into
buckets called partitions. A Hudi dataset may be composed of N
partitions with M files. This structure helps canonical