r it?
> >>
> >> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
> >>> schedule an offline Spark job to delete outdated data? Just set a TTL
> >>> config, and the writer or some offline service will delete old data as
> >>> expected.
> >>>
> >>
>
>
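Hudi does not expose such a TTL config today (that is what this thread is asking for), so below is a minimal sketch of what the offline cleanup logic could look like, assuming daily partitions named `yyyy/MM/dd`. The helper name `expired_partitions` is hypothetical; the returned list could then be fed to a partition-delete write.

```python
from datetime import datetime, timedelta

def expired_partitions(partitions, ttl_days, today=None):
    """Return partition paths (yyyy/MM/dd) older than the TTL window."""
    today = today or datetime.utcnow()
    cutoff = today - timedelta(days=ttl_days)
    return [p for p in partitions
            if datetime.strptime(p, "%Y/%m/%d") < cutoff]

parts = ["2022/10/01", "2022/10/15", "2022/10/18"]
# With a 7-day TTL evaluated on 2022-10-18, only the oldest partition expires.
print(expired_partitions(parts, ttl_days=7, today=datetime(2022, 10, 18)))
# → ['2022/10/01']
```

A built-in TTL would essentially run this selection inside the writer or a table service instead of a user-scheduled job.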
--
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure
o delta logs. This is a
> >lock-free process if we can make sure they don't write data to the same
> >log file (we plan to create multiple marker files to achieve this). And
> >with the log merge API (preCombine logic in the Payload class), data in
> >log files can be read properly.
> >-
> >
> >Since Hudi already has an index type like the Bucket index, which can
> >map keys to buckets in a consistent way, data duplicates can be
> >eliminated.
> >
> >
> > Thanks,
> > Jian Feng
> >
>
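The two ideas in that proposal, a consistent key-to-bucket mapping and a preCombine-style merge that deduplicates by key, can be sketched as follows. This is an illustrative sketch only: the hash choice and the `key`/`ts` field names are assumptions, not Hudi's actual Bucket index or Payload implementation.

```python
import hashlib

def bucket_of(record_key: str, num_buckets: int) -> int:
    """Consistent key->bucket mapping: the same key always lands in the same bucket."""
    h = int(hashlib.md5(record_key.encode()).hexdigest(), 16)
    return h % num_buckets

def pre_combine(records):
    """Keep the record with the highest ordering value per key (preCombine-style merge)."""
    latest = {}
    for rec in records:
        key = rec["key"]
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return list(latest.values())

# The mapping is deterministic, so concurrent writers agree on bucket placement.
assert bucket_of("itemid:42", 8) == bucket_of("itemid:42", 8)
merged = pre_combine([{"key": "a", "ts": 1, "v": "old"},
                      {"key": "a", "ts": 2, "v": "new"}])
print(merged)  # → [{'key': 'a', 'ts': 2, 'v': 'new'}]
```

Because placement is deterministic, duplicate versions of a key always meet in the same bucket, where the merge logic can resolve them at read time.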
odb.io/blog/2020/08/04/prestodb-and-hudi
> > [3] https://github.com/trinodb/trino/pull/9641
> > [4]
> >
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> > [5]
> >
> https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> > [6] https://github.com/codope/trino/tree/hudi-plugin
> > [7] https://trino.io/docs/current/develop/connectors.html
> >
>
at 12:50 AM Vinoth Chandar wrote:
> Yeah, all the rate limiting code in HBaseIndex is a workaround for these
> large bulk writes.
>
> On Tue, Oct 5, 2021 at 11:16 AM Jian Feng wrote:
>
> > actually I met this problem when bootstrapping a huge table, after changed
> >
cle may
> provide more information. Great thanks to the author.
>
>
> https://mp.weixin.qq.com/s?__biz=MzIyMzQ0NjA0MQ==&mid=2247484306&idx=1&sn=1d853469159a600d82050c17e6a2a075&chksm=e81f56e4df68dff2da417109c4a971aef54f056bc0519558c58e23fe60b90dc6e4f8d7e92774&token=168846
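The rate limiting mentioned above can be illustrated with a simple batching throttle. This is a generic sketch, not the actual HBaseIndex code; the function name and parameters are illustrative.

```python
import time

def throttled_batches(puts, batch_size, max_batches_per_sec):
    """Yield fixed-size batches of puts, sleeping between batches to cap throughput."""
    interval = 1.0 / max_batches_per_sec
    for i in range(0, len(puts), batch_size):
        yield puts[i:i + batch_size]
        time.sleep(interval)  # back off so bulk writes don't overwhelm region servers

batches = list(throttled_batches(list(range(10)), batch_size=4, max_batches_per_sec=100))
print([len(b) for b in batches])  # → [4, 4, 2]
```

The point is that a bulk bootstrap write gets smeared out over time instead of hitting the index store all at once.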
r?
I tried recreating the table; it happens again.
tion of
> the
> > whole hudi project.
> >
> > On Mon, Oct 4, 2021, 11:29 PM wrote:
> > when I bootstrapped a huge HBase index table, I found all keys had the
> > prefix 'itemid:', which caused data skew: there are 100 region servers in
> > HBase but only one was handling data. Is there any way to avoid this issue
> > on the Hudi side?
> >
>
When I bootstrapped a huge HBase index table, I found all keys had the prefix
'itemid:', which caused data skew: there are 100 region servers in HBase
but only one was handling data.
Is there any way to avoid this issue on the Hudi side?
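A common workaround for a shared row-key prefix like 'itemid:' (independent of Hudi itself) is to salt the HBase row key so writes spread across regions. A minimal sketch; the salt width and the `salt|key` format are illustrative choices:

```python
import hashlib

def salted_key(row_key: str, num_salts: int = 100) -> str:
    """Prefix the key with a stable hash-derived salt so a shared prefix
    like 'itemid:' no longer funnels every write into one region."""
    salt = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % num_salts
    return f"{salt:02d}|{row_key}"

keys = [f"itemid:{i}" for i in range(1000)]
salts = {salted_key(k).split("|")[0] for k in keys}
print(len(salts))  # keys now spread over many salt prefixes instead of one
```

The trade-off is that point lookups must recompute the salt (cheap, since it is derived from the key itself), and scans must fan out across all salt prefixes.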
Hi all, can anyone give me a sample?
--
FengJian
Data Infrastructure Team
Mobile +65 90388153
Address 5 Science Park Drive, Shopee Building, Singapore 118265
Can anyone help look at this issue?
https://github.com/apache/hudi/issues/3327
When ingesting data into a Hudi table, I found that if the Avro schema has an
array field with a nested array field and no other fields, an error happens;
but if I add a dummy field, ingestion works fine.
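A minimal reproduction of the schema shape described above, a record whose only field is an array of arrays, built here as plain JSON. The `Item`/`matrix`/`dummy` names are illustrative; the actual failing schema is in the linked issue.

```python
import json

# Record with ONLY a nested-array field: the shape reported to fail.
failing = {
    "type": "record", "name": "Item", "fields": [
        {"name": "matrix",
         "type": {"type": "array",
                  "items": {"type": "array", "items": "string"}}},
    ],
}

# Reported workaround: adding any extra "dummy" field makes ingestion work.
working = json.loads(json.dumps(failing))  # deep copy via JSON round-trip
working["fields"].append(
    {"name": "dummy", "type": ["null", "string"], "default": None})

print([f["name"] for f in working["fields"]])  # → ['matrix', 'dummy']
```

Comparing the two schemas side by side is a quick way to confirm the dummy field is the only difference when reporting the bug.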
I saw a PR here: https://github.com/apache/hudi/pull/3252