Hi Anshuman,

I'm not sure about the Presto internals needed to accomplish your goal, but
I can talk about the annotation I added back then.
As long as your CustomInputFormat carries it, Presto will fall back to
obtaining splits from that input format instead of listing the filesystem
on its own.
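
To make that concrete, here is a rough, dependency-free sketch of the idea,
not Presto or Hudi code. The annotation name, the name-based lookup, and the
class and method names are assumptions for illustration only; check the Presto
Hive connector source for the real check. How the predicate value (anotherid)
reaches your input format is the part I can't answer, so the sketch just takes
it as a plain argument.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitFilterSketch {

    // Stand-in marker annotation. The name is an assumption for this sketch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface UseFileSplitsFromInputFormat {}

    // Hypothetical input format; a real one would implement Hadoop's InputFormat.
    @UseFileSplitsFromInputFormat
    public static class CustomInputFormat {

        // Keep only file names shaped like <someid>.<anotherId>.someformat.
        // Assumes anotherId contains no glob metacharacters.
        public List<String> selectFiles(List<String> fileNames, String anotherId) {
            PathMatcher matcher = FileSystems.getDefault()
                    .getPathMatcher("glob:*." + anotherId + ".someformat");
            return fileNames.stream()
                    .filter(name -> matcher.matches(Paths.get(name)))
                    .collect(Collectors.toList());
        }
    }

    // Mimics an engine-side check: does the class carry the marker annotation?
    // Matching by simple name is an assumption about how such a check could work.
    public static boolean usesOwnSplits(Object inputFormat) {
        return Arrays.stream(inputFormat.getClass().getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseFileSplitsFromInputFormat"::equals);
    }

    public static void main(String[] args) {
        CustomInputFormat format = new CustomInputFormat();
        List<String> files = Arrays.asList(
                "id1.abc.someformat", "id2.xyz.someformat", "id3.abc.someformat");
        System.out.println(usesOwnSplits(format));
        System.out.println(format.selectFiles(files, "abc"));
    }
}
```

In a real input format the same filter would sit inside getSplits, with the
glob applied to the table location before the splits are built.
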
Hope that helps.

Thanks
Vinoth

On Fri, May 3, 2019 at 11:31 PM anshuman.w...@gmail.com <
anshuman.w...@gmail.com> wrote:

> Hi
>
> I recently tried prestodb, liked it, and came across the Hudi
> project.
> After going through a few pages and the source code, it seems that Hudi
> has its own InputFormat.
>
> I am attempting to do early predicate pushdown on my own side. The
> question is probably not related to Hudi, but I wanted to get an idea
> whether someone with experience using both prestodb and Hive could
> enlighten me.
>
> Imagine I create a Hive table using my own customInputFormat. I see that
> Hudi has contributed an annotation which allows prestodb to obtain the
> splits from the customInputFormat.
>
> For simplicity, the Hive table consists of two columns: someid and anotherid.
>
> Imagine files in HDFS are laid out as
> /some/folder/someid.anotherid.someformat
>
> and a query such as select * from hive_table where anotherid = abc.
>
> What I want to do is capture the above query so that, when prestodb
> queries the Hive metadata for the table and gets back my
> customInputFormat, I could use a glob expression in the getSplits method
> to pick up only those files which satisfy the condition anotherid=abc,
> before the handoff to query execution in Presto.
>
> Any pointers would be useful.
>
> Thanks,
>
