Pavan Lanka created ORC-1027:
--------------------------------

             Summary: Filter processing to allow filter injections that cannot 
be represented via SArgs
                 Key: ORC-1027
                 URL: https://issues.apache.org/jira/browse/ORC-1027
             Project: ORC
          Issue Type: Improvement
          Components: Java
    Affects Versions: 1.7.0, 1.8.0
            Reporter: Pavan Lanka
            Assignee: Pavan Lanka


Currently, in the ORCRecordReader, the filter logic that performs LazyIO receives 
the following inputs:
 * SearchArgument as passed by the client using 
`Reader.Options.getSearchArgument`
 * Input filter as passed by the client using `Reader.Options.getFilterCallback`

The SearchArgument is particularly convenient because it integrates easily 
with existing engines such as Spark without requiring any code changes 
in the engine. However, this pushdown is limited to what can be represented via 
SearchArguments. For example, any predicate that uses a function cannot be 
pushed down.
{quote}SELECT * FROM table WHERE lower(f1) IN ... OR f2 IN ... OR f3 IN ...
{quote}
For the above query, none of the filters are pushed down to ORC from the engine: 
there is no means of representing functions, and because OR combines the 
predicates, the one that cannot be represented prevents the others from being 
pushed down as well.

An additional input mechanism is requested for supplying filters that is 
pluggable and does not require direct changes to the clients. We are proposing 
the use of **ServiceLoader** to dynamically determine the desired filters for a 
given fully qualified file path.
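
As a rough illustration of the proposed discovery step, the sketch below uses 
`java.util.ServiceLoader` to find filter providers and asks each one for a 
filter for a given file path. The names `PluginFilterProvider`, `filterFor` and 
`PluginFilters` are hypothetical and are not part of the ORC API; rows are 
modelled as `Object[]` purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.function.Predicate;

// Hypothetical SPI for illustration only; not the ORC API.
interface PluginFilterProvider {
  // Returns a row-level filter for the file, or empty if this plugin has none.
  Optional<Predicate<Object[]>> filterFor(String fullyQualifiedPath);
}

final class PluginFilters {
  private PluginFilters() {}

  // Discovers providers registered under META-INF/services/<interface name>.
  // With no registrations on the classpath this returns an empty list.
  static List<PluginFilterProvider> discover() {
    List<PluginFilterProvider> found = new ArrayList<>();
    for (PluginFilterProvider p : ServiceLoader.load(PluginFilterProvider.class)) {
      found.add(p);
    }
    return found;
  }

  // Collects the filters that the given providers offer for one file path.
  static List<Predicate<Object[]>> filtersFor(
      String path, List<PluginFilterProvider> providers) {
    List<Predicate<Object[]>> filters = new ArrayList<>();
    for (PluginFilterProvider p : providers) {
      p.filterFor(path).ifPresent(filters::add);
    }
    return filters;
  }
}
```

Because the providers are looked up by the reader itself, an engine such as 
Spark would not need to change; only a jar containing the provider and its 
service registration would have to be on the classpath.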

This filter, if determined, is applied as an AND in conjunction with the other 
available filters. It is understood that the plugin filter cannot differentiate 
between multiple aliases for the same table.
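
The AND combination described above can be sketched with plain 
`java.util.function.Predicate` composition. This is a minimal illustration 
under the assumption that each filter is a row-level predicate; 
`FilterConjunction` and `allOf` are hypothetical names, not ORC classes.

```java
import java.util.List;
import java.util.function.Predicate;

final class FilterConjunction {
  private FilterConjunction() {}

  // Combines all filters with logical AND. An empty list accepts every row,
  // so a missing plugin filter leaves the other filters unaffected.
  static <T> Predicate<T> allOf(List<Predicate<T>> filters) {
    Predicate<T> combined = row -> true;
    for (Predicate<T> f : filters) {
      combined = combined.and(f);
    }
    return combined;
  }
}
```

A row is kept only if every supplied filter accepts it, so adding a plugin 
filter can only narrow the set of rows that are read.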

This generic capability will allow us to represent complex filters that 
currently cannot be pushed down to the storage layer from the existing engines, 
allowing us to reap the benefits of LazyIO in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
