Hi,
I have a quick question. Did you try setting orc.sarg.to.filter to true in
hive-site.xml?
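For reference, the entry would look roughly like this. It is only a sketch: whether Hive 4.0.1 forwards this ORC key to the reader in your deployment is an assumption worth checking. One way to check is to confirm that the TezChild log then reports allowSARGToFilter: true.

```xml
<!-- hive-site.xml: allow the ORC reader to turn the pushed-down SARG into a filter -->
<!-- (assumes Hive passes this ORC setting through to the reader options) -->
<property>
  <name>orc.sarg.to.filter</name>
  <value>true</value>
</property>
```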

--- Sungwoo

On Wed, Aug 27, 2025 at 3:02 PM 서연 <[email protected]> wrote:

> Hello Hive Development Team,
>
> We are observing a significant performance issue with queries on a
> *non-partitioned ORC table*. Our investigation indicates that ORC predicate
> pushdown (the search argument, or SARG) is not being applied at the storage
> layer, forcing full data scans instead of efficient, filtered reads.
>
> From the TezChild logs, we can see that Hive correctly identifies the
> pushdown predicate. However, it then explicitly instructs the ORC reader to
> ignore it for filtering by setting the allowSARGToFilter option to false.
>
> ```
> 2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.OrcInputFormat|: ORC pushdown predicate: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) (not leaf-(IS_NULL inv_item_sk)) (not leaf-(IS_NULL inv_date_sk)))
> 2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.ReaderImpl|: Reading ORC rows from hdfs://.../inventory/000000_0 with {..., sarg: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) ...), ..., allowSARGToFilter: false, ...}
> ```
>
> By contrast, we have confirmed that when we run *the exact same query on the
> same data in our Hive 2.3.2 environment, predicate pushdown works correctly*,
> and the data is filtered at the ORC reader level as expected.
>
> Our hypothesis is that this difference comes from changes in the ORC
> integration. We suspect that the ORC version used in Hive 2.3.2 (likely ORC
> 1.3.3) did not have the allowSARGToFilter parameter and always applied a
> filter whenever a SARG was present. The introduction of this flag in newer
> versions seems to have inadvertently caused this performance regression in
> our use case.
>
> Given this, we believe users should have a way to control this behavior.
> *We propose that Hive provide a configuration option (e.g., a session
> variable or a table property) to explicitly set allowSARGToFilter to true*.
> This would restore the efficient behavior of older versions and provide a
> crucial performance tuning capability.
>
> What are your thoughts on this? Is our analysis correct, and would you be
> open to considering such a feature?
>
> For context, here is our environment information:
> *Hive Version: 4.0.1*
> *Execution Engine: Tez 0.10.4*
> *Query: TPC-DS (scale factor 300) query 82*
>
> Thank you for your time and any guidance you can offer.
>
> Best regards,
>
> seoyeon.
>
