Hello Hive Development Team,

We are observing a significant performance issue with queries on a
*non-partitioned ORC table*. Our investigation indicates that ORC predicate
pushdown via search arguments (SARGs) is not being applied at the storage
layer, forcing full data scans instead of efficient, filtered reads.

From the TezChild logs, we can see that Hive correctly identifies the
pushdown predicate. However, it then explicitly instructs the ORC reader not
to use it for filtering by setting the allowSARGToFilter option to false.

```
2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.OrcInputFormat|: ORC pushdown predicate: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) (not leaf-(IS_NULL inv_item_sk)) (not leaf-(IS_NULL inv_date_sk)))
2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.ReaderImpl|: Reading ORC rows from hdfs://.../inventory/000000_0 with {..., sarg: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) ...), ..., allowSARGToFilter: false, ...}
```
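The following is a minimal Python sketch for checking other task logs for the same setting. It assumes each log entry sits on a single line, as in the excerpt above. The function name sarg_filter_flags is hypothetical. The sketch only pattern-matches the "Reading ORC rows from ... with {...}" line and does not interpret the SARG itself.

```python
import re

# Matches the ORC reader's "Reading ORC rows from <path> with {...}" log line
# and captures the file path and the allowSARGToFilter value.
# Assumes one log entry per line, formatted like the excerpt above.
READ_LINE = re.compile(
    r"Reading ORC rows from (?P<path>\S+) with \{.*?"
    r"allowSARGToFilter: (?P<flag>true|false)"
)


def sarg_filter_flags(log_lines):
    """Return (path, allow_sarg_to_filter) tuples for every matching log line."""
    results = []
    for line in log_lines:
        match = READ_LINE.search(line)
        if match:
            results.append((match.group("path"), match.group("flag") == "true"))
    return results
```

Passing an open log file (or any iterable of lines) returns one entry per ORC file opened. Any entry with False marks a read where the flag was disabled.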

By contrast, we have confirmed that when we run *the exact same query on the
same data in our Hive 2.3.2 environment, predicate pushdown works correctly*,
and the data is filtered at the ORC reader level as expected.

Our hypothesis is that this difference is due to changes in the ORC
integration. We suspect that the ORC version used in Hive 2.3.2 (likely ORC
1.3.3) did not have the allowSARGToFilter parameter and would always apply
a filter if a SARG was present. The introduction of this flag in newer
versions seems to have inadvertently caused this performance regression in
our use case.
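A simplified Python model can make the distinction concrete. It covers the two places a SARG can act when ORC reads a column: row-group elimination using min/max statistics, and row-level filtering of the rows that survive. This is not ORC's implementation. RowGroup and read_with_between are hypothetical names, and the model handles only a single integer BETWEEN leaf with no nulls. The assumption that allowSARGToFilter gates only the row-level step should be confirmed against the ORC version bundled with Hive 4.0.1.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Min/max statistics for one column, as ORC keeps per row group
    # (by default every 10,000 rows), plus the column values themselves.
    min_value: int
    max_value: int
    values: list


def read_with_between(row_groups, low, high, allow_sarg_to_filter):
    """Simplified model of reading one column under a BETWEEN low AND high SARG.

    Stage 1 (row-group elimination): skip row groups whose [min, max] range
    cannot overlap [low, high].
    Stage 2 (row-level filtering, ASSUMED to be what allowSARGToFilter gates):
    drop individual rows that fail the predicate. Without it, every row of a
    surviving row group is returned and must be filtered later by the engine.
    """
    rows_read = 0
    output = []
    for group in row_groups:
        if group.max_value < low or group.min_value > high:
            continue  # stage 1: statistics prove no row in this group can match
        rows_read += len(group.values)
        if allow_sarg_to_filter:
            output.extend(v for v in group.values if low <= v <= high)  # stage 2
        else:
            output.extend(group.values)
    return rows_read, output
```

Consider row groups spanning 0 to 50, 90 to 600, and 700 to 900, with the predicate BETWEEN 100 AND 500. Under either setting, the model reads only the middle row group. The flag decides only whether 90 and 600 are also handed back to the engine.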

Given this, we strongly believe that users should have a way to control this
behavior. *We propose that Hive provide a configuration option (e.g., a
session variable or a table property) to explicitly set allowSARGToFilter to
true*. This would restore the efficient behavior of older versions and
provide a crucial performance tuning capability.
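Here is one possible shape for the proposed setting, sketched in Python. Several parts of it are hypothetical choices for illustration and not existing Hive configuration:

- the property name hive.orc.sarg.allow.filter;
- the function name resolve_allow_sarg_to_filter;
- the precedence order, where a table property wins over a session variable, which wins over a default of false.

The default of false matches the behavior seen in the logs above.

```python
# HYPOTHETICAL property name, chosen only to illustrate the proposal.
ALLOW_SARG_FILTER_KEY = "hive.orc.sarg.allow.filter"


def resolve_allow_sarg_to_filter(table_properties, session_conf, default=False):
    """Decide the allowSARGToFilter value to pass to the ORC reader.

    Assumed precedence (a design choice, not existing Hive behavior):
    table property > session variable > default (false, as observed today).
    Both sources are plain string-to-string mappings.
    """
    for source in (table_properties, session_conf):
        raw = source.get(ALLOW_SARG_FILTER_KEY)
        if raw is not None:
            return raw.strip().lower() == "true"
    return default
```

Letting a table property override the session value would allow the flag to be enabled only for tables where row-level filtering pays off. A session variable would still cover ad hoc experiments.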

What are your thoughts on this? Is our analysis correct, and would you be
open to considering such a feature?

For context, here is our environment information:

*Hive Version: 4.0.1*
*Execution Engine: Tez 0.10.4*
*Query: TPC-DS query 82, scale factor 300*

Thank you for your time and any guidance you can offer.

Best regards,

seoyeon.
