Hi, I have a quick question. Did you try setting orc.sarg.to.filter to true in hive-site.xml?
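For reference, a minimal sketch of what that setting might look like in hive-site.xml. It assumes the property name orc.sarg.to.filter is passed through to the ORC reader unchanged; whether Hive 4.0.1 actually forwards it is exactly what this would test, so treat it as an experiment rather than a confirmed fix.

```xml
<!-- Assumption: the ORC reader picks this up from the Hive/Tez job configuration. -->
<property>
  <name>orc.sarg.to.filter</name>
  <value>true</value>
</property>
```

If it takes effect, the ReaderImpl line in the TezChild log should report allowSARGToFilter: true instead of false.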
--- Sungwoo

On Wed, Aug 27, 2025 at 3:02 PM 서연 <[email protected]> wrote:

> Hello Hive Development Team,
>
> We are observing a significant performance issue with queries on a
> *non-partitioned ORC table*. Our investigation indicates that ORC
> predicate pushdown (SARG) is not being applied at the storage layer,
> forcing full data scans instead of efficient, filtered reads.
>
> From the TezChild logs, we can see that Hive correctly identifies the
> pushdown predicate. However, it then explicitly instructs the ORC reader
> to ignore it for filtering by setting the allowSARGToFilter option to
> false.
>
> ```
> 2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.OrcInputFormat|: ORC pushdown predicate: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) (not leaf-(IS_NULL inv_item_sk)) (not leaf-(IS_NULL inv_date_sk)))
> 2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.ReaderImpl|: Reading ORC rows from hdfs://.../inventory/000000_0 with {..., sarg: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) ...), ..., allowSARGToFilter: false, ...}
> ```
>
> However, we have confirmed that when we run *the exact same query on the
> same data in our Hive 2.3.2 environment, predicate pushdown works
> correctly*, and the data is filtered at the ORC reader level as expected.
>
> Our hypothesis is that this difference is due to changes in the ORC
> integration. We suspect that the ORC version used in Hive 2.3.2 (likely
> ORC 1.3.3) did not have the allowSARGToFilter parameter and would always
> apply a filter if a sarg was present. The introduction of this flag in
> newer versions seems to have inadvertently caused this performance
> regression in our use case.
>
> Given this, we strongly believe that there should be a way for users to
> control this behavior. *We propose that Hive should provide a
> configuration (e.g., a session variable or a table property) to
> explicitly set allowSARGToFilter to true*. This would restore the
> efficient behavior of older versions and provide a crucial performance
> tuning capability.
>
> What are your thoughts on this? Is our analysis correct, and would you be
> open to considering such a feature?
>
> For context, here is our environment information:
>
> *Hive Version: 4.0.1*
> *Execution Engine: 0.10.4*
> *Query: tpcds scale 300 query82*
>
> Thank you for your time and any guidance you can offer.
>
> Best regards,
>
> seoyeon.
