Hi,

> While using ORC file format, I would like to see in the logs that
> stripes and/or row-groups are being skipped based on my where clause.

There's no logging in that inner loop.

> Is that info even output? If so, what do I need to enable it?

You can run the same query twice with the following settings to see the difference:

hive> set hive.tez.print.exec.summary=true;
hive> set hive.optimize.index.filter=false;
-- run query
hive> set hive.optimize.index.filter=true;
-- run query

You'll get numbers that show how much row filtering is happening, since
the input records counter for each vertex tracks the actual records read
from ORC.

For an example of what that does, see
<http://www.slideshare.net/Hadoop_Summit/orc-2015-faster-better-smaller/21>

If you have Hive 1.2.0 builds, you can also set the
orc.bloom.filter.columns table property (TBLPROPERTIES) to use the new
bloom filter row indexes as well.

For string columns, that should work much better than the current
min/max index.
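A rough sketch of the bloom filter setup, assuming a Hive 1.2.0+ CLI. The
table orders_orc, its columns, the source table orders and the lookup
value are all hypothetical names for illustration. orc.bloom.filter.fpp
(the target false-positive rate, 0.05 by default) is optional. Since the
indexes are written together with the ORC data, existing data presumably
has to be rewritten (here via INSERT OVERWRITE) before any bloom filters
exist for it.

hive> -- hypothetical table; bloom filter only on the string column
hive> CREATE TABLE orders_orc (
    >   order_id BIGINT,
    >   customer_email STRING,
    >   amount DOUBLE)
    > STORED AS ORC
    > TBLPROPERTIES ("orc.bloom.filter.columns"="customer_email",
    >                "orc.bloom.filter.fpp"="0.05");
hive> -- rewrite the data so the ORC writer builds the bloom filters
hive> INSERT OVERWRITE TABLE orders_orc SELECT * FROM orders;
hive> set hive.optimize.index.filter=true;
hive> SELECT count(*) FROM orders_orc
    > WHERE customer_email = 'someone@example.com';

Running that last query with hive.optimize.index.filter set to false and
then true, while watching the vertex input records counter, should show
whether row groups are actually being skipped.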

Cheers,
Gopal

