Xinyu Zeng created ORC-1232:
-------------------------------

             Summary: Disable metrics collector by default
                 Key: ORC-1232
                 URL: https://issues.apache.org/jira/browse/ORC-1232
             Project: ORC
          Issue Type: Improvement
            Reporter: Xinyu Zeng


ORC-961 introduced a metrics collector for the reader. However, it may 
noticeably slow down reading ORC files, so it may be helpful to disable it by 
default.

 

Reproducible experiment results:

Environment: an Alibaba Cloud 
[ecs.s6-c1m4.xlarge|https://help.aliyun.com/document_detail/25378.html#s6] 
instance running Ubuntu 20.04, with a 40GB ESSD PL1 disk.

The source data is a 4.1GB CSV file of generated strings with some degree of 
repetitiveness (the values of one column follow a Zipfian distribution). The 
ORC file, written with dictionary encoding and no block compression, is 319MB.

 

Time of running orc-scan with metrics enabled: 7.5s

Time of running orc-scan with metrics disabled: 1.5s
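As a rough worked check, derived from the two timings above rather than separately measured, the following C++ computes the implied speedup, the share of the metrics-enabled run time that the difference represents (assuming the whole difference is due to metrics collection), and scan throughput over the 319MB ORC file. The constant and function names are illustrative only.

```cpp
#include <cassert>

// Inputs copied from the report: ORC file size and the two orc-scan timings.
constexpr double kFileSizeMB = 319.0;
constexpr double kEnabledSeconds = 7.5;
constexpr double kDisabledSeconds = 1.5;

// Speedup from disabling metrics: 7.5s / 1.5s = 5x.
constexpr double speedup() { return kEnabledSeconds / kDisabledSeconds; }

// Share of the enabled run time not present in the disabled run: 6.0s / 7.5s = 0.8.
// Assumes the entire difference is caused by metrics collection.
constexpr double overheadFraction() {
  return (kEnabledSeconds - kDisabledSeconds) / kEnabledSeconds;
}

// Scan throughput over the 319MB ORC file (not the 4.1GB CSV source).
constexpr double throughputMBps(double seconds) { return kFileSizeMB / seconds; }
```

With these inputs, throughput is roughly 42.5MB/s with metrics enabled and roughly 212.7MB/s with them disabled.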

Metrics collection was disabled for the second run by adding the following line

readerOpts.setReaderMetrics(nullptr);

after 
https://github.com/apache/orc/blob/02e48107b36b8ed868797dadcd7355a632519d48/tools/src/FileScan.cc#L26
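To illustrate how passing nullptr as the metrics object can switch collection off, here is a minimal C++ sketch of the general pattern, assuming per-call instrumentation (a clock read plus atomic counter updates around every decode call). This is a hypothetical design, not ORC's actual ReaderMetrics implementation; SketchMetrics, ScopedTimer, decodeBatch and averageLatencyUs are made-up names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical metrics holder; field names are illustrative, not ORC's.
struct SketchMetrics {
  std::atomic<std::uint64_t> decodeCalls{0};
  std::atomic<std::uint64_t> decodeLatencyUs{0};
};

// RAII timer: reads the clock and updates counters only when metrics != nullptr.
class ScopedTimer {
 public:
  explicit ScopedTimer(SketchMetrics* metrics) : metrics_(metrics) {
    if (metrics_ != nullptr) {
      start_ = std::chrono::steady_clock::now();
    }
  }

  ~ScopedTimer() {
    if (metrics_ == nullptr) {
      return;  // collection disabled: no clock read, no atomic updates
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start_);
    metrics_->decodeCalls.fetch_add(1, std::memory_order_relaxed);
    metrics_->decodeLatencyUs.fetch_add(
        static_cast<std::uint64_t>(elapsed.count()), std::memory_order_relaxed);
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  SketchMetrics* metrics_;
  std::chrono::steady_clock::time_point start_{};
};

// Hypothetical per-batch decode step; doubling stands in for real decoding work.
std::uint64_t decodeBatch(SketchMetrics* metrics, std::uint64_t value) {
  ScopedTimer timer(metrics);
  return value * 2;
}

// Mean recorded latency per call, or 0 if nothing was recorded.
double averageLatencyUs(const SketchMetrics& metrics) {
  const std::uint64_t calls = metrics.decodeCalls.load();
  if (calls == 0) {
    return 0.0;
  }
  return static_cast<double>(metrics.decodeLatencyUs.load()) /
         static_cast<double>(calls);
}
```

In this sketch, a call with a metrics object pays for two clock reads and two relaxed atomic increments, while a call with nullptr pays only for a pointer comparison. Whether ORC's measured overhead comes from the same source is not established by this report.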



--
This message was sent by Atlassian Jira
(v8.20.10#820010)