Lei Rui created IOTDB-306:
-----------------------------

Summary: count query is not that fast
Key: IOTDB-306
URL: https://issues.apache.org/jira/browse/IOTDB-306
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Lei Rui

According to my test:

*q1: select count(s_10) from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*

||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|23,998|1,367|13,591|7,592|

Unit: ms

*q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*

||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|27,783|31.2+2,068|134+13,880|14.9+9,587|

Unit: ms

(The "+" appears because the step happens in both the `createNewData` and `convertQueryDataSetByFetchSize` phases.)

As shown, the total time cost of q1 is only slightly smaller than that of q2 (23,998 ms vs. 27,783 ms). The costs of the three major steps - `readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk` - are very close.

The reason is that a count query reads chunk data from disk into memory anyway and, in the best case, uses the statistics (i.e., numOfPoints) in the pageHeader instead of reading page data. However, reading page data from the ChunkBuffer (see `ChunkReader.nextBatch`) is not that expensive, because it is performed in memory. Therefore, the execution of a count query mostly overlaps with that of the same query without count. Other aggregate queries probably show similar results.

A direction for performance improvement is to avoid `readMemChunk` whenever the statistics in the ChunkMetaData can be utilized.
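
To make the proposed direction concrete, here is a minimal Java sketch of a count that uses chunk-level statistics and reads chunk data only when it has to. It is not IoTDB code: `ChunkMeta`, `ChunkCounter`, and `count` are hypothetical names invented for this illustration, and the sketch assumes each chunk's metadata carries a start time, an end time, and a point count, that the query has only a time filter, and that no deletions or overlapping chunks affect the data.

```java
import java.util.Arrays;
import java.util.List;

public class CountByChunkStatistics {

    /** Hypothetical stand-in for per-chunk metadata; not IoTDB's actual ChunkMetaData class. */
    static final class ChunkMeta {
        final long startTime;
        final long endTime;
        final long numOfPoints;

        ChunkMeta(long startTime, long endTime, long numOfPoints) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.numOfPoints = numOfPoints;
        }
    }

    /** Hypothetical fallback that loads a chunk and counts its points within [from, to]. */
    interface ChunkCounter {
        long countInRange(ChunkMeta chunk, long from, long to);
    }

    /** Counts points with timestamps in [from, to], reading chunk data only on partial overlap. */
    static long count(List<ChunkMeta> chunks, long from, long to, ChunkCounter fallback) {
        long total = 0;
        for (ChunkMeta c : chunks) {
            if (c.endTime < from || c.startTime > to) {
                continue; // no overlap with the time filter: nothing to count, nothing to read
            }
            if (c.startTime >= from && c.endTime <= to) {
                total += c.numOfPoints; // fully covered: the statistics alone are enough
            } else {
                total += fallback.countInRange(c, from, to); // partial overlap: read the chunk
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Three synthetic chunks with one point per millisecond (an assumption of this demo).
        List<ChunkMeta> chunks = Arrays.asList(
                new ChunkMeta(0, 99, 100),
                new ChunkMeta(100, 199, 100),
                new ChunkMeta(200, 299, 100));
        int[] chunkReads = {0};
        ChunkCounter fallback = (c, from, to) -> {
            chunkReads[0]++; // stands in for the expensive disk read
            return Math.min(c.endTime, to) - Math.max(c.startTime, from) + 1;
        };
        long total = count(chunks, 50, 250, fallback);
        System.out.println(total + " points, " + chunkReads[0] + " chunk reads");
    }
}
```

In this toy run only the two chunks at the edges of the time range are read, and the middle chunk is counted from its metadata. For a one-day query over many chunks, most chunks would presumably fall in the fully covered case, which is where the saving on `readMemChunk` would come from.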