Chen Luo created ASTERIXDB-2541:
-----------------------------------

             Summary: Introduce GreedyScheduler
                 Key: ASTERIXDB-2541
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: STO - Storage
            Reporter: Chen Luo
            Assignee: Chen Luo

Our current AsynchronousScheduler tries to schedule all merge operations at the same time, without any control. This is not optimal in terms of minimizing the number of disk components, which directly impacts query performance.

Here we introduce GreedyScheduler to minimize the number of disk components over time. It keeps track of all merge operations of an LSM index and only activates the merge operation with the smallest number of remaining I/Os. It can be proven that if the number of components is the same for all merge operations, then GreedyScheduler is strictly optimal. Otherwise, it is still a good heuristic.

In order for GreedyScheduler to work, we need the following two changes:
* Keep track of the number of scanned pages of index cursors so that we know how many pages are left;
* Introduce a mechanism to activate/deactivate merge operations.
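
To illustrate the selection rule described above, here is a minimal sketch, assuming remaining I/O is measured as total pages minus scanned pages. The names GreedySchedulerSketch, MergeOp, add, complete, and reschedule are hypothetical and do not correspond to AsterixDB's actual classes; threading and real I/O are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the greedy rule; not AsterixDB's actual API. */
class GreedySchedulerSketch {

    /** One merge operation, tracked by the pages its cursor has scanned (assumed model). */
    static final class MergeOp {
        private final String name;
        private final long totalPages;
        private long scannedPages;
        private boolean active;

        MergeOp(String name, long totalPages) {
            this.name = name;
            this.totalPages = totalPages;
        }

        /** Called as the merge cursor advances; progress is capped at totalPages. */
        void recordScanned(long pages) {
            scannedPages = Math.min(totalPages, scannedPages + pages);
        }

        long remainingPages() {
            return totalPages - scannedPages;
        }

        boolean isActive() {
            return active;
        }

        String getName() {
            return name;
        }

        // The activate/deactivate mechanism is reduced to a flag here.
        private void setActive(boolean value) {
            active = value;
        }
    }

    private final List<MergeOp> merges = new ArrayList<>();

    /** Registers a new merge and re-evaluates which merge should run. */
    void add(MergeOp op) {
        merges.add(op);
        reschedule();
    }

    /** Removes a finished merge and re-evaluates which merge should run. */
    void complete(MergeOp op) {
        merges.remove(op);
        op.setActive(false);
        reschedule();
    }

    /** Activates only the merge with the fewest remaining pages; returns it, or null if none. */
    MergeOp reschedule() {
        MergeOp best = null;
        for (MergeOp op : merges) {
            if (best == null || op.remainingPages() < best.remainingPages()) {
                best = op;
            }
        }
        for (MergeOp op : merges) {
            op.setActive(op == best);
        }
        return best;
    }
}
```

In this sketch, adding a 10-page merge while a 100-page merge is running deactivates the larger one until the smaller one completes. Rescheduling happens only when a merge is added or completed, since the active merge's remaining page count can only shrink while it runs.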