Paul Rogers created DRILL-5636:
----------------------------------

             Summary: External sort should not copy data prior to spilling
                 Key: DRILL-5636
                 URL: https://issues.apache.org/jira/browse/DRILL-5636
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor

The external sort spills data to disk under memory pressure. The sort code uses a generic mechanism to do the spilling:

* Use a "priority queue copier" to copy sorted records into a new batch
* Spill the new batch by writing the vectors for the newly-created batch

The above works fine when memory is plentiful. But, under low-memory conditions, the intermediate copy can cause OOM errors.

An improved algorithm is:

* Priority queue copier works vector-by-vector
* Serialize each vector to disk
* Release its memory
* Repeat for the next vector

The advantages of the above:

* Less intermediate memory use
* Perhaps better CPU cache performance through greater locality (all writes happen to a single vector at a time, rather than row by row)
* No change in disk format or disk write performance (because data is buffered prior to write anyway)
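
To make the vector-by-vector idea concrete, here is a minimal Java sketch. It is not Drill code: the class VectorWiseSpill, the Run type, the plain int[] "vectors" and the on-disk layout are all hypothetical stand-ins chosen for illustration. It assumes at least one run, that every run has the same columns, and that column 0 is the sort key. The merge order is computed once from the key column, and each output vector is then built, written and dropped before the next one is started.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.PriorityQueue;

/** Illustrative only: none of these names come from Drill. */
public class VectorWiseSpill {

    /** A sorted in-memory run; columns[c][r] is column c, row r. Column 0 is the sort key. */
    public static final class Run {
        final int[][] columns;

        public Run(int[][] columns) {
            this.columns = columns;
        }

        int rowCount() {
            return columns[0].length;
        }
    }

    /** Pass 1: merge on the key column only, recording which {run, row} feeds each output row. */
    public static int[][] mergeOrder(Run[] runs) {
        int total = 0;
        for (Run r : runs) {
            total += r.rowCount();
        }
        // Queue entries are {runIndex, rowIndex}, ordered by the key value they point at.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
            (a, b) -> Integer.compare(runs[a[0]].columns[0][a[1]],
                                      runs[b[0]].columns[0][b[1]]));
        for (int i = 0; i < runs.length; i++) {
            if (runs[i].rowCount() > 0) {
                pq.add(new int[] {i, 0});
            }
        }
        int[][] order = new int[total][];
        int out = 0;
        while (!pq.isEmpty()) {
            int[] head = pq.poll();
            order[out++] = head;
            int next = head[1] + 1;
            if (next < runs[head[0]].rowCount()) {
                pq.add(new int[] {head[0], next});
            }
        }
        return order;
    }

    /** Pass 2: build, serialize and release one output vector at a time. */
    public static void spill(Run[] runs, OutputStream sink) throws IOException {
        int[][] order = mergeOrder(runs);
        int columnCount = runs[0].columns.length;
        DataOutputStream out = new DataOutputStream(sink);
        out.writeInt(columnCount);   // hypothetical header: column count, then row count
        out.writeInt(order.length);
        for (int c = 0; c < columnCount; c++) {
            // Copy a single column in merged order; all writes go to this one vector.
            int[] vector = new int[order.length];
            for (int i = 0; i < order.length; i++) {
                vector[i] = runs[order[i][0]].columns[c][order[i][1]];
            }
            for (int v : vector) {
                out.writeInt(v);
            }
            vector = null;           // at most one output vector is live at any time
        }
        out.flush();
    }
}
```

In this sketch the peak extra memory is one output vector plus the merge-order index, rather than a full copy of the batch. The index is itself a per-row cost, so whether the trade pays off depends on how wide the rows are.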