Chen Luo created ASTERIXDB-2784:
-----------------------------------

Summary: Join memory requirement for large objects
Key: ASTERIXDB-2784
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
Project: Apache AsterixDB
Issue Type: Improvement
Components: COMP - Compiler, RT - Runtime
Reporter: Chen Luo

Currently the compiler assumes that the minimum number of join frames is 5 [1]. However, this does not guarantee that a join will always succeed when large objects are present. The actual join memory requirement is MAX(5, #partitions * large object size), with the large object size measured in frames.

The reason is that the spill policy [2] only spills a partition if it has not been spilled before. As a result, when we are writing to an empty partition, it is possible that each of the other partitions holds one large object (which could be larger than the frame size) but no partition can be spilled. In this case, the join memory requirement becomes #partitions * large object size.

[1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29
[2] https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55
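
To make the arithmetic in the description concrete, here is a minimal sketch of the MAX(5, #partitions * large object size) bound. It is not AsterixDB code: the class and method names are hypothetical, and it assumes that a large object occupies a whole number of frames (rounded up) and that sizes are given in bytes. The 32 KiB frame size used in `main` is only an illustrative value.

```java
// Hypothetical helper, not part of AsterixDB: estimates the join frame
// requirement described in this issue.
class JoinMemoryEstimate {
    // Minimum number of join frames currently assumed by the compiler (per the issue text).
    static final int MIN_JOIN_FRAMES = 5;

    // Frames needed to hold one object. Assumption: an object takes a whole
    // number of frames, so we round up with ceiling division.
    static long framesPerObject(long objectSizeBytes, long frameSizeBytes) {
        return (objectSizeBytes + frameSizeBytes - 1) / frameSizeBytes;
    }

    // MAX(5, #partitions * large object size), with the object size in frames.
    static long requiredJoinFrames(int numPartitions, long largestObjectBytes, long frameSizeBytes) {
        long worstCase = numPartitions * framesPerObject(largestObjectBytes, frameSizeBytes);
        return Math.max(MIN_JOIN_FRAMES, worstCase);
    }

    public static void main(String[] args) {
        long frameSize = 32L * 1024; // illustrative frame size in bytes
        // 20 partitions, each possibly holding one 1 MiB object (32 frames each).
        System.out.println(requiredJoinFrames(20, 1L << 20, frameSize));
        // Small objects: the fixed minimum of 5 frames dominates.
        System.out.println(requiredJoinFrames(2, 100, frameSize));
    }
}
```

With these inputs the first case needs 640 frames, far more than the fixed minimum of 5, while the second case stays at 5.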