Chen Luo created ASTERIXDB-2784:
-----------------------------------

             Summary: Join memory requirement for large objects
                 Key: ASTERIXDB-2784
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: COMP - Compiler, RT - Runtime
            Reporter: Chen Luo


Currently the compiler assumes the minimum number of join frames is 5 [1]. 
However, this does not guarantee a join will always succeed in case of large 
objects. The actual join memory requirement is actually MAX(5, #partitions * 
#large object size). The reason is that in the spill policy [2], we only spill 
a partition if it hasn't been spilled before. As a result, when we are writing 
to an empty partition, it is possible that each of other partitions has one 
large object (which could be larger than the frame size) but no partition can 
be spilled. Thus, the join memory requirement becomes #partitions * #large 
object size in this case.

[1] 
[https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29)|https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29).]

[2] 
https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to