We encountered some special OOM cases of "cogroup" when the data in one
partition is not balanced.

1. The estimated size of used memory is inaccurate. For example, there are
too many values for some special keys. Because SizeEstimator.visitArray
only samples at most 100 cells for an array, it may miss most of these
special keys and get a wrong estimated size of used memory. In our case, it
reports a CompactBuffer is only 27M, but actually it's more than 5G. Since
the estimated value is wrong, the CompactBuffer won't be spilled and cause
OOM.

2. There are too many values for a special key and these values cannot fit
into memory. Spilling data to disk helps nothing because cogroup needs to
read all values for a key into memory.

Any suggestion to solve these OOM cases? Thank you,.

Best Regards,
Shixiong Zhu

Reply via email to