When spilling to disk is enabled, an upstream operator will be blocked
from emitting more tuples to a corresponding output port when the size
of a buffer (in bytes) exceeds a limit (see documentation on how to
configure the limit). This is the back pressure mechanism that Pramod
refers to. There are two ways tuples can be removed from the buffer to
free up space and unblock the upstream operator: they can be either
spooled to a local disk or purged from the buffer completely. The purge
happens only after the window (more precisely, the earliest checkpoint
window after the window that the tuple belongs to) has been completely
processed by the application/DAG. If there is not enough disk space for
spooling, the buffer server fails the container that it belongs to.
There are a few JIRAs filed to improve the current behavior (for
example, to limit the amount of disk space that the buffer server can
use for spilling).
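
To make the sequence concrete, here is a minimal toy model in Java. It is
only a sketch of the behavior described above and not the Apex buffer
server code: the class name BufferSketch, its methods, and the byte limits
are all hypothetical, and the real buffer server works on blocks of data
rather than single tuples.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Toy model of back pressure, spooling and purging. NOT the Apex API. */
final class BufferSketch {
    /** One buffered tuple: its window id, its size, and where it is stored. */
    private static final class Entry {
        final long windowId;
        final int bytes;
        boolean onDisk;
        Entry(long windowId, int bytes) { this.windowId = windowId; this.bytes = bytes; }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long memoryLimit; // hypothetical in-memory limit, in bytes
    private final long diskLimit;   // hypothetical disk space available, in bytes
    private long memoryUsed;
    private long diskUsed;

    BufferSketch(long memoryLimit, long diskLimit) {
        this.memoryLimit = memoryLimit;
        this.diskLimit = diskLimit;
    }

    /** Returns false when the upstream operator has to be blocked (back pressure). */
    boolean offer(long windowId, int bytes) {
        if (memoryUsed + bytes > memoryLimit) {
            return false;
        }
        entries.addLast(new Entry(windowId, bytes));
        memoryUsed += bytes;
        return true;
    }

    /** First way to free memory: move in-memory tuples to local disk. */
    void spool() {
        for (Entry e : entries) {
            if (e.onDisk) {
                continue;
            }
            if (diskUsed + e.bytes > diskLimit) {
                // Models the buffer server failing its container.
                throw new IllegalStateException("not enough disk space for spooling");
            }
            e.onDisk = true;
            diskUsed += e.bytes;
            memoryUsed -= e.bytes;
        }
    }

    /** Second way: drop every tuple up to the window that is fully checkpointed. */
    void purge(long checkpointedWindowId) {
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.windowId > checkpointedWindowId) {
                continue;
            }
            if (e.onDisk) {
                diskUsed -= e.bytes;
            } else {
                memoryUsed -= e.bytes;
            }
            it.remove();
        }
    }

    long memoryUsed() { return memoryUsed; }
    long diskUsed() { return diskUsed; }

    public static void main(String[] args) {
        BufferSketch buffer = new BufferSketch(100, 150);
        System.out.println(buffer.offer(1, 60)); // accepted
        System.out.println(buffer.offer(1, 60)); // rejected: upstream is blocked
        buffer.spool();                          // frees memory by using disk
        System.out.println(buffer.offer(1, 60)); // accepted again
        buffer.purge(1);                         // window 1 is checkpointed
        System.out.println(buffer.memoryUsed() + " " + buffer.diskUsed());
    }
}
```

In this model a rejected offer() stands for the blocked upstream operator,
and either spool() or purge() makes the next offer() succeed. Calling
spool() on a buffer whose disk limit is too small throws, which stands in
for the container failure.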
Thank you,
Vlad
On 6/20/18 17:24, Pramod Immaneni wrote:
When back pressure is enabled (the default), the upstream operators are
blocked until space is freed up by downstream operators consuming data.
Since the buffer server also provides fault recovery functionality, it
cannot clear out the data as soon as it is consumed by the downstream
operators; it needs to keep the data around until the next checkpoint
throughout the DAG. The spillover to disk comes into play if the amount
of data between checkpoints is greater than the in-memory buffer
capacity.
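
As a rough back-of-the-envelope check of that last condition, the amount of
data held between two checkpoints can be estimated and compared with the
memory capacity. The sketch below is an assumption-laden illustration: the
class name, the method names, and every number in it are hypothetical and
are not Apex defaults, and the real retention can be longer because the
purge waits for the checkpoint to complete across the DAG.

```java
/** Rough estimate of data retained between checkpoints. Hypothetical numbers. */
final class CheckpointRetentionSketch {

    /** Bytes an output port produces during one checkpoint interval. */
    static long bytesBetweenCheckpoints(long tuplesPerSecond,
                                        long bytesPerTuple,
                                        long checkpointIntervalSeconds) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        long bytesPerSecond = Math.multiplyExact(tuplesPerSecond, bytesPerTuple);
        return Math.multiplyExact(bytesPerSecond, checkpointIntervalSeconds);
    }

    /** Spilling is expected when the retained data does not fit in memory. */
    static boolean spillExpected(long retainedBytes, long memoryCapacityBytes) {
        return retainedBytes > memoryCapacityBytes;
    }

    public static void main(String[] args) {
        long capacity = 64L * 1024 * 1024; // hypothetical in-memory capacity: 64 MiB

        // 10,000 tuples/s of 1,000 bytes each, checkpoint every 30 s.
        long slowCheckpoint = bytesBetweenCheckpoints(10_000, 1_000, 30);
        System.out.println(slowCheckpoint + " bytes, spill expected: "
                + spillExpected(slowCheckpoint, capacity));

        // Same stream, checkpoint every 5 s.
        long fastCheckpoint = bytesBetweenCheckpoints(10_000, 1_000, 5);
        System.out.println(fastCheckpoint + " bytes, spill expected: "
                + spillExpected(fastCheckpoint, capacity));
    }
}
```

With these made-up numbers the 30 second interval retains 300,000,000
bytes, which exceeds the 67,108,864 byte capacity, while the 5 second
interval retains 50,000,000 bytes and fits in memory.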
Thanks
On Wed, Jun 20, 2018 at 4:41 PM Mateusz Zakarczemny
<m.zakarcze...@gmail.com> wrote:
Hi,
I'm reading the Apex documentation regarding buffer servers. I'm
wondering what will happen if the buffers between operators overflow
(let's assume a non-partitioned operator).
I read somewhere that data is spilled to disk. But what's next? What
if the disk space is exhausted?
Regards,
Mateusz Zakarczemny