No, it's not like a given KafkaRDD object contains an array of messages
that gets serialized with the object. Its compute method generates an
iterator of messages as needed, by connecting to Kafka.
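The pattern Cody describes can be sketched in plain Java (this is an illustration, not the actual Spark/Kafka code): the partition is defined by little more than a pair of offsets, and compute() returns an iterator that fetches each record only when asked.

```java
import java.util.Iterator;
import java.util.function.LongFunction;

// Sketch of a partition whose compute() returns a lazy iterator.
// fetchRecord is a stand-in for "connect to Kafka and read the
// message at this offset"; nothing here is real Spark API.
class LazyPartition {
    private final long fromOffset;
    private final long untilOffset;
    private final LongFunction<String> fetchRecord;

    LazyPartition(long from, long until, LongFunction<String> fetch) {
        this.fromOffset = from;
        this.untilOffset = until;
        this.fetchRecord = fetch;
    }

    // Analogous to RDD.compute(): no data is materialized up front;
    // each record is fetched only when next() is called.
    Iterator<String> compute() {
        return new Iterator<String>() {
            private long offset = fromOffset;

            public boolean hasNext() { return offset < untilOffset; }

            public String next() { return fetchRecord.apply(offset++); }
        };
    }
}
```

Note that reconstructing the data later requires storing only the two offsets, not the messages themselves.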
I don't know what was so hefty in your checkpoint directory, because you
deleted it. My checkpoint
Well, RDDs also contain data, don't they?
The question is, what can be so hefty in the checkpointing directory as to
cause the Spark driver to run out of memory? It seems that this makes
checkpointing expensive in terms of I/O and memory consumption. Two
network hops -- to the driver, then to the workers. Hef
The RDD is indeed defined mostly by just the offsets / topic partitions.
On Mon, Aug 10, 2015 at 3:24 PM, Dmitry Goldenberg wrote:
> "You need to keep a certain number of rdds around for checkpointing" --
> that seems like a hefty expense to pay in order to achieve fault
> tolerance. Why does S
"You need to keep a certain number of rdds around for checkpointing" --
that seems like a hefty expense to pay in order to achieve fault
tolerance. Why does Spark persist whole RDDs of data? Shouldn't it be
sufficient to just persist the offsets, to know where to resume from?
Thanks.
On Mon, A
Looks like the workaround is to reduce the *window length*.
Cheers
On Mon, Aug 10, 2015 at 10:07 AM, Cody Koeninger wrote:
> You need to keep a certain number of rdds around for checkpointing, based
> on e.g. the window size. Those would all need to be loaded at once.
>
> On Mon, Aug 10, 2015 at 11:
You need to keep a certain number of rdds around for checkpointing, based
on e.g. the window size. Those would all need to be loaded at once.
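As a back-of-the-envelope illustration of that retention (the batch interval and window length below are made-up example values, not numbers from this thread):

```java
public class WindowRetention {
    public static void main(String[] args) {
        // Hypothetical example values: 10-second batches, 10-minute window.
        long batchIntervalMs = 10_000L;
        long windowLengthMs = 10 * 60_000L;

        // One RDD per batch must be retained to cover the whole window,
        // and all of their checkpoint state is reloaded together on recovery.
        long rddsRetained = windowLengthMs / batchIntervalMs;
        System.out.println(rddsRetained + " RDDs retained"); // 60
    }
}
```

So shrinking the window (or lengthening the batch interval) directly reduces how many RDDs' worth of state sits in the checkpoint directory.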
On Mon, Aug 10, 2015 at 11:49 AM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:
> Would there be a way to chunk up/batch up the contents of the
> c
Would there be a way to chunk up/batch up the contents of the checkpointing
directories as they're being processed by Spark Streaming? Is it mandatory
to load the whole thing in one go?
On Mon, Aug 10, 2015 at 12:42 PM, Ted Yu wrote:
> I wonder during recovery from a checkpoint whether we can e
I wonder, during recovery from a checkpoint, whether we can estimate the size
of the checkpoint and compare it with Runtime.getRuntime().freeMemory().
If the size of the checkpoint is much bigger than free memory, log a warning, etc.
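That heuristic might look roughly like the sketch below (the checkpoint path is a placeholder, and freeMemory() only reports the free portion of the current heap, so this is an approximate signal at best):

```java
import java.io.File;

public class CheckpointSizeCheck {
    // Recursively sum file sizes under a directory; for a plain file
    // (listFiles() returns null) just return its own length.
    static long dirSize(File dir) {
        File[] children = dir.listFiles();
        if (children == null) return dir.length();
        long total = 0;
        for (File f : children) {
            total += f.isDirectory() ? dirSize(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder path; point this at the real checkpoint directory.
        long checkpointBytes = dirSize(new File("/tmp/checkpoint"));
        long freeBytes = Runtime.getRuntime().freeMemory();
        if (checkpointBytes > freeBytes) {
            System.err.println("WARN: checkpoint (" + checkpointBytes
                + " bytes) exceeds free heap (" + freeBytes + " bytes)");
        }
    }
}
```

On-disk size is only a proxy for the deserialized footprint, so a warning like this would flag likely trouble rather than guarantee an OOM.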
Cheers
On Mon, Aug 10, 2015 at 9:34 AM, Dmitry Goldenberg wrote:
> Thank
Thanks, Cody, will try that. Unfortunately due to a reinstall I don't have
the original checkpointing directory :( Thanks for the clarification on
spark.driver.memory, I'll keep testing (at 2g things seem OK for now).
On Mon, Aug 10, 2015 at 12:10 PM, Cody Koeninger wrote:
> That looks like it'
That looks like it's during recovery from a checkpoint, so it'd be driver
memory, not executor memory.
How big is the checkpoint directory that you're trying to restore from?
On Mon, Aug 10, 2015 at 10:57 AM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:
> We're getting the below error. T