> independent of what the source being used is, receiver based or direct
> Kafka. The absolute performance obviously depends on a LOT of variables:
> size of the cluster, parallelization, etc. The key thing is that you must
> ensure sufficient parallelization at every stage: receiving, shuffles
> (updateStateByKey included), and output.
>
> Some more discussion in my talk - https://www.youtube.com/watch?v=d5UJonrruHk
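The reply above says every stage, including the shuffle behind updateStateByKey, needs enough parallelism. The following is a rough pure-Python illustration, not Spark code. It assigns hypothetical session keys to partitions using a hash modulo the partition count, which is similar in spirit to Spark's default hash partitioning. Each partition is processed by one task, so more partitions means less state per task. The `crc32` hash, the key names and the helper functions are assumptions for illustration; Spark's hash partitioner uses the JVM `hashCode`. In the DStream API, `updateStateByKey` accepts an optional `numPartitions` argument that sets this count for the state.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a hash partitioner: non-negative hash modulo the partition
    # count. crc32 is used only because Python's built-in str hash is salted
    # per process; Spark itself uses the JVM hashCode.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def keys_per_partition(keys, num_partitions):
    """Count how many state keys each partition (one task each) would own."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    session_keys = [f"user-{i}" for i in range(10_000)]  # hypothetical session ids
    for n in (2, 8, 32):
        load = keys_per_partition(session_keys, n)
        print(n, "partitions -> busiest task holds", max(load), "of", len(session_keys), "keys")
```

The busiest task bounds the batch time, so raising the partition count only helps while there are enough cores to run those tasks concurrently.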
>
> On Tue, Jul 14, 2015 at 4:13 PM, swetha <swethakasire...@gmail.com> wrote:
Subject: Re: Sessionization using updateStateByKey
An in-memory hash key data structure of some kind, so that you're close to
linear on the number of items in a batch rather than the number of
outstanding keys. That's more complex, because you have to deal with
expiration for keys that never get hit
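The following is a minimal pure-Python sketch of the contrast drawn above. It is not Spark code.

- `full_scan_batch` mimics the documented updateStateByKey behavior: the update step runs for every key already in the state each batch, even keys with no new events. Deleting a key stands in for the update function returning `None`.
- The hypothetical `SessionTable` keeps sessions in an `OrderedDict` ordered by last-seen time. A batch therefore touches only its own events, plus whichever idle keys have expired, found from the front.

The session timeout, the `(last_seen, count)` state and all names are assumptions for illustration.

```python
from collections import OrderedDict


def full_scan_batch(state, batch, now, timeout):
    """Simulates updateStateByKey semantics: every existing key and every new
    key is visited each batch. Returns the number of keys visited, which grows
    with the number of outstanding keys, not with the batch size."""
    grouped = {}
    for key, event in batch:
        grouped.setdefault(key, []).append(event)
    visited = 0
    for key in state.keys() | grouped.keys():
        visited += 1
        events = grouped.get(key, [])
        last_seen, count = state.get(key, (now, 0))
        if events:
            state[key] = (now, count + len(events))
        elif now - last_seen >= timeout:
            del state[key]  # like returning None from the update function
    return visited


class SessionTable:
    """Hypothetical hash map of per-key session state kept in last-seen order,
    so a batch costs O(events in batch + expired keys) rather than O(all keys)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.sessions = OrderedDict()  # key -> (last_seen, event_count)

    def apply_batch(self, batch, now):
        """Assumes `now` never decreases between calls. Returns (touched, expired)."""
        touched = 0
        for key, _event in batch:
            _, count = self.sessions.pop(key, (now, 0))
            self.sessions[key] = (now, count + 1)  # re-insert as most recent
            touched += 1
        expired = []
        while self.sessions:
            oldest = next(iter(self.sessions))
            last_seen, count = self.sessions[oldest]
            if now - last_seen < self.timeout:
                break  # every entry behind this one was seen at least as recently
            del self.sessions[oldest]
            expired.append((oldest, count))
        return touched, expired


if __name__ == "__main__":
    idle = [(f"idle-{i}", "click") for i in range(1000)]  # hypothetical workload
    active = [("user-a", "click"), ("user-b", "view")]

    state = {}
    full_scan_batch(state, idle, now=0, timeout=30)
    print("full scan visited", full_scan_batch(state, active, now=10, timeout=30))
    # full scan visited 1002

    table = SessionTable(timeout=30)
    table.apply_batch(idle, now=0)
    touched, expired = table.apply_batch(active, now=10)
    print("hash table touched", touched, "expired", len(expired))
    # hash table touched 2 expired 0
```

Expiration can stop at the first live entry only because batch timestamps never go backwards. Out-of-order event times would need a different structure, for example a heap keyed on expiry time.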
Hi,
I have a question regarding sessionization using updateStateByKey. If
near-real-time state needs to be maintained in a Streaming application, what
happens when the number of RDDs to maintain the state becomes very large?
Does it automatically get saved to HDFS and reloaded when needed, or do I