Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
>> ize of the cluster, parallelization, etc. The key thing is that you must ensure sufficient parallelization at every stage - receiving, shuffles (updateStateByKey included), and output. Some more discussion in my talk - https://www.youtube.com/watch?

Re: Sessionization using updateStateByKey

2015-07-15 Thread Cody Koeninger
>> independent of what source is being used - receiver based or direct Kafka. The absolute performance obviously depends on a LOT of variables: size of the cluster, parallelization, etc. The key thing is that you must ensure sufficient parallelizati

Re: Sessionization using updateStateByKey

2015-07-15 Thread Sean McNamara
> parallelization at every stage - receiving, shuffles (updateStateByKey included), and output. Some more discussion in my talk - https://www.youtube.com/watch?v=d5UJonrruHk On Tue, Jul 14, 2015 at 4:13 PM, swetha <mailto:swethakasire...@gmail.com> wrote:

Re: Sessionization using updateStateByKey

2015-07-15 Thread Silvio Fiorito
An in-memory hash key data structure of some kind so that you're close to linear on the number of items in a batch, not the number of outstanding keys. That's more complex, because you have to deal with expiration for keys that never get hi
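To make the idea above concrete, here is a minimal sketch in plain Python, not Spark code and not anything posted in the thread. It assumes a hypothetical `SessionStore` class, a made-up idle timeout, and batch timestamps that never decrease. Keys are kept in last-seen order so that expiring idle sessions only touches the keys that actually expire, which keeps per-batch work close to the batch size rather than the number of outstanding keys.

```python
from collections import OrderedDict


class SessionStore:
    """Hypothetical in-memory session map; not part of Spark's API."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        # key -> (last_seen, events), ordered from least to most recently seen.
        self._sessions = OrderedDict()

    def update_batch(self, batch, now):
        """Apply one batch of (key, event) pairs, then expire idle sessions.

        Work is proportional to len(batch) plus the number of expired keys,
        not to the total number of outstanding keys. Assumes `now` never
        decreases from one call to the next.
        """
        for key, event in batch:
            # Remove and re-insert so the key moves to the "most recent" end.
            _, events = self._sessions.pop(key, (None, []))
            events.append(event)
            self._sessions[key] = (now, events)

        expired = {}
        while self._sessions:
            key, (last_seen, events) = next(iter(self._sessions.items()))
            if now - last_seen < self.timeout_s:
                break  # every later entry was seen at least as recently
            del self._sessions[key]
            expired[key] = events
        return expired

    def __len__(self):
        return len(self._sessions)


# Usage: two users arrive, only one stays active, the idle one expires.
store = SessionStore(timeout_s=30)
store.update_batch([("u1", "click"), ("u2", "view")], now=0)
store.update_batch([("u1", "buy")], now=20)
closed = store.update_batch([], now=35)
```

The ordering trick is one design choice among several; a periodic full sweep is simpler but costs time proportional to all outstanding keys, which is the cost the message above is trying to avoid.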

Re: Sessionization using updateStateByKey

2015-07-15 Thread Cody Koeninger
> ng, shuffles (updateStateByKey included), and output. Some more discussion in my talk - https://www.youtube.com/watch?v=d5UJonrruHk On Tue, Jul 14, 2015 at 4:13 PM, swetha wrote: Hi, I ha

Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
> ficient parallelization at every stage - receiving, shuffles (updateStateByKey included), and output. Some more discussion in my talk - https://www.youtube.com/watch?v=d5UJonrruHk On Tue, Jul 14, 2015 at 4:13 PM, swetha wrote: Hi,

Re: Sessionization using updateStateByKey

2015-07-15 Thread Cody Koeninger
>> :13 PM, swetha wrote: Hi, I have a question regarding sessionization using updateStateByKey. If near real time state needs to be maintained in a Streaming application, what happens when the number of RDDs to maintain the state becomes very large

Re: Sessionization using updateStateByKey

2015-07-14 Thread Tathagata Das
> Hi, I have a question regarding sessionization using updateStateByKey. If near real time state needs to be maintained in a Streaming application, what happens when the number of RDDs to maintain the state becomes very large? Does it automatically get saved to HDFS and

Sessionization using updateStateByKey

2015-07-14 Thread swetha
Hi, I have a question regarding sessionization using updateStateByKey. If near real time state needs to be maintained in a Streaming application, what happens when the number of RDDs to maintain the state becomes very large? Does it automatically get saved to HDFS and reload when needed or do I