This is really outside the scope of Hive and would probably be better 
addressed by the Spark community; that said, I can say that this very much 
depends on your use case.

Take a look at this discussion if you haven't already:
https://groups.google.com/forum/embed/#!topic/spark-users/GQoxJHAAtX4

Generally speaking, the larger the batch window, the better the overall 
performance, but the streaming output will be updated less frequently. You 
will likely run into problems setting your batch window below about 0.5 sec, 
and/or whenever the batch window is shorter than the time it takes to process 
a batch.

Beyond that, the window length and sliding interval need to be multiples of the 
batch window, but will depend entirely on your reporting requirements.

It would be perfectly reasonable to have:
batch window = 30 secs
window length = 1 hour
sliding interval = 5 mins

In that case, you'd be creating an output every 5 mins, aggregating data that 
you were collecting every 30 seconds over the preceding 1-hour period.
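
As a minimal sketch of that configuration in Scala (the socket source, 
checkpoint path, and word-count logic here are illustrative assumptions, not 
anything from this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("WindowSketch")
val ssc = new StreamingContext(sparkConf, Seconds(30))  // batch window = 30 secs
ssc.checkpoint("/tmp/checkpoint")  // required by the inverse-reduce form below

val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
// Every 5 mins, emit word counts over the previous 1 hour of 30-second batches.
val counts = words.map(word => (word, 1))
  .reduceByKeyAndWindow(_ + _, _ - _, Minutes(60), Minutes(5))
counts.print()

ssc.start()
ssc.awaitTermination()

The inverse-reduce argument (_ - _) keeps the hourly window incremental: each 
5-minute slide adds the newly arrived batches and subtracts the expired ones 
instead of recomputing the full hour.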

Could you set the batch window to 5 mins? Possibly, depending on the data 
source, but perhaps you are already using that source on a more frequent basis 
elsewhere, or maybe you only have a 1 min buffer on the source data. There are 
lots of possibilities, which is why you have this flexibility and no hard and 
fast rule.

If you were trying to create continuously streaming output as fast as possible, 
then you would probably (almost always) set your sliding interval = batch 
window and shrink the batch window as much as possible.
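
For instance (a sketch only; the 1-second batch, 60-second window, and socket 
source are illustrative placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("FastSlideSketch")
val ssc = new StreamingContext(conf, Seconds(1))      // batch window shrunk to 1 sec
val events = ssc.socketTextStream("localhost", 9999)  // hypothetical source
// sliding interval = batch window: a new result on every batch,
// each one covering the most recent 60 seconds of data.
events.window(Seconds(60), Seconds(1)).count().print()

ssc.start()
ssc.awaitTermination()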

More documentation here:
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html



From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Thursday, May 05, 2016 4:26 AM
To: user
Subject: Re: Spark Streaming, Batch interval, Windows length and Sliding 
Interval settings

Any ideas/experience on this?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 4 May 2016 at 21:45, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Hi,

Just wanted opinions on this.

In Spark Streaming the parameter

val ssc = new StreamingContext(sparkConf, Seconds(n))

defines the batch or sample interval for the incoming streams.

In addition there is the window length:

// window length - the duration of the window; must be a multiple of the
// batch interval n in StreamingContext(sparkConf, Seconds(n))

val windowLength = Seconds(L)

And finally the sliding interval:
// sliding interval - The interval at which the window operation is performed

val slidingInterval = Seconds(I)

OK, so the windowLength L must be a multiple of n, and the slidingInterval 
must likewise be a multiple of n, so that we can catch the head and tail of 
the window.

So as a heuristic approach, for a batch interval of say 10 seconds, I put the 
window length at 3 times that = 30 seconds and make the slidingInterval = 
batch interval = 10 seconds.
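
Wired into code, that heuristic would look something like this (a sketch; the 
socket source and checkpoint path are stand-ins for whatever is actually being 
measured):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("HeuristicSketch")
val n = 10                                            // batch interval in seconds
val ssc = new StreamingContext(sparkConf, Seconds(n))
ssc.checkpoint("/tmp/checkpoint")  // countByWindow uses an incremental reduce

val windowLength    = Seconds(3 * n)  // L = 30 seconds
val slidingInterval = Seconds(n)      // I = batch interval = 10 seconds
val events = ssc.socketTextStream("localhost", 9999)  // hypothetical source
// Every 10 seconds, count the events seen over the last 3 batches (30 seconds).
events.countByWindow(windowLength, slidingInterval).print()

ssc.start()
ssc.awaitTermination()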

Obviously these are subjective, depending on what is being measured. However, 
I believe having slidingInterval = batch interval makes sense?

Appreciate any views on this.

Thanks,


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




