What do you mean by swap space: the system swap space or Spark's block
manager disk space?

If you're referring to system swap space, I think you should first look at
the JVM heap size and the YARN container size; you would hit those limits
before running out of system memory.

If you're referring to block manager disk space: the StorageLevel of
WindowedDStream is MEMORY_ONLY_SER, so the windowed data will not be
spilled to disk when executor memory is not enough.
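Since the windowed data has to stay resident, it is worth estimating up front how much serialized data a 24-hour window holds. A back-of-envelope sketch in plain Scala (the rate and record size below are made-up assumptions for illustration, not figures from this thread):

```scala
// Hypothetical ingestion figures -- substitute your own measurements.
val recordsPerSecond = 1000L
val bytesPerRecord   = 200L            // average serialized record size (assumed)
val windowSeconds    = 24L * 60L * 60L // 24-hour window

// Data the 24-hour window must keep resident across the executors.
val bytesInWindow = recordsPerSecond * bytesPerRecord * windowSeconds
println(f"~${bytesInWindow / 1e9}%.1f GB held in executor memory")
```

If the result does not comfortably fit in your executors' combined memory, the window approach needs rethinking before tuning anything else.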


On Mon, May 9, 2016 at 3:26 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> That is a valid point, Shao. However, it will start using disk space as
> memory storage, akin to swap space. It will not crash; I believe it will
> just be slow, and this assumes that you do not run out of disk space.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 9 May 2016 at 08:14, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> For window-related operators, Spark Streaming will cache the data within
>> the window into memory. In your case the window size is up to 24 hours,
>> which means data has to stay in the executors' memory for a full day;
>> this may introduce several problems when memory is not enough.
>>
>> On Mon, May 9, 2016 at 3:01 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> OK, some terms for Spark Streaming.
>>>
>>> "Batch interval" is the basic interval at which the system will receive
>>> the data in batches.
>>> This is the interval set when creating a StreamingContext. For example,
>>> if you set the batch interval to 300 seconds, then any input DStream will
>>> generate RDDs of received data at 300-second intervals.
>>> A window operator is defined by two parameters:
>>> - windowLength (window duration) - the length of the window
>>> - slideInterval (sliding interval) - the interval at which the window
>>> slides or moves forward
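To make the two parameters concrete, here is a small plain-Scala simulation (no Spark involved; the helper and names are illustrative only) of which batches a window covers as it slides forward:

```scala
// 5-minute batches, labelled by the time at which each batch completes.
val batchIntervalSec = 300
val batchEnds = (1 to 24).map(_ * batchIntervalSec) // two hours of batches

// Batches covered by a window of length windowLengthSec ending at time `now`.
def windowAt(now: Int, windowLengthSec: Int): Seq[Int] =
  batchEnds.filter(t => t > now - windowLengthSec && t <= now)

// A 1-hour window evaluated at t = 3600 s covers the last 12 batches; slide
// forward by one batch interval and it drops the oldest batch, adds the newest.
val atOneHour = windowAt(3600, 3600)
val slidOnce  = windowAt(3900, 3600)
```

This is only a model of the bookkeeping, but it shows why both parameters must line up with the batch interval: the window can only ever contain whole batches.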
>>>
>>>
>>> OK, so your batch interval is 5 minutes. That is the interval at which
>>> batches of messages arrive from the source.
>>>
>>> Then you have these two params
>>>
>>> // window length - the duration of the window; must be a multiple of the
>>> // batch interval n in StreamingContext(sparkConf, Seconds(n))
>>> val windowLength = m * n
>>> // sliding interval - the interval at which the window operation is
>>> // performed; in other words, data is collected over this preceding
>>> // interval. It must divide the window length evenly.
>>> val slidingInterval = y // where windowLength / y is a whole number
>>>
>>> Both the window length and the slidingInterval duration must be
>>> multiples of the batch interval, as received data is divided into batches
>>> of duration "batch interval".
>>>
>>> If you want to collect 1 hour of data, then windowLength = 12 * 5 * 60
>>> seconds.
>>> If you want to collect 24 hours of data, then windowLength = 24 * 12 * 5 *
>>> 60 seconds.
>>>
>>> Your sliding interval should be set to the batch interval = 5 * 60 seconds.
>>> In other words, that is where the aggregates and summaries for your report
>>> come from.
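The arithmetic above can be checked in a few lines of plain Scala (no Spark needed; the value names are illustrative, not Spark API):

```scala
val batchIntervalSec   = 5 * 60                     // 300 s, set in StreamingContext
val hourlyWindowSec    = 12 * batchIntervalSec      // 3600 s  = 1 hour
val dailyWindowSec     = 24 * 12 * batchIntervalSec // 86400 s = 24 hours
val slidingIntervalSec = batchIntervalSec           // report every 5 minutes

// Spark Streaming requires both the window length and the sliding interval
// to be whole multiples of the batch interval.
require(hourlyWindowSec % batchIntervalSec == 0)
require(dailyWindowSec % batchIntervalSec == 0)
require(slidingIntervalSec % batchIntervalSec == 0)

println(s"1-hour window holds ${hourlyWindowSec / batchIntervalSec} batches")
println(s"24-hour window holds ${dailyWindowSec / batchIntervalSec} batches")
```

So the 1-hour window keeps 12 batches in memory and the 24-hour window keeps 288, which is where the memory concern raised earlier in the thread comes from.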
>>>
>>> What is your data source here?
>>>
>>> HTH
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 9 May 2016 at 04:19, kramer2...@126.com <kramer2...@126.com> wrote:
>>>
>>>> We have some stream data that needs to be calculated, and we are
>>>> considering using Spark Streaming to do it.
>>>>
>>>> We need to generate three kinds of reports. The reports are based on
>>>>
>>>> 1. The last 5 minutes data
>>>> 2. The last 1 hour data
>>>> 3. The last 24 hour data
>>>>
>>>> The frequency of reports is 5 minutes.
>>>>
>>>> After reading the docs, the most obvious way to solve this seems to be
>>>> to set up a Spark stream with a 5-minute batch interval and two windows
>>>> of 1 hour and 1 day.
>>>>
>>>>
>>>> But I am worried that windows of one hour and one day may be too big. I
>>>> do not have much experience with Spark Streaming, so what window lengths
>>>> do you use in your environment?
>>>>
>>>> Any official docs talking about this?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-big-the-spark-stream-window-could-be-tp26899.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>
