it takes about 800ms to finish, but every
>>>>> 10 seconds, when the data checkpoint is running, it takes about 5
>>>>> seconds to finish the same size micro-batch. Why is it so high? What
>>>>> kind of job does the checkpoint run, and why does it keep increasing?
>>>>
>>>> 2/ When I change the data checkpoint interval, e.g. using:
>>>> stats.checkpoint(Durations.seconds(100)); // change to 100, the
>>>> default is 10
>>>>
>>>> The checkpoint duration keeps increasing significantly: the first
>>>> checkpoint is 10s, the second is 30s, the third is 70s ... and it
>>>> keeps increasing :)
>>>> Why is it so high when increasing the checkpoint interval?
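[Editor's note: the growing checkpoint time above is consistent with how updateStateByKey works: the state RDD's lineage grows batch by batch, and the data checkpoint has to materialize everything accumulated since the previous checkpoint, so a longer interval means more work per checkpoint. The Spark Streaming programming guide suggests a checkpoint interval of about 5-10 sliding intervals. A minimal configuration sketch, assuming the 2s batch interval from the original post and a hypothetical checkpoint directory:]

```java
// Sketch only: the programming guide suggests checkpointing every
// 5-10 batch intervals; a 100s interval lets a long RDD lineage
// build up between checkpoints, making each one slower.
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
jssc.checkpoint("hdfs://...");            // directory is a placeholder
stats.checkpoint(Durations.seconds(10));  // ~5x the 2s batch interval
```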
>>
>> On Nov 4, 2015 9:08 PM, "Adrian Tanase" wrote:
>>
>>> Nice! Thanks for sharing, I wasn’t aware of the new API.
>>>
>>> Left some comments on the JIRA and design doc.
>>>
>>> -adrian
>>>
> From: Shixiong Zhu
> Date: Tuesday, November 3, 2015 at 3:32 AM
> To: Thúy Hằng Lê
> Cc: Adrian Tanase
> Subject: Spark Streaming data checkpoint performance
"trackStateByKey" is about to be added in 1.6 to resolve the performance issue
of "updateStateByKey". You can take a look at
https://issues.apache.org/jira/browse/SPARK-2629 and
https://github.com/apache/spark/pull/9256
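[Editor's note: the API tracked in SPARK-2629 was ultimately released in Spark 1.6 under the name mapWithState rather than trackStateByKey. A rough, simplified Java fragment of its usage (not from this thread; `wordCounts` is a hypothetical JavaPairDStream<String, Integer>, and the Optional type here is Guava's, as used by the 1.x Java API):]

```java
// Sketch: unlike updateStateByKey, mapWithState only visits keys that
// appear in the current batch, avoiding a full scan of the state.
Function3<String, Optional<Integer>, State<Integer>, Tuple2<String, Integer>> mappingFunc =
    (word, one, state) -> {
        int sum = one.or(0) + (state.exists() ? state.get() : 0);
        state.update(sum);                 // persist the new running count
        return new Tuple2<>(word, sum);    // emitted downstream
    };

JavaMapWithStateDStream<String, Integer, Integer, Tuple2<String, Integer>> counts =
    wordCounts.mapWithState(StateSpec.function(mappingFunc));
```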
> complicated, and try to prune out words
> with very few occurrences or that haven’t been updated for a long time
> - You can do this by emitting None from updateStateByKey
>
> Hope this helps,
> -adrian
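[Editor's note: Adrian's pruning suggestion works because in the Java API the update function returns an Optional, and returning an absent Optional removes the key from the state. The sketch below models only the per-key function logic, using java.util.Optional in place of the Guava Optional the Spark 1.x signature actually expects; the threshold of 5 is purely illustrative:]

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class PruneStateSketch {
    // Called once per key per batch: newValues holds this batch's values
    // for the key, state the previous count. Returning Optional.empty()
    // drops the key from the state, pruning rare, idle words.
    static final BiFunction<List<Integer>, Optional<Integer>, Optional<Integer>> UPDATE =
        (newValues, state) -> {
            int current = state.orElse(0);
            if (newValues.isEmpty()) {
                // No activity this batch: drop words seen fewer than 5 times
                return current < 5 ? Optional.empty() : Optional.of(current);
            }
            int sum = newValues.stream().mapToInt(Integer::intValue).sum();
            return Optional.of(current + sum);
        };

    public static void main(String[] args) {
        System.out.println(UPDATE.apply(List.of(1, 1, 1), Optional.of(2)));  // Optional[5]
        System.out.println(UPDATE.apply(List.of(), Optional.of(3)));         // Optional.empty (pruned)
        System.out.println(UPDATE.apply(List.of(), Optional.of(100)));       // Optional[100]
    }
}
```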
>
> From: Thúy Hằng Lê
> Date: Monday, November 2, 2015 at 7:2
From: Thúy Hằng Lê
Date: Monday, November 2, 2015 at 7:20 AM
To: "user@spark.apache.org"
Subject: Spark Streaming data checkpoint performance
Hi Spark guru

I am evaluating Spark Streaming.
In my application I need to maintain cumulative statistics (e.g. the total
running word count), so I need to call the updateStateByKey function on
every micro-batch:

JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
Durations.seconds(2));

After setting those things, I got the following behavior:
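[Editor's note: the cumulative word count described above is typically kept with an update function like the following. This sketch covers only the per-key logic, using java.util.Optional in place of the Guava Optional the Spark 1.x Java API expects in this signature:]

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class RunningCount {
    // updateStateByKey calls this once per key per batch: newValues holds
    // the values that arrived for the key, state the previous total.
    static final BiFunction<List<Integer>, Optional<Integer>, Optional<Integer>> COUNT =
        (newValues, state) -> {
            int sum = state.orElse(0);
            for (int v : newValues) sum += v;
            return Optional.of(sum);  // always keep the key
        };

    public static void main(String[] args) {
        System.out.println(COUNT.apply(List.of(1, 1), Optional.empty())); // Optional[2]
        System.out.println(COUNT.apply(List.of(1), Optional.of(2)));      // Optional[3]
    }
}
```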
* The Processing Time is normally about 800ms per micro-batch, but rises
to around 5 seconds every 10 seconds, when the data checkpoint is running.