Re: Spark Streaming data checkpoint performance

2015-11-07 Thread trung kien
it takes about 800 ms to finish, but every
>>>>> 10 seconds, when the data checkpoint is running,
>>>>> it takes 5 seconds to finish a micro-batch of the same size. Why is it
>>>>> so high? What kind of job runs in the checkpoint?
>>>>> Why does it keep incre

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Aniket Bhatnagar
's keep increasing?
>>>>
>>>> 2/ When I change the data checkpoint interval, for example:
>>>> stats.checkpoint(Durations.seconds(100)); // changed to 100, default is 10
>>>>
>>>> The checkpoint time keeps increasing significan

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Thúy Hằng Lê
,
>>> default is 10
>>>
>>> The checkpoint time keeps increasing significantly: the first checkpoint takes
>>> 10s, the second 30s, the third 70s ... and it keeps increasing :)
>>> Why is it so high when the checkpoint interval is increased?

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Aniket Bhatnagar
ks more stable.
>>
>> On Nov 4, 2015 9:08 PM, "Adrian Tanase" wrote:
>>
>>> Nice! Thanks for sharing, I wasn’t aware of the new API.
>>>
>>> Left some comments on the JIRA and design doc.
>>>
>>> -adrian

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Thúy Hằng Lê
> On Nov 4, 2015 9:08 PM, "Adrian Tanase" wrote:
>
>> Nice! Thanks for sharing, I wasn’t aware of the new API.
>>
>> Left some comments on the JIRA and design doc.
>>
>> -adrian
>>
>> From: Shixiong Zhu
>> Date: Tuesday, November 3, 2015 a

Re: Spark Streaming data checkpoint performance

2015-11-05 Thread Thúy Hằng Lê
ase" wrote:
> Nice! Thanks for sharing, I wasn’t aware of the new API.
>
> Left some comments on the JIRA and design doc.
>
> -adrian
>
> From: Shixiong Zhu
> Date: Tuesday, November 3, 2015 at 3:32 AM
> To: Thúy Hằng Lê
> Cc: Adrian Tanase, "

Re: Spark Streaming data checkpoint performance

2015-11-04 Thread Adrian Tanase
Spark Streaming data checkpoint performance "trackStateByKey" is about to be added in 1.6 to resolve the performance issue of "updateStateByKey". You can take a look at https://issues.apache.org/jira/browse/SPARK-2629 and https://github.com/apache/spark/pull/9256

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Shixiong Zhu
"trackStateByKey" is about to be added in 1.6 to resolve the performance issue of "updateStateByKey". You can take a look at https://issues.apache.org/jira/browse/SPARK-2629 and https://github.com/apache/spark/pull/9256

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Thúy Hằng Lê
complicated and try to prune out words
>with very few occurrences or that haven’t been updated for a long time
> - You can do this by emitting None from updateStateByKey
>
> Hope this helps,
> -adrian
>
> From: Thúy Hằng Lê
> Date: Monday, November 2, 2015 at 7:2
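
The pruning suggestion above can be sketched roughly as follows. This is a minimal illustration of the update logic only, not code from the thread: it has no Spark dependency, `java.util.Optional` stands in for the Optional type that Spark's Java API expects, and the names `PruningUpdate`, `WordState` and `MAX_IDLE_BATCHES` are hypothetical. It assumes the update function is invoked for every tracked key on every batch (with an empty list of new values when the key saw no data), and that returning an empty Optional plays the role of "emitting None", so the key is dropped from the state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class PruningUpdate {
    // Hypothetical per-word state: running total plus the number of
    // consecutive batches in which the word received no new values.
    static final class WordState {
        final long count;
        final int idleBatches;

        WordState(long count, int idleBatches) {
            this.count = count;
            this.idleBatches = idleBatches;
        }
    }

    // Assumed threshold: drop a word after this many batches without updates.
    static final int MAX_IDLE_BATCHES = 3;

    // Same shape as an updateStateByKey update function:
    // (new values for the key in this batch, previous state) -> new state.
    // An empty Optional means "remove this key from the state".
    static Optional<WordState> update(List<Integer> newValues, Optional<WordState> previous) {
        long added = 0;
        for (int v : newValues) {
            added += v;
        }
        long total = previous.map(s -> s.count).orElse(0L) + added;

        // Reset the idle counter when the word was seen, otherwise increment it.
        int idle = newValues.isEmpty()
                ? previous.map(s -> s.idleBatches + 1).orElse(1)
                : 0;

        if (idle >= MAX_IDLE_BATCHES) {
            return Optional.empty(); // prune: word not updated for too long
        }
        return Optional.of(new WordState(total, idle));
    }

    public static void main(String[] args) {
        // A word seen twice in one batch, then absent from the following batches.
        Optional<WordState> state = update(Arrays.asList(1, 1), Optional.empty());
        for (int batch = 1; batch <= MAX_IDLE_BATCHES; batch++) {
            state = update(Collections.emptyList(), state);
            System.out.println("idle batch " + batch + ": kept=" + state.isPresent());
        }
    }
}
```

Pruning words with very few occurrences would follow the same pattern, with an extra condition on `total` before returning the state. Smaller state should mean less data to write on each data checkpoint, which is the cost the thread is asking about.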

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Adrian Tanase
From: Thúy Hằng Lê
Date: Monday, November 2, 2015 at 7:20 AM
To: "user@spark.apache.org"
Subject: Spark Streaming data checkpoint performance
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

Spark Streaming data checkpoint performance

2015-11-01 Thread Thúy Hằng Lê
Hi Spark gurus,
I am evaluating Spark Streaming. In my application I need to maintain cumulative statistics (e.g. the total running word count), so I need to call the updateStateByKey function on every micro-batch. After setting those things up, I got the following behaviors:
* The Processing Time