Hi - yes, it's great that you wrote the receiver yourself - it means you have
more control. I have the feeling that the most efficient point to discard as
much data as possible is your Spark input source itself - or you could even
modify your subscription protocol so it never receives the other 50 seconds
of data at all. Once you deliver data to the DStream, you can filter it as
much as you want, but you will still be subject to garbage collection and
potentially shuffles and HDD checkpoints.
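For illustration, here is a minimal sketch of that idea inside a custom
receiver. It only calls store() during the first 10 seconds of each minute,
so the other 50 seconds of data never enter Spark. fetchNextRecord() is a
hypothetical stand-in for however your receiver actually reads from its
upstream source:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch: a receiver that drops records outside the first 10 seconds
// of each minute, before they ever reach the DStream.
class SampledReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("Sampled Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {} // the receive loop checks isStopped()

  private def receive(): Unit = {
    while (!isStopped()) {
      val record = fetchNextRecord()
      val secondOfMinute = (System.currentTimeMillis() / 1000) % 60
      if (secondOfMinute < 10) {
        store(record) // keep only the first 10 seconds of each minute
      }
      // otherwise the record is dropped here, before entering Spark
    }
  }

  // Hypothetical: replace with your source-specific read
  // (socket, queue client, etc.)
  private def fetchNextRecord(): String = ???
}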

On Thu, Aug 6, 2015 at 1:31 AM, Heath Guo <heath...@fb.com> wrote:

> Hi Dimitris,
>
> Thanks for your reply. Just wondering – are you asking about my streaming
> input source? I implemented a custom receiver and have been using that.
> Thanks.
>
> From: Dimitris Kouzis - Loukas <look...@gmail.com>
> Date: Wednesday, August 5, 2015 at 5:27 PM
> To: Heath Guo <heath...@fb.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Pause Spark Streaming reading or sampling streaming data
>
> What driver do you use? Sounds like something you should do before the
> driver...
>
> On Thu, Aug 6, 2015 at 12:50 AM, Heath Guo <heath...@fb.com> wrote:
>
>> Hi, I have a question about sampling Spark Streaming data, or getting
>> part of the data. For every minute, I only want the data read in during the
>> first 10 seconds, and discard all data in the next 50 seconds. Is there any
> >> way to pause reading and discard data in that period? I'm doing this to
> >> sample from a stream with a huge amount of data, which saves processing
> >> time in the real-time program. Thanks!
>>
>
>
