Re: [spark-streaming] kafka source and flow control

Tobias Pfeiffer Mon, 11 Aug 2014 17:58:30 -0700

Hi,

On Mon, Aug 11, 2014 at 9:41 PM, Gwenhael Pasquiers <
gwenhael.pasqui...@ericsson.com> wrote:
>
> We intend to apply other operations on the data later in the same spark
> context, but our first step is to archive it.
>
> Our goal is something like this:
>
> Step 1: consume from Kafka
> Step 2: archive to HDFS AND send to step 3
> Step 3: transform the data
> Step 4: save the transformed data to HDFS as input for M/R
>

I see. I think Spark Streaming is probably well suited for that purpose.

> To us it looks like a great flaw if, in streaming mode, spark-streaming
> cannot slow down its consumption depending on the available resources.
>

On Mon, Aug 11, 2014 at 10:10 PM, Gwenhael Pasquiers <
gwenhael.pasqui...@ericsson.com> wrote:
>
> I think the kind of self-regulating system you describe would be too
> difficult to implement and probably unreliable (even more so given that
> we have multiple slaves).
>

Isn't "slow down its consumption depending on the available resources" a
"self-regulating system"? I don't see how you can adapt to available
resources without measuring your execution time and then changing how much
you consume. Did you have any particular form of adaptation in mind?
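
To make "measure the execution time, then change how much you consume"
concrete, here is a minimal sketch in plain Python. It is not a Spark
Streaming or Kafka API; the names next_batch_size, consume_with_feedback,
fetch and process are hypothetical, and the proportional scaling rule is
just one simple choice among many.

```python
import time


def next_batch_size(current_size, elapsed_s, target_s,
                    min_size=1, max_size=100_000):
    """Scale the batch size by target_s / elapsed_s, clamped to a range.

    Hypothetical helper: if the last batch took twice as long as the
    target, the next batch is half as large, and vice versa.
    """
    if elapsed_s <= 0:
        # No measurable processing time: grow cautiously by doubling.
        return min(max_size, current_size * 2)
    scaled = int(current_size * target_s / elapsed_s)
    return max(min_size, min(max_size, scaled))


def consume_with_feedback(fetch, process, target_s,
                          initial_size=1000, rounds=10):
    """Fetch and process batches, resizing each one from the last timing.

    fetch(n) is assumed to return up to n records; process(records) is
    assumed to block until the batch is fully handled.
    Returns a list of (batch_size, elapsed_seconds) pairs.
    """
    size = initial_size
    history = []
    for _ in range(rounds):
        records = fetch(size)
        start = time.monotonic()
        process(records)
        elapsed = time.monotonic() - start
        history.append((size, elapsed))
        size = next_batch_size(size, elapsed, target_s)
    return history
```

Even in this toy form the consumer has to measure itself and feed the
result back into how much it reads, which is what I would call
self-regulating.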

Tobias
