Hi,
I followed the document and I have a few questions.
Say I have a single-partition keyed input topic, and say I run 2 streams
applications, one on machine1 and one on machine2.
Both applications have the same application id and identical code.
Say topic1 has messages like
(k1, v11)
(k1, v12)
(k1, v13)
(k2, v21)
(k2, v22)
(k2, v23)
When I was running a single application I was getting results like
(k1, agg(v11, v12, v13))
(k2, agg(v21, v22, v23))
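
For concreteness, agg can be thought of as a per-key fold. A minimal
plain-Java sketch of what I mean (comma-joining the values is just an
illustration, not my real aggregation):

import java.util.HashMap;
import java.util.Map;

public class AggSemantics {
    public static void main(String[] args) {
        // Fold the records per key, in topic order; "agg" is illustrated
        // here as comma-joining the values.
        String[][] records = {
            {"k1", "v11"}, {"k1", "v12"}, {"k1", "v13"},
            {"k2", "v21"}, {"k2", "v22"}, {"k2", "v23"}};
        Map<String, String> results = new HashMap<>();
        for (String[] r : records) {
            results.merge(r[0], r[1], (agg, v) -> agg + "," + v);
        }
        System.out.println(results); // k1 -> v11,v12,v13 and k2 -> v21,v22,v23
    }
}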

Now say 2 applications are running and the messages are read in round-robin
fashion:
v11 v13 v22 - machine 1
v12 v21 v23 - machine 2

The aggregation at machine 1 would be
(k1, agg(v11, v13))
(k2, agg(v22))

The aggregation at machine 2 would be
(k1, agg(v12))
(k2, agg(v21, v23))

So where do I join the independent results of these 2 aggregations to get
the same final result as when a single instance was running?

Note my high-level DSL is something like:

srcSTopic.groupByKey()
         .aggregate(...)
         .toStream()
         .foreach((key, aggregation) -> {
             // process aggregated value and push it to some external storage
         });
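
Spelled out a bit more, this is roughly what I run on both machines (a
sketch against the 0.10.1 DSL; the topic name, serdes, store name, broker
address, and the concat-style aggregator are placeholders for my actual
code):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class AggApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Same application.id on machine1 and machine2, so both instances
        // join the same consumer group.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-agg-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

        KStreamBuilder builder = new KStreamBuilder();
        KTable<String, String> aggregated = builder
            .stream(Serdes.String(), Serdes.String(), "topic1")
            .groupByKey(Serdes.String(), Serdes.String())
            .aggregate(
                () -> "",                                    // initializer
                (key, value, agg) ->
                    agg.isEmpty() ? value : agg + "," + value,
                Serdes.String(),
                "agg-store");                                // state store

        aggregated.toStream().foreach((key, aggregation) -> {
            // process aggregated value and push it to some external storage
        });

        new KafkaStreams(builder, props).start();
    }
}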

So I want this foreach to run against the final set of aggregated
values. Do I need to add another step before the foreach to make sure the
different results from the 2 machines are joined to get the final result as
expected? If yes, what would that step be?

Thanks
Sachin

On Fri, Dec 9, 2016 at 9:42 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hi Sachin,
>
> Some quick answers, and a link to some documentation to read more:
>
> - If you restart the application, it will start from the point it crashed
> (possibly reprocessing a small window of records).
>
> - You can run more than one instance of the application.  They'll
> coordinate by virtue of being part of a Kafka consumer group; if one
> crashes, the partitions that it was reading from will be picked up by other
> instances.
>
> - When running more than one instance, the tasks will be distributed
> between the instances.
>
> Confluent's docs on the Kafka Streams architecture go into a lot more
> detail: http://docs.confluent.io/3.0.0/streams/architecture.html
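>
> For example, each instance is just the same application started
> normally; on restart it resumes from the group's committed offsets. A
> minimal fragment (here "builder" and "props" stand for whatever your
> app already sets up):
>
>     KafkaStreams streams = new KafkaStreams(builder, props);
>     streams.start();
>     // Close cleanly on shutdown; a restarted instance picks up from
>     // the consumer group's committed offsets.
>     Runtime.getRuntime().addShutdownHook(new Thread(streams::close));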
>
> On Thu, Dec 8, 2016 at 9:05 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
>
> > Hi All,
> > We were able to run a stream processing application against a fairly
> > decent load of messages in a production environment.
> >
> > To make the system robust, say the stream processing application
> > crashes; is there a way to make it auto-start from the point where it
> > crashed?
> >
> > Also, is there a concept like running the same application in a cluster,
> > where if one instance fails, another takes over until we bring back up
> > the failed streams application node?
> >
> > If yes, are there any guidelines or a knowledge base we can look at to
> > understand how this would work?
> >
> > Is there a way, like in Spark, where the driver program distributes the
> > tasks across various nodes in a cluster? Is there something similar in
> > Kafka Streams too?
> >
> > Thanks
> > Sachin
> >
>
