Thanks Gwen. This really helped.

Yes, Kafka is the best thing ever :)

Now how would this be done with the SimpleConsumer? I'm guessing I'd have
to maintain my own state in ZooKeeper or something similar?


On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Here's an example (from the ConsumerOffsetChecker tool) of one topic (t1)
> and one consumer group (flume), where each of the three topic partitions is
> being read by a different machine running the Flume consumer:
> Group  Topic  Pid  Offset    logSize    Lag       Owner
> flume  t1     0    50172068  100210042  50037974  flume_kafkacdh-1.ent.cloudera.com-1412722833783-3d6d80db-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412722838536-a6a4915d-0
> flume  t1     2    54218841  82733380   28514539  flume_kafkacdh-3.ent.cloudera.com-1412722832793-b23eaa63-0
>
> If flume_kafkacdh-1 crashes, another consumer in the group picks up its partition:
> Group  Topic  Pid  Offset    logSize    Lag       Owner
> flume  t1     0    59669715  100210042  40540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     2    65796205  82733380   16937175  flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
>
> Then I can start flume_kafkacdh-4 and see things rebalance again:
> flume  t1     0    60669715  100210042  39540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
> flume  t1     2    66829740  82733380   15903640  flume_kafkacdh-4.ent.cloudera.com-1412793053882-9bfddff9-0
>
> Isn't Kafka the best thing ever? :)
>
> Gwen
>
> On Wed, Oct 8, 2014 at 11:23 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > Yep, exactly.
> >
> > On Wed, Oct 8, 2014 at 11:07 AM, Sharninder <sharnin...@gmail.com> wrote:
> >> Thanks Gwen.
> >>
> >> When you say that I can add consumers to the same group, does that also
> >> hold true if those consumers are running on different machines, or in
> >> different JVMs?
> >>
> >> --
> >> Sharninder
> >>
> >>
> >> On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>
> >>> If you use the high-level consumer implementation and register all
> >>> consumers as part of the same group, they will load-balance
> >>> automatically.
> >>>
> >>> When you add a consumer to the group, and the topic has enough
> >>> partitions (at least as many as there are consumers), some of the
> >>> partitions will be reassigned to the new consumer.
> >>> When a consumer crashes, once its ephemeral node in ZooKeeper times
> >>> out, the other consumers will take over its partitions.
> >>>
> >>> Gwen
> >>>
> >>> On Wed, Oct 8, 2014 at 10:39 AM, Sharninder <sharnin...@gmail.com> wrote:
> >>> > Hi,
> >>> >
> >>> > I'm not even sure if this is a valid use case, but I really wanted to
> >>> > run it by you guys. How do I load balance my consumers? For example,
> >>> > if my consumer machine is under load, I'd like to spin up another VM
> >>> > with another consumer process to keep reading messages off any topic.
> >>> > Along similar lines, how do you guys handle consumer failures? If one
> >>> > consumer process gets an exception and crashes, can I somehow make
> >>> > sure that another process is still reading the queue for me?
> >>> >
> >>> > --
> >>> > Sharninder
> >>>
>
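
The Owner column in the ConsumerOffsetChecker output quoted above follows a simple pattern: consumer ids are sorted and each one gets a contiguous range of partitions. The Java sketch below reproduces the three snapshots, assuming the high-level consumer's range-style assignment (the 0.8-era default, as far as I know). The class name, the assign() method and the shortened consumer ids are hypothetical and are not part of the Kafka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RangeAssignment {

    /**
     * Range-style assignment (a sketch, not Kafka's own code): sort the
     * consumer ids, then hand each one a contiguous block of partitions.
     * The first (numPartitions % consumers) ids get one extra partition.
     */
    static Map<String, List<Integer>> assign(List<String> consumerIds, int numPartitions) {
        Map<String, List<Integer>> result = new TreeMap<>();
        if (consumerIds.isEmpty()) {
            return result; // nobody to assign partitions to
        }
        List<String> sorted = new ArrayList<>(consumerIds);
        Collections.sort(sorted);

        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();

        for (int i = 0; i < sorted.size(); i++) {
            int start = perConsumer * i + Math.min(i, extra);
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                partitions.add(p);
            }
            // The list is empty when there are more consumers than partitions.
            result.put(sorted.get(i), partitions);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened consumer ids modeled on the Owner column.
        List<String> before = Arrays.asList("flume_kafkacdh-1-0", "flume_kafkacdh-2-0", "flume_kafkacdh-3-0");
        List<String> afterCrash = Arrays.asList("flume_kafkacdh-2-0", "flume_kafkacdh-3-0");
        List<String> afterAdd = Arrays.asList("flume_kafkacdh-2-0", "flume_kafkacdh-3-0", "flume_kafkacdh-4-0");

        System.out.println(assign(before, 3));     // one partition per consumer
        System.out.println(assign(afterCrash, 3)); // kafkacdh-2 owns 0 and 1, kafkacdh-3 owns 2
        System.out.println(assign(afterAdd, 3));   // back to one partition each
    }
}
```

With more consumer threads than partitions, the extra threads end up with an empty list, which is why adding consumers only helps while the topic has enough partitions.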
