The JMX MBean should be of the form clientId*-ConsumerLag, under the
kafka.server domain. Pausing the iteration will indirectly pause the
underlying fetcher.
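[Editor's illustration] A minimal sketch of reading such a gauge over JMX from inside the consumer JVM. The kafka.server domain and the "Value" attribute are assumptions based on the description above, and the class and method names are invented for this sketch; the exact ObjectName should be confirmed with jconsole against a running consumer:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LagReader {

    // Returns the maximum value of the given numeric attribute across all
    // MBeans matching the pattern; 0 if nothing matches.
    static long maxGauge(MBeanServer server, ObjectName pattern, String attr)
            throws Exception {
        long max = 0;
        Set<ObjectName> names = server.queryNames(pattern, null);
        for (ObjectName name : names) {
            Object value = server.getAttribute(name, attr);
            max = Math.max(max, ((Number) value).longValue());
        }
        return max;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical pattern for the per-partition lag gauges described
        // above; adjust to the names jconsole actually shows.
        ObjectName pattern = new ObjectName("kafka.server:*");
        System.out.println("max consumer lag: "
                + maxGauge(server, pattern, "Value"));
    }
}
```

The same maxGauge helper works against any standard MBean, so the pattern and attribute name can be swapped in once the real ObjectName layout is known.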
Thanks,

Jun


On Wed, Jun 11, 2014 at 3:09 AM, Bogdan Dimitriu (bdimitri)
<bdimi...@cisco.com> wrote:

> Which JMX MBeans are you referring to, Jun? I couldn't find anything that
> gives me the same information as the ConsumerOffsetChecker tool.
> In any case, my main problem is that I don't know when I should slow down
> the iteration, because I don't know which stream the iteration is
> consuming. I have the global partition lag picture (from the
> ConsumerOffsetChecker), but I don't know how to apply it to slow down the
> iterators because I can't identify them.
>
> Thanks,
> Bogdan
>
> On 09/06/2014 15:28, "Jun Rao" <jun...@gmail.com> wrote:
>
> > We do have a JMX metric that reports the lag per partition. You could
> > probably get the lag that way. Then you just need to slow down the
> > iteration on the fast partition.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jun 9, 2014 at 4:07 AM, Bogdan Dimitriu (bdimitri)
> > <bdimi...@cisco.com> wrote:
> >
> > > Certainly.
> > > I know this may not sound like a great idea, but I am running out of
> > > options here: I'm basically trying to implement a consumer throttle.
> > > My application consumes from a fairly high number of partitions across
> > > a number of consumer servers. The data is put in the partitions by a
> > > producer in a round-robin fashion, so the number of messages each
> > > partition is given is even. The messages have a time component
> > > assigned to them.
> > > Now, for a good majority of the time, the consumers will be faster
> > > than the producers, so the lags I get with the ConsumerOffsetChecker
> > > are mostly 0 (or 1), and this works well with the time component
> > > because once the messages are consumed there is a (coarse) logical
> > > grouping of messages from all partitions based on the time component.
> > > The point where I'm starting to get into trouble is when the consumers
> > > are all stopped for a while and messages start to accumulate in the
> > > partitions without being consumed (hence the lag increases). Once I
> > > resume the consumers (with 1 thread per partition), messages start to
> > > get consumed very fast, but because some messages take longer to
> > > process than others, over time the lags get very uneven between
> > > partitions, and this starts to interfere with the grouping by the time
> > > component.
> > > So the only way I thought I could prevent this from happening was by
> > > throttling the "fast" consumers, which have a smaller lag. This will
> > > only happen rarely, so I thought I could live with the approach. To do
> > > that, I obviously need to get the lag information periodically. I do
> > > that by a mechanism similar to what ConsumerOffsetChecker does. But
> > > now I need to know from which thread to call sleep(), and there seems
> > > to be no decent way to find that out.
> > > Any better ideas would be highly appreciated.
> > >
> > > Thanks,
> > > Bogdan
> > >
> > > On 08/06/2014 06:25, "Jun Rao" <jun...@gmail.com> wrote:
> > >
> > > > Could you elaborate on the use case of the stream ID?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jun 6, 2014 at 2:13 AM, Bogdan Dimitriu (bdimitri)
> > > > <bdimi...@cisco.com> wrote:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > I'm using Kafka 0.8.0 with the high-level consumer, and I have a
> > > > > situation where I need to obtain the ID for each of the
> > > > > KafkaStreams that I create. The KafkaStream class has a method
> > > > > called "clientId()" that I expected would give me just that, but
> > > > > unfortunately it returns the name of the consumer group.
> > > > > So to make it clear, what I want to obtain is the string that
> > > > > looks like this: myconsumergroup_myhost-1402045464004-2dc0cbf2-0.
> > > > > Is there any way I could get that value for each of the streams?
> > > > > I've looked around the source code but I can't see any way to do
> > > > > this.
> > > > >
> > > > > Many thanks,
> > > > > Bogdan
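[Editor's illustration] The throttle described in this thread can be sketched without any Kafka-specific API, assuming each consumer thread can somehow be associated with its partition (the very mapping the thread is asking about). A monitor thread feeds in per-partition lags read via JMX or a ConsumerOffsetChecker-style lookup, and each consumer thread checks before each iterator step whether it has run too far ahead of the most-lagged partition. Class and method names here are illustrative, not Kafka API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConsumerThrottle {
    private final Map<Integer, Long> lagByPartition =
            new ConcurrentHashMap<Integer, Long>();
    private final long slackMessages; // how uneven the lags may become
    private final long pauseMillis;   // how long a "fast" thread sleeps

    public ConsumerThrottle(long slackMessages, long pauseMillis) {
        this.slackMessages = slackMessages;
        this.pauseMillis = pauseMillis;
    }

    // Called periodically by a monitor thread with fresh lag readings.
    public void updateLag(int partition, long lag) {
        lagByPartition.put(Integer.valueOf(partition), Long.valueOf(lag));
    }

    // A thread is "fast" when the most-lagged partition is more than
    // slackMessages behind it.
    public boolean shouldPause(int partition) {
        long maxLag = 0;
        for (long lag : lagByPartition.values()) {
            maxLag = Math.max(maxLag, lag);
        }
        Long myLag = lagByPartition.get(Integer.valueOf(partition));
        long mine = (myLag == null) ? 0 : myLag.longValue();
        return maxLag - mine > slackMessages;
    }

    // Call at the top of each consumer thread's iteration loop; sleeps
    // until the lagging partitions are back within the slack window.
    public void maybePause(int partition) throws InterruptedException {
        while (shouldPause(partition)) {
            Thread.sleep(pauseMillis);
        }
    }
}
```

Each per-partition thread would call maybePause(partition) before ConsumerIterator.next(); per the answer at the top of the thread, pausing the iteration indirectly pauses the underlying fetcher, so the sleep applies backpressure rather than just buffering.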