Which JMX MBeans are you referring to, Jun? I couldn’t find anything that gives me the same information as the ConsumerOffsetChecker tool. In any case, my main problem is that I don’t know which iteration to slow down, because I can’t tell which stream each iteration is consuming. I have the lag for every partition (from the ConsumerOffsetChecker), but I don’t know how to apply it to slow down the iterators because I can’t identify them.
Thanks,
Bogdan

On 09/06/2014 15:28, "Jun Rao" <jun...@gmail.com> wrote:

>We do have a jmx that reports the lag per partition. You could probably get
>the lag that way. Then, you just need to slow down the iteration on the
>fast partition.
>
>Thanks,
>
>Jun
>
>
>On Mon, Jun 9, 2014 at 4:07 AM, Bogdan Dimitriu (bdimitri) <bdimi...@cisco.com> wrote:
>
>> Certainly.
>> I know this may not sound like a great idea but I am running out of
>> options here: I’m basically trying to implement a consumer throttle. My
>> application consumes from a fairly high number of partitions from a number
>> of consumer servers. The data is put in the partitions by a producer in a
>> round robin fashion so the number of messages each partition is given is
>> even. The messages have a time component assigned to them.
>> Now, for a good majority of time the consumers will be faster than the
>> producers, so the lags I get with the ConsumerOffsetChecker are mostly 0
>> (or 1) and this works well with the time component because once they are
>> consumed there is a logical grouping of messages from all partitions based
>> on the time component (coarsely).
>> The point where I’m starting to get into trouble is when the consumers are
>> all stopped for a while and messages start to accumulate in the partitions
>> without being consumed (hence the lag increases). Once I resume the
>> consumers (with 1 thread per each partition) messages start to get
>> consumed very fast, but because some messages take longer to process than
>> others, over time the lags start to get very uneven between partitions and
>> this starts to interfere with the grouping by the time component.
>> So the only way I thought I could prevent this from happening was by
>> throttling the “fast” consumers, which have a smaller lag. This will only
>> happen rarely, so I thought I could live with the approach. To do that, I
>> obviously need to get the lag information periodically. I do that by a
>> mechanism similar to what ConsumerOffsetChecker does. But now I need to
>> know from which thread to call sleep() and there seems to be no decent way
>> to find that out.
>> Any better ideas would be highly appreciated.
>>
>> Thanks,
>> Bogdan
>>
>> On 08/06/2014 06:25, "Jun Rao" <jun...@gmail.com> wrote:
>>
>> >Could you elaborate on the use case of the stream ID?
>> >
>> >Thanks,
>> >
>> >Jun
>> >
>> >
>> >On Fri, Jun 6, 2014 at 2:13 AM, Bogdan Dimitriu (bdimitri) <bdimi...@cisco.com> wrote:
>> >
>> >> Hello folks,
>> >>
>> >> I’m using Kafka 0.8.0 with the high level consumer and I have a situation
>> >> where I need to obtain the ID for each of the KafkaStreams that I create.
>> >> The KafkaStream class has a method called “clientId()” that I expected
>> >> would give me just that, but unfortunately it returns the name of the
>> >> consumer group.
>> >> So to make it clear, what I want to obtain is the string that looks like
>> >> this: myconsumergroup_myhost-1402045464004-2dc0cbf2-0.
>> >> Is there any way I could get that value for each of the streams? I’ve
>> >> looked around the source code but I can’t see any way to do this.
>> >>
>> >> Many thanks,
>> >> Bogdan
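
For illustration only, the throttling idea discussed in this thread (pause the threads whose partitions have a much smaller lag than the slowest one) could be sketched roughly as below. This is not Kafka code: the class `LagThrottle`, its parameters, and the lag map are hypothetical names, and the sketch assumes each consuming thread can somehow learn the partition it is reading from, which is exactly the open question above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a lag-based throttle. Every name here is hypothetical;
 * nothing in this class is part of the Kafka API.
 */
class LagThrottle {
    private final long maxLagSpread; // tolerated distance from the largest lag
    private final long pauseMillis;  // pause for a partition that is too far ahead

    public LagThrottle(long maxLagSpread, long pauseMillis) {
        this.maxLagSpread = maxLagSpread;
        this.pauseMillis = pauseMillis;
    }

    /**
     * Returns how long the thread consuming the given partition should sleep.
     * lagByPartition is assumed to be refreshed periodically by some other
     * mechanism (for example, one similar to ConsumerOffsetChecker).
     */
    public long pauseFor(int partition, Map<Integer, Long> lagByPartition) {
        Long ownLag = lagByPartition.get(partition);
        if (ownLag == null) {
            return 0L; // unknown partition: do not throttle
        }
        long maxLag = 0L;
        for (long lag : lagByPartition.values()) {
            maxLag = Math.max(maxLag, lag);
        }
        // A partition is "fast" when its lag is far below the slowest partition's lag.
        return (maxLag - ownLag > maxLagSpread) ? pauseMillis : 0L;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Long> lags = new HashMap<>();
        lags.put(0, 10L);   // nearly caught up
        lags.put(1, 5000L); // far behind
        lags.put(2, 4800L); // far behind

        LagThrottle throttle = new LagThrottle(1000L, 200L);
        for (int partition : lags.keySet()) {
            long pause = throttle.pauseFor(partition, lags);
            System.out.println("partition " + partition + " pause ms: " + pause);
            // A consuming thread would call Thread.sleep(pause) before its next read.
        }
    }
}
```

In a real consumer, the thread would call `pauseFor` with its own partition number between messages and sleep for the returned duration; how it obtains that partition number is left open here.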