Re: Debugging Kafka Streams Windowing

2017-06-08 Thread Mahendra Kariya
ncreasing those two properties do the trick? I am running into this > exact issue testing streams out on a single Kafka instance. Yet I can > manually start a consumer and read the topics fine while it's busy doing > this dead stuff. > > On Tue, May 23, 2017 at 12:30 AM, Mahendra Kar

Re: Debugging Kafka Streams Windowing

2017-05-22 Thread Mahendra Kariya
On 22 May 2017 at 16:09, Guozhang Wang wrote: > For > that issue I'd suspect that there is a network issue, or maybe the network > is just saturated already and the heartbeat request / response were not > exchanged in time between the consumer and the broker, or the sockets being > dropped becaus

Re: Debugging Kafka Streams Windowing

2017-05-16 Thread Mahendra Kariya
over so that any consumer > groups that is corresponding to that offset topic partition won't be > blocked. > > > Guozhang > > > > On Mon, May 15, 2017 at 7:33 PM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: > > > Thanks for the rep

Re: Debugging Kafka Streams Windowing

2017-05-15 Thread Mahendra Kariya
this exception. On Mon, May 15, 2017 at 11:28 PM, Guozhang Wang wrote: > I'm wondering if it is possibly due to KAFKA-5167? In that case, the "other > thread" will keep retrying on grabbing the lock. > > Guozhang > > > On Sat, May 13, 2017 at 7:30 PM, Mahendra K

Re: Order of punctuate() and process() in a stream processor

2017-05-14 Thread Mahendra Kariya
We use Kafka Streams for quite a few aggregation tasks. For instance, counting the number of messages with a particular status in a 1-minute time window. We have noticed that whenever we restart a stream, we see a sudden spike in the aggregated numbers. After a few minutes, things are back to norm

Re: Debugging Kafka Streams Windowing

2017-05-13 Thread Mahendra Kariya
atform/A14dkPlDlv4 > > Do you still see missing data? > > > -Matthias > > > On 5/11/17 2:39 AM, Mahendra Kariya wrote: > > Hi Matthias, > > > > We faced the issue again. The logs are below. > > > > 16:13:16.527 [StreamThread-7] INFO o.a.k.c.

Re: Debugging Kafka Streams Windowing

2017-05-11 Thread Mahendra Kariya
o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id. On Tue, May 9, 2017 at 3:40 AM, Matthias J. Sax wrote: > Great! Glad 0.10.2.1 fixes it for you! > > -Matthias > > On 5/7/17 8:57 PM, Mahendra Kariya wrote: &

Re: Debugging Kafka Streams Windowing

2017-05-07 Thread Mahendra Kariya
and what the delay was. > > > >>> - same for .mapValues() > >>> > >> > >> I am not sure how to check this. > > The same way as you do for filter()? > > > -Matthias > > > On 5/4/17 10:29 AM, Mahendra Kariya wrote: > > Hi

Re: Debugging Kafka Streams Windowing

2017-05-04 Thread Mahendra Kariya
Hi Matthias, Please find the answers below. I would recommend double-checking the following: > > - can you confirm that the filter does not remove all data for those > time periods? > Filter does not remove all data. There is a lot of data coming in even after the filter stage. > - I would a

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
Another question that I have is, is there a way for us to detect how many messages have come out of order? And if possible, what is the delay? On Thu, May 4, 2017 at 6:17 AM, Mahendra Kariya wrote: > Hi Matthias, > > Sure we will look into this. In the meantime, we have run into anothe

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
hat part of the program the data gets > lost. > > > -Matthias > > > On 5/2/17 11:09 PM, Mahendra Kariya wrote: > > Hi Garrett, > > > > Thanks for these insights. But we are not consuming old data. We want the > > Streams app to run in near real time. And

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Mahendra Kariya
ta. I don't have a good fix yet, I set the retention to > something massive which I think is getting me other problems. > > Maybe that helps? > > On Tue, May 2, 2017 at 6:27 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com> > wrote: > > > Hi Matthias, > > &

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Mahendra Kariya
ta is present is 1493694300. That's around 9 minutes of data missing. And this is just one instance. There are a lot of such instances in this file. On Sun, Apr 30, 2017 at 11:23 AM, Mahendra Kariya < mahendra.kar...@go-jek.com> wrote: > Thanks for the update Matthias! And sorry for t

Re: Debugging Kafka Streams Windowing

2017-04-29 Thread Mahendra Kariya
tead of .count() ? > > > > Also, can you share the code of you AggregatorFunction()? Did you change > > any default setting of StreamsConfig? > > > > I have still no idea what could go wrong. Maybe you can run with log > > level TRACE? Maybe we can get som

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
> is increasing, you should actually see multiple records per window. > > Your code is like this: > > stream.filter().groupByKey().count(TimeWindow.of(6)).to(); > > Or do you have something more complex? > > > -Matthias > > > On 4/27/17 9:16 PM, Mahendra Kari

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
> Can you somehow verify your output? Do you mean the Kafka streams output? In the Kafka Streams output, we do see some missing values. I have attached the Kafka Streams output (for a few hours) in the very first email of this thread for reference. Let me also summarise what we have done so far.

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
-1 > > The metric you mentioned, reports the number of skipped record. > > Are you sure that `getEventTimestamp()` never returns -1 ? > > > > -Matthias > > On 4/27/17 10:33 AM, Mahendra Kariya wrote: > > Hey Eno, > > > > We are using a custom TimeStampEx
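The guard being discussed here could look roughly like the sketch below, written against the 0.10.2 `TimestampExtractor` interface. `getEventTimestamp()` and the fall-back to wall-clock time are assumptions for illustration, not the actual code from this thread.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Sketch of a custom extractor that never returns -1, so records are not
// silently skipped (and counted in the skipped-record metric) by windowing.
public class SafeEventTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        final long eventTime = getEventTimestamp(record.value());
        // Fall back to wall-clock time when the payload has no usable timestamp.
        return eventTime >= 0 ? eventTime : System.currentTimeMillis();
    }

    // Hypothetical helper: parse the event time out of the message payload.
    private long getEventTimestamp(Object value) {
        return -1L; // placeholder
    }
}
```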

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
he.org/jira/browse/KAFKA-5055>. Could you let us know > if you use any special TimeStampExtractor class, or if it is the default? > > Thanks > Eno > > On 27 Apr 2017, at 13:46, Mahendra Kariya > wrote: > > > > Hey All, > > > > We have a Kafka Strea

Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Hey All, We have a Kafka Streams application which ingests from a topic to which more than 15K messages are generated per second. The app filters a few of them, counts the number of unique filtered messages (based on one particular field) within a 1 min time window, and dumps it back to Kafka. Th
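Under the 0.10.x DSL, the topology described here would look something like the sketch below. Topic names, the filter predicate, the store name, and `windowedSerde` are placeholders, and counting *unique* messages by a field would additionally need a re-keying step before the `groupByKey()`.

```java
KStreamBuilder builder = new KStreamBuilder();   // 0.10.x DSL entry point

KStream<String, String> stream = builder.stream("input-topic"); // placeholder topic
stream
    .filter((key, value) -> matchesStatus(value))       // drop irrelevant messages
    .groupByKey()
    .count(TimeWindows.of(60_000L), "counts-store")     // 1-minute tumbling windows
    .to(windowedSerde, Serdes.Long(), "output-topic");  // dump counts back to Kafka
```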

Re: how can I contribute to this project?

2017-04-19 Thread Mahendra Kariya
Hi James, This page has all the information you are looking for. https://kafka.apache.org/contributing On Thu, Apr 20, 2017 at 9:32 AM, James Chain wrote: > Hi > Because I love this project, so I want to take part of it. But I'm brand > new to opensource project. > > How can I get started to ma

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-18 Thread Mahendra Kariya
> I see the java.lang.NoSuchMethodError: org.apache.kafka.clients... error. > Looks like some jars aren't in the classpath? > > Eno > > > On 18 Apr 2017, at 12:46, Mahendra Kariya > wrote: > > > > Hey Eno, > > > > I just pulled the latest jar from the link you

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-18 Thread Mahendra Kariya
ractConfig.getConfiguredInstances(AbstractConfig.java:220) at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:673) ... 6 more On Tue, Apr 18, 2017 at 5:47 AM, Mahendra Kariya wrote: > Thanks! > > On Tue, Apr 18, 2017, 12:26 AM Eno Thereska > wrote: > >>

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
Thanks! On Tue, Apr 18, 2017, 12:26 AM Eno Thereska wrote: > The RC candidate build is here: > http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/ < > http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/> > > Eno > > On 17 Apr 2017, at 17:20, Mahendra Kari

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
Thanks! In the meantime, is the jar published somewhere on GitHub or as part of the build pipeline? On Mon, Apr 17, 2017 at 9:18 PM, Eno Thereska wrote: > Not yet, but as soon as 0.10.2 is voted it should be. Hopefully this week. > > Eno > > On 17 Apr 2017, at 13:25, Mahendra

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
Are the bug fix releases published to Maven central repo? On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska wrote: > Hi Sachin, > > In the bug fix release for 0.10.2 (and in trunk) we have now set > max.poll.interval to infinite since from our experience with streams this > should not be something t

Re: auto.offset.reset for Kafka streams 0.10.2.0

2017-04-11 Thread Mahendra Kariya
> cf. > > https://github.com/apache/kafka/blob/0.10.2.0/streams/ > > src/main/java/org/apache/kafka/streams/StreamsConfig.java#L405 > > > > > > -Matthias > > > > On 4/10/17 9:41 PM, Mahendra Kariya wrote: > > > This was even my assumption. But I h

Re: auto.offset.reset for Kafka streams 0.10.2.0

2017-04-10 Thread Mahendra Kariya
from the offset. > > > On Tue, Apr 11, 2017 at 8:51 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: > > > Hey All, > > > > Is the auto offset reset set to "earliest" by default in Kafka streams > > 0.10.2.0? I thought default wa

auto.offset.reset for Kafka streams 0.10.2.0

2017-04-10 Thread Mahendra Kariya
Hey All, Is the auto offset reset set to "earliest" by default in Kafka streams 0.10.2.0? I thought the default was "latest". I started a new Kafka streams application with a fresh application id and it started consuming messages from the beginning.
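Whatever the default turns out to be, the reset policy can be pinned explicitly in the Streams config, since unrecognized keys are forwarded to the embedded consumer. A minimal sketch — the application id and bootstrap servers are placeholders:

```java
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");     // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
// Forwarded to the embedded consumer. "latest" means a fresh application id
// (i.e. a brand-new consumer group) starts at the end of the topic instead
// of replaying it from the beginning.
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
```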

Re: Capacity planning for Kafka Streams

2017-03-22 Thread Mahendra Kariya
-platform/wgCSuwIJo5g On Wed, Mar 22, 2017 at 7:11 PM, Damian Guy wrote: > Hi Mahendra, > > Are you able to share the complete logs? It is pretty hard to tell what is > happening just from a few snippets of information. > > Thanks, > Damian > > On Wed, 22 Mar 2017 at 12:

Re: Capacity planning for Kafka Streams

2017-03-22 Thread Mahendra Kariya
On Sat, Mar 18, 2017 at 5:58 AM, Mahendra Kariya wrote: > Thanks for the heads up Guozhang! > > The problem is our brokers are on 0.10.0.x. So we will have to upgrade > them. > > On Sat, Mar 18, 2017 at 12:30 AM, Guozhang Wang > wrote: > >> Hi Mahendra, >

Re: Consumers not rebalancing after replication

2017-03-21 Thread Mahendra Kariya
, Mahendra Kariya wrote: > Hey All, > > We have six consumers in a consumer group. At times, some of the > partitions are under replicated for a while (maybe, 2 mins). During this > time, the consumers subscribed to such partitions stop getting data from > Kafka and they become in

Consumers not rebalancing after replication

2017-03-21 Thread Mahendra Kariya
Hey All, We have six consumers in a consumer group. At times, some of the partitions are under replicated for a while (maybe, 2 mins). During this time, the consumers subscribed to such partitions stop getting data from Kafka and they become inactive for a while. But when the partitions are fully

Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
WAL files <https://github.com/facebook/rocksdb/wiki/basic-operations#purging-wal-files>. But I am not sure how to set these configs. Any help would be really appreciated. Just for reference, our Kafka brokers are on v0.10.0.1 and RocksDB version is 4.8.0. On Mon, Mar 20, 2017 at 12:29 PM,
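For reference, RocksDB options can be tuned from Streams (0.10.1+) through a `RocksDBConfigSetter`. The WAL limits below are illustrative values only, not recommendations from this thread:

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class WalLimitingRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Bound how long / how large the write-ahead logs may grow before
        // RocksDB purges them. Both values are illustrative.
        options.setWalTtlSeconds(3600);
        options.setWalSizeLimitMB(64);
    }
}

// Registered via the Streams config:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
//           WalLimitingRocksDBConfig.class);
```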

Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
Hey Guozhang, Thanks a lot for these insights. We are facing the exact same problem as Tianji. Our commit frequency is also quite high. We flush around 16K messages per minute to Kafka at the end of the topology. Another issue that we are facing is that rocksdb is not deleting old data. We

Re: Capacity planning for Kafka Streams

2017-03-17 Thread Mahendra Kariya
> On Mon, Mar 13, 2017 at 5:12 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: > > > We are planning to migrate to the newer version of Kafka. But that's a > few > > weeks away. > > > > We will try setting the socket config an

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Mahendra Kariya
o be large, especially when running in AWS with > // high latency. if running locally the default is fine. > props.put(ProducerConfig.SEND_BUFFER_CONFIG, 1024 * 1024); > props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 1024 * 1024); > > Make sure the OS allows the larger socket size too

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Mahendra Kariya
Hi Eno, Please find my answers inline. We are in the process of documenting capacity planning for streams, stay > tuned. > This would be great! Looking forward to it. Could you send some more info on your problem? What Kafka version are you > using? > We are using Kafka 0.10.0.0. > Are the

Capacity planning for Kafka Streams

2017-03-12 Thread Mahendra Kariya
Hey All, Are there some guidelines / documentation around capacity planning for Kafka streams? We have a Streams application which consumes messages from a topic with 400 partitions. At peak time, there are around 20K messages coming into that topic per second. The Streams app consumes these mess

Re: Stream topology with multiple Kafka clusters

2017-02-27 Thread Mahendra Kariya
luent-platform > > Thanks > Eno > > On 27 Feb 2017, at 13:37, Mahendra Kariya > wrote: > > > > Hi, > > > > I have a couple of questions regarding Kafka streams. > > > > 1. Can we merge two streams from two different Kafka clusters? > > 2. Can my sink topic be in Kafka cluster different from source topic? > > > > Thanks! > >

Stream topology with multiple Kafka clusters

2017-02-27 Thread Mahendra Kariya
Hi, I have a couple of questions regarding Kafka streams. 1. Can we merge two streams from two different Kafka clusters? 2. Can my sink topic be in a different Kafka cluster from the source topic? Thanks!

Re: Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-23 Thread Mahendra Kariya
+1 for such a tool. It would be of great help in a lot of use cases. On Thu, Feb 23, 2017 at 11:44 PM, Matthias J. Sax wrote: > \cc from dev > > > Forwarded Message > Subject: Re: KIP-122: Add a tool to Reset Consumer Group Offsets > Date: Thu, 23 Feb 2017 10:13:39 -0800 > From

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Mahendra Kariya
cords have been completely > processed in the topology, and that also means that the actual commit > interval may be a bit longer than the configured value in practice. > > > Guozhang > > > > On Wed, Feb 22, 2017 at 8:15 PM, Mahendra Kariya < > mahendra.kar...@go-jek.

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Mahendra Kariya
Hi Guozhang, On Thu, Feb 23, 2017 at 2:48 AM, Guozhang Wang wrote: > With that even if you do > not have any data processed the commit operation will be triggered after > that configured period of time. > The above statement is confusing. As per this thread, offsets are o

Re: JMX metrics for replica lag time

2017-02-22 Thread Mahendra Kariya
Just wondering, for what particular Kafka version is this applicable? On Thu, Feb 23, 2017 at 2:38 AM, Guozhang Wang wrote: > Hmm that is a very good question. It seems to me that we did not add the > corresponding metrics for it when we changed the mechanism. And your > observation is likely to

Re: Kafka consumer offset location

2017-02-09 Thread Mahendra Kariya
You can use the seekToBeginning method of KafkaConsumer. https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning(java.util.Collection) On Thu, Feb 9, 2017 at 7:56 PM, Igor Kuzmenko wrote: > Hello, I'm using new consumer to read kafka topic. For
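A minimal sketch of that approach — the topic name is a placeholder, and note the consumer needs a partition assignment before it can seek, hence the initial `poll()`:

```java
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
consumer.poll(0);                                // join the group, get partitions assigned
consumer.seekToBeginning(consumer.assignment()); // rewind every assigned partition
```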

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Mahendra Kariya
Ah OK! Thanks! On Mon, Feb 6, 2017, 3:09 PM Eno Thereska wrote: > Oh, by "other" I meant the original one you started discussing: > COMMIT_INTERVAL_MS_CONFIG. > > Eno > > On 6 Feb 2017, at 09:28, Mahendra Kariya > wrote: > > > > Thanks Eno! > >

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Mahendra Kariya
false, > and manage commits using the other commit parameter. That way streams has > more control on when offsets are committed. > > Eno > > On 6 Feb 2017, at 05:39, Mahendra Kariya > wrote: > > > > I have another follow up question regarding configuration. > &g

Re: At Least Once semantics for Kafka Streams

2017-02-05 Thread Mahendra Kariya
tps://kafka.apache.org/documentation/#consumerconfigs> apply to Kafka streams as well? On Fri, Feb 3, 2017 at 9:43 PM, Mahendra Kariya wrote: > Ah OK! Thanks a lot for this clarification. > > it will only commit the offsets if the value of COMMIT_INTERVAL_MS_CONFIG >> has >> passed. >> > > > > >

Re: At Least Once semantics for Kafka Streams

2017-02-03 Thread Mahendra Kariya
Ah OK! Thanks a lot for this clarification. it will only commit the offsets if the value of COMMIT_INTERVAL_MS_CONFIG > has > passed. >

Re: At Least Once semantics for Kafka Streams

2017-02-03 Thread Mahendra Kariya
Thanks Damian for this info. On Fri, Feb 3, 2017 at 3:29 PM, Damian Guy wrote: > The commit is done on the same thread as the processing, so only offsets > that have been fully processed by the topology will be committed. > I am still not clear about why do we need the COMMIT_INTERVAL_MS_CONFI
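As a config fragment, the knob being discussed is set like this; the 5-second value is only an example:

```java
// Streams commits the offsets of fully processed records on the processing
// thread itself, at most once per this interval.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 5000);
```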

Re: At Least Once semantics for Kafka Streams

2017-02-03 Thread Mahendra Kariya
try from last committed offset (and because some > output date might already be written which cannot be undone you might > get duplicates). > > -Matthias > > On 1/29/17 8:13 PM, Mahendra Kariya wrote: > > Hey All, > > > > I am new to Kafka streams. From the document

At Least Once semantics for Kafka Streams

2017-01-29 Thread Mahendra Kariya
Hey All, I am new to Kafka streams. From the documentation, it is pretty much clear that streams support at least once semantics. But I couldn't find details about how this is supported. I am interested in knowing the

Re: Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
nnects Kafka no longer has the offset > so it will reprocess from earliest. > > Michael > > > On 12 Jan 2017, at 11:13, Mahendra Kariya > wrote: > > > > Hey All, > > > > We have a Kafka cluster hosted on Google Cloud. There was some network > > iss

Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
Hey All, We have a Kafka cluster hosted on Google Cloud. There was some network issue on the cloud and suddenly, the offset for a particular consumer group got reset to earliest and all of a sudden the lag was in millions. We aren't able to figure out what went wrong. Has anybody faced the same/si