Re: Kafka Streams stopped with errors, failed to reinitialize itself

2017-05-03 Thread Sameer Kumar
My brokers are on version 10.1.0 and my clients are on version 10.2.0. Also, do a reply to all, I am currently not subscribed to the mailing list. -Sameer. On Wed, May 3, 2017 at 5:27 PM, Sameer Kumar wrote: > Hi, > > > > I want to report an issue where in addition of a

Re: Kafka Streams Failed to rebalance error

2017-05-03 Thread Sameer Kumar
My brokers are on version 10.1.0 and my clients are on version 10.2.0. Also, do a reply to all, I am currently not subscribed to the list. -Sameer. On Wed, May 3, 2017 at 6:34 PM, Sameer Kumar wrote: > Hi, > > > > I ran two nodes in my streams compute cluster, they were

Shouldn't the initializer of a stream aggregate accept the key?

2017-05-03 Thread João Peixoto
Looking at the aggregate documentation one of the required items is an "initializer", no arguments and returns a value. Shouldn't this initializer follow a similar approach of Java's computIfAbsent

Re: Resetting offsets

2017-05-03 Thread Dana Powers
Requires stopping your existing consumers, but otherwise should work: from kafka import KafkaConsumer, TopicPartition, OffsetAndMetadata def reset_offsets(group_id, topic, bootstrap_servers): consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers, group_id=group_id)

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
Another question that I have is, is there a way for us detect how many messages have come out of order? And if possible, what is the delay? On Thu, May 4, 2017 at 6:17 AM, Mahendra Kariya wrote: > Hi Matthias, > > Sure we will look into this. In the meantime, we have

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
Hi Matthias, Sure we will look into this. In the meantime, we have run into another issue. We have started getting this error frequently rather frequently and the Streams app is unable to recover from this. 07:44:08.493 [StreamThread-10] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered

Re: Kafka Streams stopped with errors, failed to reinitialize itself

2017-05-03 Thread Matthias J. Sax
What Kafka Streams version are you using? We had some bounce issues that got fixed in 0.10.2.1 (that was released last week). -Matthias On 5/3/17 4:57 AM, Sameer Kumar wrote: > Hi, > > > > I want to report an issue where in addition of a server at runtime in my > streams compute cluster

Re: joining two windowed aggregations

2017-05-03 Thread Matthias J. Sax
> Seems like this would be a standard join operation Not sure, if I would consider this a "standard" join... Your windows have different size and thus "move" differently fast. Kafka Stream joins provide sliding join semantics. Similar to a SQL query like this (conceptually): > SELECT * FROM

Re: Kafka-Streams Offsets Commit - Reprocessing Very Old Messages

2017-05-03 Thread Matthias J. Sax
Hard to say. Couple of things you could try: upgrade to 0.10.2.1 (got released last week) -- it contains a couple of bug fixed with regard to rebalancing and state store locks. Also, when you application "jumps back", is this somewhere in the middle of your input topic or is it "earliest" -- if

Re: disable KTable caching?

2017-05-03 Thread Matthias J. Sax
You can disable caching by setting cache size to zero. http://docs.confluent.io/current/streams/developer-guide.html#memory-management -Matthias On 5/3/17 4:07 PM, Steven Schlansker wrote: > I'm designing a Streams application that provides an API that acts > on messages. Messages have a

disable KTable caching?

2017-05-03 Thread Steven Schlansker
I'm designing a Streams application that provides an API that acts on messages. Messages have a sender. I have a KStream and a KTable The first time a message is sent, you need to ensure the sender exists beforehand. Roughly, void send(Message m) { if

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Matthias J. Sax
I would recommend to double check the following: - can you confirm that the filter does not remove all data for those time periods? - I would also check input for your AggregatorFunction() -- does it receive everything? - same for .mapValues() This would help to understand in what part of the

Kafka-Streams Offsets Commit - Reprocessing Very Old Messages

2017-05-03 Thread Khatri, Alpa
Hi, We are using apache kafka-streams 0.10.2.0 in an application. We are leveraging kafka-streams topology for passing the processed data on to the next topic till the end of processing. Also, We use AWS ECS container to deploy Consumer Application. We observed consumer is picking up very old

Re: AW: Consumer with another group.id conflicts with streams()

2017-05-03 Thread Matthias J. Sax
I just confirmed. `KafkaConsumer.close()` should be idempotent. It's a bug in the consumer. https://issues.apache.org/jira/browse/KAFKA-5169 -Matthias On 5/3/17 2:20 PM, Matthias J. Sax wrote: > Yes, Streams might call "close" multiple times, as we assume it's an > idempotent operations. >

Re: AW: Consumer with another group.id conflicts with streams()

2017-05-03 Thread Matthias J. Sax
Yes, Streams might call "close" multiple times, as we assume it's an idempotent operations. Thus, if you close "your" Consumer in `DebugTransformer.close()`, the operation is not idempotent anymore, and thus fails. `KafkaConsumer.close()` is not idempotent :( You can just use a "try-catch" when

Re: Windowed aggregations memory requirements

2017-05-03 Thread Eno Thereska
This is a timely question and we've updated the documentation here on capacity planning and sizing for Kafka Streams jobs: http://docs.confluent.io/current/streams/sizing.html . Any feedback welcome. It has scenarios with windowed stores

Trevor Grant has shared a document on Google Docs with you

2017-05-03 Thread trevor . d . grant
Trevor Grant has invited you to view the following document: Open in Docs

Re: Windowed aggregations memory requirements

2017-05-03 Thread Garrett Barton
That depends on if your using event, processing or ingestion time. My understanding is that if you play a record through that is T-6, the only way that TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5)) would actually process that record in your window is if your

joining two windowed aggregations

2017-05-03 Thread Jon Yeargers
I want to collect data in two windowed groups - 4 hours with a one hour overlap and a 5 minute / 1 minute. I want to compare the values in the _oldest_ window for each group. Seems like this would be a standard join operation but Im not clear on how to limit which window the join operates on. I

AW: Consumer with another group.id conflicts with streams()

2017-05-03 Thread Andreas Voss
Hi again, One more observation: The problem only occurs when the two application instances are started one after the other with some delay in-between (7 seconds in my test setup). So the first instance already started to process the events that were in the queue, when the second instance came

Re: Resetting offsets

2017-05-03 Thread Ben Stopford
Hu is correct, there isn't anything currently, but there is an active proposal: https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling On Wed, May 3, 2017 at 1:23 PM Hu Xi wrote: > Seems there is no command line out of box, but

Re: Kafka Streams Failed to rebalance error

2017-05-03 Thread Eno Thereska
Hi, Which version of Kafka are you using? This should be fixed in 0.10.2.1, any chance you could try that release? Thanks Eno > On 3 May 2017, at 14:04, Sameer Kumar wrote: > > Hi, > > > I ran two nodes in my streams compute cluster, they were running fine for few

Re: Kafka Stream stops polling new messages

2017-05-03 Thread João Peixoto
That'd be great as I'm not familiar with the protocol there On Wed, May 3, 2017 at 8:41 AM Eno Thereska wrote: > Cool, thanks, shall we open a JIRA? > > Eno > > On 3 May 2017, at 16:16, João Peixoto wrote: > > > > Actually I need to apologize, I

Re: Kafka Stream stops polling new messages

2017-05-03 Thread Eno Thereska
Cool, thanks, shall we open a JIRA? Eno > On 3 May 2017, at 16:16, João Peixoto wrote: > > Actually I need to apologize, I pasted the wrong issue, I meant to paste > https://github.com/facebook/rocksdb/issues/261. > > RocksDB did not produce a crash report since it

Re: Kafka Stream stops polling new messages

2017-05-03 Thread João Peixoto
Actually I need to apologize, I pasted the wrong issue, I meant to paste https://github.com/facebook/rocksdb/issues/261. RocksDB did not produce a crash report since it didn't actually crash. I performed thread dumps on stale and not-stale instances which revealed the common behavior and I

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Damian Guy
If you use the versions of the methods that pass in the store name they will all be backed by RocksDB On Wed, 3 May 2017 at 15:32 Garrett Barton wrote: > João, yes the stores would hold 90 days and prefer it to be rocksdb backed. > > I didn't actually know there wasn't

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Garrett Barton
João, yes the stores would hold 90 days and prefer it to be rocksdb backed. I didn't actually know there wasn't an in memory state store. And now that I think about it, how do I verify (or set) what kind of store streams is using for all the tasks? I have a bunch windowed and not windowed and I

RE: Does queue.time still apply for the new Producer?

2017-05-03 Thread Petr Novak
I have tried it with new and old producer. I sent 2 messages to Kafka topic in sequence on program start. queue.time seems to have no effect in both. The new producer sends them immediately even with queue.time and batch.size set very high. It blocks on linger.ms as expected. Interesting

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Eno Thereska
Just to add to this, there is a JIRA that tracks the fact that we don’t have an in-memory windowed store. https://issues.apache.org/jira/browse/KAFKA-4730 Eno > On May 3, 2017, at 12:42 PM, Damian Guy wrote: > > The

Re: Kafka Stream stops polling new messages

2017-05-03 Thread Eno Thereska
Hi there, Thanks for double checking. Does RocksDB actually crash or produce a crash dump? I’m curious how you know that the issue is https://github.com/facebook/rocksdb/issues/1121 , so just double checking with you. If that’s indeed the

答复: Resetting offsets

2017-05-03 Thread Hu Xi
Seems there is no command line out of box, but if you could write a simple Java client application that firstly calls 'seek' or 'seekToBeginning' to reset offsets to what you expect and then invoke commitSync to commit the offsets. 发件人: Paul van der Linden

Re: why is it called kafka?

2017-05-03 Thread Martin Gainty
an intelligent, creative and versatile writer who spoke 3 languages, had a background in chemistry and invented the hardhat I wonder how Kafkas stance on giving voice to individuals standing against obtuse bureaucracies would fare today ? Something to consider Martin

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Damian Guy
The windowed state store is only RocksDB at this point, so it isn't going to all be in memory. If you chose to implement your own Windowed Store, then you could hold it in memory if it would fit. On Wed, 3 May 2017 at 04:37 João Peixoto wrote: > Out of curiosity, would

Resetting offsets

2017-05-03 Thread Paul van der Linden
I'm trying to reset the offsets for all partitions for all topics for a consumer group, but I can't seem to find a working way. The command line tool provides a tool to remove a consumer group (which would be fine in this occasion), but this is not working with the "new" style consumer groups. I

Windowed aggregations memory requirements

2017-05-03 Thread João Peixoto
The base question I'm trying to answer is "how much memory does my instance need". Considering a use case where I want to keep a rolling average on a tumbling window of 1 minute size allowing for late arrivals up to 5 minutes (lower bound) we would have something like this:

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
Hi Garrett, Thanks for these insights. But we are not consuming old data. We want the Streams app to run in near real time. And that is how it is actually running. The lag never increases beyond a certain limit. So I don't think that's an issue. The values of the configs that you are mentioning