Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
Hi Frank, As far as I know the design in that wiki has been superceded by the Global KTables design which is now coming in 0.10.2. Hence, the JIRAs that are mentioned there (like KAFKA-3705). There are some extensive comments in https://issues.apache.org/jira/browse/KAFKA-3705

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-21 Thread Jorge Esteban Quilcate Otoya
Thanks for the feedback Matthias. * 1. You're right. I'll reorder the scenarios. * 2. Agree. I'll update the KIP. * 3. I like it, updating to `reset-offsets` * 4. Agree, removing the `reset-` part * 5. Yes, 1.e option without --execute or --export will print out current offset, and the new off

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Frank Lyaruu
I've read that JIRA (although I don't understand every single thing), and I got the feeling it is not exactly the same problem. I am aware of the Global Tables, and I've tried that first, but I seem unable to do what I need to do. I'm replicating a relational database, and on a one-to-many relatio

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Hi, yes the ticket is exactly about what you want to do. The lengthy discussion is mainly about what the key of the output KTable is. @gouzhang would you be interested in seeing what we did so far? best Jan On 21.02.2017 13:10, Frank Lyaruu wrote: I've read that JIRA (although I don't under

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Just a little note here: if you can take all rows of the "children" table for each key into memory, you get get away by using group_by and make a list of them. With this aggregation the join is straight forward and you can use a lateral view later to get to the same result. For this you could

Heartbeats while consuming a message in kafka-python

2017-02-21 Thread Martin Sucha
Hello, I'm processing messages using kafka-python from a single topic, but ocassionally I have a message that might take a while to process, so I'd like to send heartbeats while processing the message. My program works something like this: consumer = KafkaConsumer(...) for message in consumer:

hitting the throughput limit on a cluster?

2017-02-21 Thread Jon Yeargers
Running 3x 8core on google compute. Topic has 16 partitions (replication factor 2) and is consumed by 16 docker containers on individual hosts. System seems to max out at around 4 messages / minute. Each message is ~12K - compressed (snappy) JSON. Recently moved from 12 to the above 16 parti

Kafka colapsed due to OutOfMemory

2017-02-21 Thread Fernando Bugni
Hi, I have a 3 node kafka in production with 1gb in each JVM. Suddenly one day, two of them when down and the other tried to make all the work. The log errors were that they could not sincronize. We applied rebalance to them and it never ends. Unfortunately, we have not GC log, so we had to add it

Re: JMX metrics for replica lag time

2017-02-21 Thread Guozhang Wang
You can find them in https://kafka.apache.org/documentation/#monitoring I think this is the one you are looking for: Lag in messages per follower replica kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) lag should be proportional to the ma

Re: hitting the throughput limit on a cluster?

2017-02-21 Thread Todd Palino
So I think the important thing to look at here is the IO wait on your system. You’re hitting disk throughput issues, and that’s what you most likely need to resolve. So just from what you’ve described, I think the only thing that is going to get you more performance is more spindles (or faster spin

Re: Heartbeats while consuming a message in kafka-python

2017-02-21 Thread Guozhang Wang
Hi Martin, Since 0.10.1 KIP-62 has been added to consumer client, so that the user does not need to manually call pause / resume. https://cwiki.apache.org/confluence/display/KAFKA/KIP- 62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread As for python client, as far as I know this ba

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Guozhang Wang
Jan, Sure I would love to hear what you did for non-key joins. Last time we chatted there are discussions on the ordering issue, that we HAVE TO augment the join result stream keys as a combo of both, which may not be elegant as used in the DSL. For your proposed solution, it seems you did not do

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
+1 on seeing what Jan did, I'm interested too. Thanks Eno > On 21 Feb 2017, at 19:15, Guozhang Wang wrote: > > Jan, > > Sure I would love to hear what you did for non-key joins. Last time we > chatted there are discussions on the ordering issue, that we HAVE TO > augment the join result stream

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Jan Filipiak
Hi, yeah if the proposed solution is doable (only constrain really is to not have a parent key with lots of children) completly in the DSL except the lateral view wich is a pretty easy thing in PAPI. Our own implementation is a mix of reusing DSL interfaces but using reflection against KTabl

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Guozhang Wang
Thanks for sharing Jan. I think it would help if you are share a sketch of your code snippet for illustrating the implementation. As for the recent development, assuming you are referring to KIP-120 ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API),

Re: Heartbeats while consuming a message in kafka-python

2017-02-21 Thread Jeff Widman
As far as I understood it, the primary thrust of KIP-62 was making it so heartbeats could be issued outside of the poll() loop, meaning that the session.timeout.ms could be reduced below the length of time it takes a consumer to process a particular batch of messages. Unfortunately, while both lib

Re: hitting the throughput limit on a cluster?

2017-02-21 Thread Jon Yeargers
Thanks for looking at this issue. I checked the max IOPs for this disk and we're only at about 10%. I can add more disks to spread out the work. What IOWait values should I be aiming for? Also - what do you set openfiles to? I have it at 65535 but I just read a doc that suggested > 100K is better

Re: Heartbeats while consuming a message in kafka-python

2017-02-21 Thread Guozhang Wang
Hello Jeff, You are right that currently kafka-python does not expose two configs, i.e. session.timeout.ms and max.poll.timeout.ms as the Java client does, but I think the current setting may be sufficient to tackle Martin's issue alone as long as session.timeout.ms can be tuned to be large enough

KTable send old values API

2017-02-21 Thread Dmitry Minkovsky
At KAFKA-2984: ktable sends old values when required , @ymatsuda writes: > NOTE: This is meant to be used by aggregation. But, if there is a use case like a SQL database trigger, we can add a new KTable method to expose this. Looking throu

Re: Heartbeats while consuming a message in kafka-python

2017-02-21 Thread Jeff Widman
Yes, in the linked issues in my email it shows both projects plan to add support for it. The problem with a high session.timeout.ms is if you've got a message that is locking your consumer in a perpetual processing cycle, then the consumer will timeout of the group w/o rejoining. So then another c

Re: KTable send old values API

2017-02-21 Thread Eno Thereska
Hi Dmitry, Could you tell us more on the exact API you'd like? Perhaps if others find it useful too we/you can do a KIP. Thanks Eno > On 21 Feb 2017, at 22:01, Dmitry Minkovsky wrote: > > At KAFKA-2984: ktable sends old values when required >

Re: KTable send old values API

2017-02-21 Thread Dmitry Minkovsky
Hi Eno, Thank you. I don't think I'm advanced enough to imagine a good API. But I can elaborate my use-cases further. So say I have two tables: KTable left = topology.table(stringSerde, stringSerde, topicLeft, topicLeft); KTable right = topology.table(stringSerde, stringSerde, to

Kafka with SSL (question about certificate management)

2017-02-21 Thread Raghav
Hi We have a Kafka Client (Producer) which periodically generates some stuff and pushes to a Kafka Broker, that is out of our network. We want to use secure Kafka using SSL. I read the http://kafka.apache.org/documentation/#security. in which it explains about the Kafka with SSL. I have a questi

Re: JMX metrics for replica lag time

2017-02-21 Thread Jun MA
Hi Guozhang, Thanks for pointing this out. I was actually looking at this before and that’s why I’m asking the question. This metric is 'lag in messages', and since now the ISR logic relies on lag in seconds, not lag in messages, I’m not sure how useful this metrics is. In fact, we saw the valu

How to stop Kafka Mirror Maker

2017-02-21 Thread Qian Zhu
Hi, For now, I am doing “kill –9 processID” to stop the Kafka Mirror Maker. I am wondering whether there is a better way (e.g. a command) to do so? I don’t expect to stop the mirror maker frequently but I would like to have a script to automate the start and stop. Thanks a lot! Qian Zhu

Consumption on a explicitly (dynamically) created topic has a 5 minute delay

2017-02-21 Thread Jaikiran Pai
We are on Kafka 0.10.0.1 (server and client) and use Java consumer/producer APIs. We have an application where we create Kafka topics dynamically (using the AdminUtils Java API) and then start producing/consuming on those topics. The issue we frequently run into is this: 1. Application proces