Kafka consumer configuration / performance issues

2016-10-04 Thread Shamik Banerjee
Hi, I'm a newbie trying out Kafka as an alternative to AWS SQS. The primary motivation is to improve performance, where Kafka would eliminate the constraint of pulling 10 messages at a time with a cap of 256kb. Here's a high-level scenario of my use case. I have a bunch of crawlers which are

KafkaConsumer poll poor performance (0.10.0.0)

2016-10-04 Thread Mudassir Maredia
I am using KafkaConsumer from the 0.10.0.0 version. Below is the sample code. I have tried to optimize it for maximum throughput, but the numbers are quite low. At times I have seen 80,000 records processed per second, but at times it shows only 10,000 records per second. I have tried
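[Editor's note: swings like this often come down to fetch sizing. A typical starting point for tuning a 0.10.0 consumer is below; the values are illustrative, not from the thread.]

```properties
# Larger fetches amortize per-poll overhead at some latency cost.
fetch.min.bytes=65536
fetch.max.wait.ms=500
# Raise the per-partition cap if individual records are large.
max.partition.fetch.bytes=2097152
# New in 0.10.0 (KIP-41): bound how many records a single poll() returns.
max.poll.records=5000
```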

Re: Snazzy new look to our website

2016-10-04 Thread Jason Gustafson
Huge improvement. Thanks Derrick and Gwen! On Tue, Oct 4, 2016 at 5:54 PM, Becket Qin wrote: > Much fancier now :) > > On Tue, Oct 4, 2016 at 5:51 PM, Ali Akhtar wrote: > > > Just noticed this on pulling up the documentation. Oh yeah! This new look >

Re: Snazzy new look to our website

2016-10-04 Thread Ali Akhtar
Just noticed this on pulling up the documentation. Oh yeah! This new look is fantastic. On Wed, Oct 5, 2016 at 4:31 AM, Vahid S Hashemian wrote: > +1 > > Thank you for the much needed new design. > At first glance, it looks great, and more professional. > > --Vahid >

Applying "single queue multi server" semantics to a Kafka topic

2016-10-04 Thread Ron Crocker
TL;DR: Any experience with overlaying a “single queue multi server” facade onto a Kafka topic? (I’m new to the list and I tried searching for an answer prior to spamming everyone with this idea - sorry if I missed the answer in that search —ron) I'm curious if anyone has run into (and solved)

Re: Snazzy new look to our website

2016-10-04 Thread Vahid S Hashemian
+1 Thank you for the much needed new design. At first glance, it looks great, and more professional. --Vahid From: Gwen Shapira To: d...@kafka.apache.org, Users Cc: Derrick Or Date: 10/04/2016 04:13 PM Subject:

Re: Kafka Streams dynamic partitioning

2016-10-04 Thread Guozhang Wang
By default the partitioner will apply a murmur hash on the key and mod by the current num.partitions to determine which partition a record goes to, so records with the same key will be assigned to the same partition. Would that be OK for your case? Guozhang On Tue, Oct 4, 2016 at 3:00 PM, Adrienne Kole
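[Editor's note: the hash-then-mod behavior described above can be sketched as follows. Kafka's real default partitioner hashes the serialized key bytes with murmur2; the stand-in below uses String.hashCode() purely for illustration, and the class name is made up.]

```java
// Sketch of Kafka's default key-to-partition mapping: hash the key,
// mask off the sign bit, and take the result modulo the partition count.
public class PartitionerSketch {

    // Mask off the sign bit so the modulo result is never negative.
    public static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Kafka hashes the serialized key bytes with murmur2;
    // String.hashCode() here is a stand-in for illustration only.
    public static int partitionFor(String key, int numPartitions) {
        return toPositive(key.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        int p = partitionFor("user-42", numPartitions);
        // The same key always maps to the same partition.
        assert p == partitionFor("user-42", numPartitions);
        assert p >= 0 && p < numPartitions;
        System.out.println("key 'user-42' -> partition " + p);
    }
}
```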

Re: [VOTE] 0.10.1.0 RC0

2016-10-04 Thread Jason Gustafson
One clarification: this is a minor release, not a major one. -Jason On Tue, Oct 4, 2016 at 4:01 PM, Jason Gustafson wrote: > Hello Kafka users, developers and client-developers, > > This is the first candidate for release of Apache Kafka 0.10.1.0. This is > a major release

Snazzy new look to our website

2016-10-04 Thread Gwen Shapira
Hi Team Kafka, I just merged PR 20 to our website - which gives it a new (and IMO pretty snazzy) look and feel. Thanks to Derrick Or for contributing the update. I had to do a hard-refresh (shift-f5 on my mac) to get the new look to load properly - so if stuff looks off, try it. Comments and

[VOTE] 0.10.1.0 RC0

2016-10-04 Thread Jason Gustafson
Hello Kafka users, developers and client-developers, This is the first candidate for release of Apache Kafka 0.10.1.0. This is a major release that includes great new features including throttled replication, secure quotas, time-based log searching, and queryable state for Kafka Streams. A full

Re: Kafka Streams - Parallel by default or 1 thread per topic?

2016-10-04 Thread Ali Akhtar
<3 On Wed, Oct 5, 2016 at 2:31 AM, Ali Akhtar wrote: > That's awesome. Thanks. > > On Wed, Oct 5, 2016 at 2:19 AM, Matthias J. Sax > wrote: > >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA512 >> >> Yes. >> >> On 10/4/16 1:47 PM, Ali Akhtar

Re: kafka stream to new topic based on message key

2016-10-04 Thread Guozhang Wang
Hello Gary, This is also doable in the Processor API; you can use the record collector from ProcessorContext to send data to arbitrary topics, i.e.: RecordCollector collector = ((RecordCollector.Supplier) context).recordCollector(); collector.send(new ProducerRecord<>(topic, ...),

Re: kafka streams with dynamic content and filtering

2016-10-04 Thread Guozhang Wang
Hello Gary, What you described should be workable with the lower-level Processor interface of Kafka Streams, i.e. dynamic aggregations based on the input data indicating changes to the JSON schemas. For detailed examples of how the Processor API works please read the corresponding sections on the

Re: Kafka streams Processor life cycle behavior of close()

2016-10-04 Thread Guozhang Wang
Created https://issues.apache.org/jira/browse/KAFKA-4253 for this issue. Guozhang On Tue, Oct 4, 2016 at 3:08 PM, Guozhang Wang wrote: > Hello Srikanth, > > We close the underlying clients before closing the state manager (hence > the states) because for example we need to

Re: Kafka streams Processor life cycle behavior of close()

2016-10-04 Thread Guozhang Wang
Hello Srikanth, We close the underlying clients before closing the state manager (hence the states) because for example we need to make sure producer's sent records have all been acked before the state manager records the changelog sent offsets as end offsets. This is kind of chicken-and-egg

Kafka Streams dynamic partitioning

2016-10-04 Thread Adrienne Kole
Hi, >From Streams documentation, I can see that each Streams instance is processing data independently (from other instances), reads from topic partition(s) and writes to specified topic. So here, the partitions of topic should be determined beforehand and should remain static. In my usecase I

Re: Kafka Streams - Parallel by default or 1 thread per topic?

2016-10-04 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Yes. On 10/4/16 1:47 PM, Ali Akhtar wrote: > Hey Matthias, > > All my topics have 3 partitions each, and I will have about 20-30 > topics in total that need to be subscribed to and managed. > > So, if I create an app which registers handles for

Re: Kafka Streams - Parallel by default or 1 thread per topic?

2016-10-04 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Kafka Streams parallelizes via Kafka partitions -- for each partition a task is created. If you subscribe to multiple topics, the topic with the most partitions determines the number of tasks, and each task gets partitions from all topics assigned.
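[Editor's note: within a single instance, the partition-derived tasks can additionally be spread across threads via one Streams setting; the value below is illustrative.]

```properties
# One Streams instance: tasks (one per input partition, per the reply
# above) are distributed across these threads. With 3-partition topics,
# 3 threads is the most that can do useful work per topic group.
num.stream.threads=3
```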

Re: Kafka Streams - Parallel by default or 1 thread per topic?

2016-10-04 Thread Ali Akhtar
Hey Matthias, All my topics have 3 partitions each, and I will have about 20-30 topics in total that need to be subscribed to and managed. So, if I create an app which registers handles for each of the 30 topics, the parallelization / multithreading will be handled behind the scenes by kafka

Kafka Streams - Parallel by default or 1 thread per topic?

2016-10-04 Thread Ali Akhtar
I need to consume a large number of topics, and handle each topic in a different way. I was thinking about creating a different KStream for each topic, and doing KStream.foreach for each stream, to process incoming messages. However, it's unclear if this will be handled in a parallel way by

RE: Delete Consumer Group Information

2016-10-04 Thread Krieg, David
Hi Nick, First, I have the same question about removing offset groups as you do. I've seen answers like "you don't need to" on StackOverflow, but that's obviously not helpful. So I'll hope that part gets picked up by another user on the group. But what I can help answer is about offset

kafka stream to new topic based on message key

2016-10-04 Thread Gary Ogden
Is it possible, in a Kafka Streams job, to write to another topic based on the key in the messages? For example, say the message is: 123456#{"id":56789,"type":1} where the key is 123456, # is the delimiter, and the {} is the JSON data. And I want to push the JSON data to another topic that
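[Editor's note: the key-plus-delimiter layout in the question can be separated with plain string handling before routing; in a Streams job this would run inside a map()/foreach() step, with the JSON part then produced to the chosen topic. The class and method names below are made up for illustration.]

```java
// Sketch: split "123456#{...json...}" into the key before the first
// '#' and the JSON payload after it.
public class KeyJsonSplitter {

    public static String[] splitKeyed(String message) {
        int sep = message.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("no '#' delimiter in: " + message);
        }
        return new String[] { message.substring(0, sep), message.substring(sep + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitKeyed("123456#{\"id\":56789,\"type\":1}");
        // parts[0] is the routing key, parts[1] the JSON to forward.
        System.out.println("key=" + parts[0] + " json=" + parts[1]);
    }
}
```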

Re: Delete Consumer Group Information

2016-10-04 Thread Vahid S Hashemian
Please take a look at this answer to a similar recent question: http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%3cCAHwHRrUV=M_2T_XjGwkwgZ=3ba+adogro1_ckvvbytarhag...@mail.gmail.com%3e The config parameter mentioned in the post for expiration of committed offsets is
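[Editor's note: assuming the linked post refers to the standard broker setting for committed-offset expiration in 0.9/0.10, it is the following; the value shown is that era's default.]

```properties
# Broker config: committed offsets for an empty group are dropped
# after this many minutes (default 1440 = 24h in 0.9/0.10).
offsets.retention.minutes=1440
```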

Delete Consumer Group Information

2016-10-04 Thread Cuneo, Nicholas
We are using Kafka 0.9 and are playing around with the consumer group feature. We have a lot of junk and stale consumer group information in the consumer groups and want to get rid of it. What's the best way to do that? Using Kafka Tool, I see that all the consumer groups are stored in 'Kafka'

Kafka on EBS st1?

2016-10-04 Thread Dave Mangot
Has anyone been running kafka well on the st1 EBS volumes? We've historically run on m1 and m2 instance types for our Kafka workload but wanted to move to the M4s to get better price/performance. We rolled out a single instance in two environments with M4 and 1 TB of st1. Everything seemed to

Re: Restrict who can change ACLs

2016-10-04 Thread Gerard Klijs
You could limit access to ZooKeeper with Kerberos or with a firewall, for example by allowing connections to ZooKeeper only from the cluster itself; that way you need access to those machines to be able to set ACLs. The create permission is used for creating topics, I think; there is no acl to

Restrict who can change ACLs

2016-10-04 Thread Shrikant Patel
Hi All, How can I restrict who can modify ACLs for a Kafka cluster? Anyone can use the kafka-acls CLI to modify the ACLs. I added a superuser and thought that when running kafka-acls, it would validate that only the spatel user can run this command. So what prevents a user on the n\w from trying to modify

Re: New Idea for Kafka multiple consumers running parallel.

2016-10-04 Thread Ali Akhtar
You may be able to control the starting offset, but if you try to control which instance gets offset 4, you'll lose all the benefits of parallelism. On 4 Oct 2016 3:02 pm, "Kaushil Rambhia/ MUM/CORP/ ENGINEERING" < kaushi...@pepperfry.com> wrote: > Hi guys, > i am using apache kafka with phprd

New Idea for Kafka multiple consumers running parallel.

2016-10-04 Thread Kaushil Rambhia/ MUM/CORP/ ENGINEERING
Hi guys, I am using Apache Kafka with phprd kafka, and I want to know how I can use multiple Kafka consumers on the same partition, from different groups, to consume messages in parallel. Say the consumers are c1, c2, c3, all consuming single partition 0: if c1 is consuming from offset 0, then c2 should start from 1
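[Editor's note: as the reply in this thread points out, consumer groups do not interleave a partition's offsets between them; each group independently receives the full stream. Illustrative config for two such consumers:]

```properties
# c1's config: its own group, so it receives every message of partition 0.
bootstrap.servers=localhost:9092
group.id=group-c1
# c2 would use group.id=group-c2 in its own config; it then also reads
# all offsets of partition 0 independently, not the alternating ones.
```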

Re: Kafka 0.10.1 ProcessorTopologyTestDriver and WindowedStreamPartitioner issue

2016-10-04 Thread Hamidreza Afzali
Thanks Guozhang. I can confirm the issue is resolved. Hamid

Can't send json array as configuration

2016-10-04 Thread dhanuka ranasinghe
Hi All, When trying to reconfigure a Kafka standalone connector, I am getting the exception below. Could you please help with this? Caused by: com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of java.lang.String out of START_ARRAY token Error payload:
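[Editor's note: that START_ARRAY error means a JSON array was supplied where Connect expects a string — connector config values are all strings. Passing e.g. "topics": ["a","b"] fails; list-valued settings are written as comma-separated strings. Connector and topic names below are made up.]

```json
{
  "name": "my-connector",
  "config": {
    "topics": "topic-a,topic-b"
  }
}
```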