RE: Detecting when all the retries are expired for a message

2016-12-06 Thread Mevada, Vatsal
@Asaf Do I need to raise a new bug for this? @Rajini Please suggest the configuration with which retries should work according to you. The code is already there in the mail chain. I am adding it here again: public void produce(String topicName, String filePath, String bootstrapServe
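For anyone following the thread, a minimal sketch of detecting exhausted retries via the producer Callback (topic name and bootstrap address are placeholders; the poster's full produce() method is elided above):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Sketch only: the Callback fires once per record, and its exception argument
    // is non-null only after the producer has given up, i.e. all retries expired.
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("retries", "3");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        producer.send(new ProducerRecord<>("my-topic", "key", "value"),
            (metadata, exception) -> {
                if (exception != null) {
                    // all retries are expired for this message
                    System.err.println("Send permanently failed: " + exception);
                }
            });
    }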

accessing state-store ala WordCount example

2016-12-06 Thread Jon Yeargers
I copied out some of the WordCountInteractive demo code to see how the REST access works. I have an aggregator groupByKey().agg
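For reference, the state-store lookup behind the demo's REST layer boils down to the interactive-queries API; a sketch assuming a running KafkaStreams instance and a placeholder store name:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    // `streams` is the running KafkaStreams instance; "counts-store" is an assumed name
    ReadOnlyKeyValueStore<String, Long> store =
        streams.store("counts-store", QueryableStoreTypes.<String, Long>keyValueStore());
    Long count = store.get("some-word"); // null if this instance doesn't hold the key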

Re: Creating a connector with Kafka Connect Distributed returning 500 error

2016-12-06 Thread Konstantine Karantasis
Hi Phillip, may I ask which Kafka version you used? The trunk repo in Apache Kafka briefly contained a bug in the Connect framework (during the past week) that produced failures similar to the one you describe (only in distributed mode). A fix has been pushed since yesterday. 3) Some useful step-by-s

Re: Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Hi Brian, We have 5 brokers and ~80 topics, and the total # of partitions is around 7k not including replicas (so it's close to the limit that Netflix recommends). Most topics have an RF of 2. CPU is only around 25% usage. The average number of consumers for each topic should be around 3-4. Our d

Re: One Kafka Broker Went Rogue

2016-12-06 Thread Thomas DeVoe
Hi All, This happened again on our kafka cluster - a single kafka broker seems to "forget" the existence of the rest of the cluster and shrinks all of its ISRs to only exist on that node. The other two nodes get stuck in a loop trying to connect to this rogue node and never even register that it i

Fwd: How to disable auto commit for SimpleConsumer kafka 0.8.1

2016-12-06 Thread Anjani Gupta
I want to disable auto commit for the Kafka SimpleConsumer. I am using version 0.8.1. For the high-level consumer, config options can be set and passed via consumerConfig as follows: kafka.consumer.Consumer.createJavaConsumerConnector(this.consumerConfig); How can I achieve the same for SimpleConsumer? I m

Re: NotLeaderForPartitionException

2016-12-06 Thread Apurva Mehta
Hi Sven, You will see this exception during leader election. When the leader for a partition moves to another broker, there is a period during which the replicas would still connect to the original leader, at which point they will raise this exception. This should be a very short period, after whi

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
Here's the solution (props to Damian G): JsonSerializer<AggKey> keySerializer = new JsonSerializer<>(); JsonDeserializer<AggKey> keyDeserializer = new JsonDeserializer<>(AggKey.class); Serde<AggKey> keySerde = Serdes.serdeFrom(keySerializer, keyDeserializer); then for the aggregator call 'groupByKey(keySerde, prtRecordSe
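Continuing that snippet, the aggregation call would then look roughly like this (a sketch: keySerde/prtRecordSerde are built as above, while aggSerde, the store name, and the window size continuing the truncated TimeWindows.of(60 call are assumptions):

    // Sketch under assumptions: aggSerde (a Serde for BqRtDetailLogLine_aggregate)
    // and the store name "prt-agg-store" are made up for illustration.
    KTable<Windowed<AggKey>, BqRtDetailLogLine_aggregate> ktRtDetail =
        rtRekey.groupByKey(keySerde, prtRecordSerde)
               .aggregate(BqRtDetailLogLine_aggregate::new, new PRTAggregate(),
                          TimeWindows.of(60 * 1000L), aggSerde, "prt-agg-store");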

Re: Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Brian Krahmer
You didn't mention anything about your current configuration, just that you are 'out of resources'. Perhaps you misunderstand how to size your partitions per topic and how partition allocation works. If your brokers are maxed on CPU and you double the number of brokers but keep the replica

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
Hmm. That's odd, as the aggregation works OK if I use a String value for the key (and the corresponding String serde). This error only started occurring when I tried to substitute my 'custom' key for the original String. On Tue, Dec 6, 2016 at 12:24 PM, Radek Gruchalski wrote: > Yeah, I knew tha

Re: Implementing custom key serializer

2016-12-06 Thread Radek Gruchalski
Yeah, I knew that already. This part of the error: org.apache.kafka.streams.processor.internals.RecordCollector.send(RecordCollector.java:73) points to this line: https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/processor/interna

Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Hi Aseem, the concern is creating too many partitions in total in one cluster, no matter how many brokers I have in this cluster. I think the two articles that I mentioned explain why too many partitions in one cluster could cause issues. On Tue, Dec 6, 2016 at 12:08 PM, Aseem Bansal wrote: > @Y

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
0.10.1.0 On Tue, Dec 6, 2016 at 11:11 AM, Radek Gruchalski wrote: > Jon, > > Are you using 0.10.1 or 0.10.0.1? > > – > Best regards, > Radek Gruchalski > ra...@gruchalski.com > > > On December 6, 2016 at 7:55:30 PM, Damian Guy (damian@gmail.com) > wrote: > > Hi Jon, > > At a glance the code

Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Aseem Bansal
@Yifan Ying Why not add more brokers in your cluster? That will not increase the partitions. Does increasing the number of brokers cause you any problem? How many brokers do you have in the cluster already? On Wed, Dec 7, 2016 at 12:35 AM, Yifan Ying wrote: > Thanks Asaf, Aseem. > > Assigning to

Re: kafka 0.10.1 / log-cleaner stopped / timeindex issues

2016-12-06 Thread Ismael Juma
Hi, can you please file a JIRA ticket so this doesn't get lost? Thanks, Ismael On 6 Dec 2016 5:06 pm, "Schumann,Robert" wrote: > Hi all, > > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see >

Creating a connector with Kafka Connect Distributed returning 500 error

2016-12-06 Thread Phillip Mann
I am migrating from Camus to Kafka Connect, working on the implementation and specifically focused on distributed mode. I am able to start a worker successfully on my local machine, which I assume communicates with my Kafka cluster. I am further able to run two GE

Re: Implementing custom key serializer

2016-12-06 Thread Radek Gruchalski
Jon, Are you using 0.10.1 or 0.10.0.1? – Best regards, Radek Gruchalski ra...@gruchalski.com On December 6, 2016 at 7:55:30 PM, Damian Guy (damian@gmail.com) wrote: Hi Jon, At a glance the code looks ok, i.e, i believe the aggregate() should have picked up the default Serde set in your St

Re: Consumer poll - no results

2016-12-06 Thread Mohit Anchlia
I see this message in the logs: [2016-12-06 13:54:16,586] INFO [GroupCoordinator 0]: Preparing to restabilize group DemoConsumer with old generation 3 (kafka.coordinator.GroupCoordinator) On Tue, Dec 6, 2016 at 10:53 AM, Mohit Anchlia wrote: > I have a consumer polling a topic of Kafka 0.10.

Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Thanks Asaf, Aseem. Assigning topics to only a specific set of brokers will probably cause uneven traffic, and it won't prevent topics from being re-assigned to other brokers when brokers fail. Like I said, the original cluster is close to out of resources. I remember there's some limit on # of partiti

Re: Implementing custom key serializer

2016-12-06 Thread Damian Guy
Hi Jon, At a glance the code looks OK, i.e., I believe the aggregate() should have picked up the default Serde set in your StreamsConfig. However, you could try adding the Serdes to the groupBy(..), i.e., rtRekey.groupByKey(new AggKeySerDe(), yourValueSerde).aggregate(...) Thanks, Damian On Tue,

Consumer poll - no results

2016-12-06 Thread Mohit Anchlia
I have a consumer polling a topic of Kafka 0.10. Even though the topic has messages the consumer poll is not fetching the message. The thread dump reveals: "main" #1 prio=5 os_prio=0 tid=0x7f3ba4008800 nid=0x798 runnable [0x7f3baa6c3000] java.lang.Thread.State: RUNNABLE at sun.n
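For comparison, a bare-bones 0.10 poll loop (all names are placeholders). One common cause of empty polls with a new group is the default auto.offset.reset of "latest", which starts a fresh group at the log tail:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "DemoConsumer");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("auto.offset.reset", "earliest"); // otherwise a new group starts at the tail

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singletonList("my-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
    }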

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
It's just a bunch of public 'int' and 'String' values. There's an empty constructor and a copy constructor. For functions I override 'equals' and the requirements for 'serde' (close, configure, serializer and deserializer). @Override public Serializer serializer() { JsonSerializer j
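Filled out, a Serde of the shape described above might look like this (a sketch: JsonSerializer/JsonDeserializer are the poster's own classes, so their constructors here are assumptions):

    import java.util.Map;
    import org.apache.kafka.common.serialization.Deserializer;
    import org.apache.kafka.common.serialization.Serde;
    import org.apache.kafka.common.serialization.Serializer;

    public class AggKeySerDe implements Serde<AggKey> {
        // JsonSerializer/JsonDeserializer are the poster's classes; constructors assumed
        private final Serializer<AggKey> serializer = new JsonSerializer<>();
        private final Deserializer<AggKey> deserializer = new JsonDeserializer<>(AggKey.class);

        @Override public void configure(Map<String, ?> configs, boolean isKey) {}
        @Override public void close() {}
        @Override public Serializer<AggKey> serializer() { return serializer; }
        @Override public Deserializer<AggKey> deserializer() { return deserializer; }
    }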

Re: Implementing custom key serializer

2016-12-06 Thread Radek Gruchalski
Do you mind sharing the code of AggKey class? – Best regards, Radek Gruchalski ra...@gruchalski.com On December 6, 2016 at 7:26:51 PM, Jon Yeargers (jon.yearg...@cedexis.com) wrote: The 2nd. On Tue, Dec 6, 2016 at 10:01 AM, Radek Gruchalski wrote: > Is the error happening at this stage? > >

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
The 2nd. On Tue, Dec 6, 2016 at 10:01 AM, Radek Gruchalski wrote: > Is the error happening at this stage? > > KStream<AggKey, RtDetailLogLine> rtRekey = rtDetailLines.map((key, value) -> > new KeyValue<>(new AggKey(value), value)); > > or here: > > KTable<Windowed<AggKey>, BqRtDetailLogLine_aggregate> ktRtDetail = > rtRekey.groupByKey()

Re: Best approach to frequently restarting consumer process

2016-12-06 Thread Gwen Shapira
Can you clarify what you mean by "restart"? If you call consumer.close() and consumer.subscribe() you will definitely trigger a rebalance. It doesn't matter if it's the "same consumer knocking"; we already rebalance when you call consumer.close(). Since we want both consumer.close() and consumer.subsc

How to disable auto commit for SimpleConsumer kafka 0.8.1

2016-12-06 Thread Anjani Gupta
I want to disable auto commit for the Kafka SimpleConsumer. I am using version 0.8.1. For the high-level consumer, config options can be set and passed via consumerConfig as follows: kafka.consumer.Consumer.createJavaConsumerConnector(this.consumerConfig); How can I achieve the same for SimpleConsumer? I ma
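Worth noting: SimpleConsumer never commits offsets on its own (auto.commit.enable only applies to the high-level consumer), so "disabling" it amounts to tracking offsets yourself and simply never calling commitOffsets. A fetch sketch against the 0.8.x API, with host/topic/offset as placeholders:

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    SimpleConsumer consumer =
        new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "myClient");
    FetchRequest req = new FetchRequestBuilder()
            .clientId("myClient")
            .addFetch("my-topic", 0 /* partition */, 0L /* offset you track yourself */, 100000)
            .build();
    FetchResponse resp = consumer.fetch(req); // nothing is committed unless you commit it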

Re: Some general questions...

2016-12-06 Thread Gwen Shapira
Yeah, that's a good point - Kafka on Windows has a few quirks because most core Kafka developers are not Windows experts and the big deployments are almost all on Linux. We discovered that most of our .NET users actually run Kafka on Linux. Turns out that installing a few VMs with Linux and running Kafka

NotLeaderForPartitionException

2016-12-06 Thread Sven Ludwig
Hello, in our Kafka clusters we sometimes observe a specific ERROR log-statement, and therefore we have doubts whether it is already running stably in our configuration. This occurs every now and then, like two or three times a day. It is actually the predominant ERROR log-statement in our c

Re: Implementing custom key serializer

2016-12-06 Thread Radek Gruchalski
Is the error happening at this stage? KStream<AggKey, RtDetailLogLine> rtRekey = rtDetailLines.map((key, value) -> new KeyValue<>(new AggKey(value), value)); or here: KTable<Windowed<AggKey>, BqRtDetailLogLine_aggregate> ktRtDetail = rtRekey.groupByKey().aggregate( BqRtDetailLogLine_aggregate::new, new PRTAggregate(), TimeWindows.of(60

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
If I comment out the aggregation step and just .print the .map step I don't hit the error. It's coming from aggregating the non-String key. On Tue, Dec 6, 2016 at 9:44 AM, Radek Gruchalski wrote: > Jon, > > Looking at your code: > > config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, > Serdes.Str

Re: Implementing custom key serializer

2016-12-06 Thread Radek Gruchalski
Jon, Looking at your code: config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); and later: KStream<String, RtDetailLogLine> rtDetailLines = kStreamBuilder.stream(stringSerde, prtRecordSerde, TOPIC); Is RtDetailLogLine inheriting from String? It is not, as the error suggests. You ma

Re: Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
Using 0.10.1.0 This is my topology: Properties config = new Properties(); config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER_IP); config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, ZOOKEEPER_IP); config.put(StreamsConfig.APPLICATION_ID_CONFIG, "PRTMinuteAgg" ); config.put(StreamsConfig.KEY

kafka 0.10.1 / log-cleaner stopped / timeindex issues

2016-12-06 Thread Schumann,Robert
Hi all, we are facing an issue with the latest kafka 0.10.1 and the log cleaner thread with regards to the timeindex files. From the log of the log-cleaner we see after startup that it tries to clean up a topic called xdc_listing-status-v2 [1]. The topic is set up with log compaction [2] and the kafk

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Zachary Smith
You may want to look at Secor also https://github.com/pinterest/secor On Tue, Dec 6, 2016 at 10:53 AM, noah wrote: > If you are willing to setup Kafka Connect, my company has built this > connector: https://github.com/spredfast/kafka-connect-s3 >

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Hans Jespersen
I know several people that use the qubole Kafka Sink Connector for S3 ( see https://github.com/qubole/streamx ) to store Kafka messages in S3 for long term archiving. You can also do this with the Confluent HDFS Kafka Connector if you have access to a Hadoop c

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread noah
If you are willing to setup Kafka Connect, my company has built this connector: https://github.com/spredfast/kafka-connect-s3

Best approach to frequently restarting consumer process

2016-12-06 Thread Harald Kirsch
We have consumer processes which need to restart frequently, say, every 5 minutes. We have 10 of them, so we are facing two restarts every minute on average. 1) It seems that nearly every time a consumer restarts, the group is rebalanced. Even if the restart takes less than the heartbeat interv
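One configuration angle, as a sketch rather than a full answer: per Gwen's reply above, a clean consumer.close() always triggers a rebalance in this version, but if the process exits without a clean close, the group only rebalances once session.timeout.ms elapses, so a session timeout comfortably above the restart duration at least bounds the damage. Values below are illustrative assumptions:

    import java.util.Properties;

    Properties props = new Properties();
    props.put("group.id", "frequent-restart-group");  // placeholder
    props.put("session.timeout.ms", "30000");         // how long before a silent member is evicted
    props.put("heartbeat.interval.ms", "3000");       // keep well below session.timeout.ms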

Re: Some general questions...

2016-12-06 Thread Harald Kirsch
This sounds like you might want to run the Kafka broker on Windows. Have a look at https://issues.apache.org/jira/browse/KAFKA-1194 for possible issues with regard to log cleaning. Regards, Harald. On 06.12.2016 00:50, Doyle, Keith wrote: We’re beginning to make use of Kafka, and it is enc

Re: Implementing custom key serializer

2016-12-06 Thread Damian Guy
Hi Jon, A couple of things: Which version are you using? Can you share the code you are using to build the topology? Thanks, Damian On Tue, 6 Dec 2016 at 14:44 Jon Yeargers wrote: > I'm using .map to convert my (k/v) String/Object to Object/Object but when I > chain this to an aggregation s

Implementing custom key serializer

2016-12-06 Thread Jon Yeargers
I'm using .map to convert my (k/v) String/Object to Object/Object, but when I chain this to an aggregation step I'm getting this exception: Exception in thread "StreamThread-1" java.lang.ClassCastException: com.company.prtminuteagg.types.RtDetailLogLine cannot be cast to java.lang.String at org.apach

Re: Is CreateTopics and DeleteTopics ready for production usage?

2016-12-06 Thread Ismael Juma
As Apurva said, KIP-4 is still in progress and, as part of it, we will introduce an AdminClient which will then be used by the various tools. It's a large piece of work and it makes sense to merge it in stages. The protocol side is ready (KIP vote passed, code and tests were reviewed and mer

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
@Asaf Mesika Stored to S3? On Tue, Dec 6, 2016 at 5:28 PM, Asaf Mesika wrote: > We rolled our own since we couldn't (1.5 years ago) find one. The code is > quite simple and short. > > > On Tue, Dec 6, 2016 at 1:55 PM Aseem Bansal wrote: > > > I just meant that is there an existing tool which do

Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Aseem Bansal
What configurations allow you to assign topics to specific brokers? I can see https://kafka.apache.org/documentation#basic_ops_automigrate. This should allow you to move things around but does that keep anything from being re-assigned to the old ones? On Tue, Dec 6, 2016 at 5:25 PM, Asaf Mesika

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Asaf Mesika
We rolled our own since we couldn't (1.5 years ago) find one. The code is quite simple and short. On Tue, Dec 6, 2016 at 1:55 PM Aseem Bansal wrote: > I just meant that is there an existing tool which does that. Basically I > tell it "Listen to all X streams and write them to S3/HDFS at Y path

Re: Detecting when all the retries are expired for a message

2016-12-06 Thread Rajini Sivaram
I believe batches in RecordAccumulator are expired after request.timeout.ms, so they wouldn't get retried in this case. I think the config options are quite confusing, making it hard to figure out the behavior without looking into the code. On Tue, Dec 6, 2016 at 10:10 AM, Asaf Mesika wrote: > V
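For reference, the knobs under discussion (values illustrative): retries only covers retriable broker errors on a request that was actually sent, while a batch that sits too long in the accumulator expires and fails without any retry:

    import java.util.Properties;

    Properties props = new Properties();
    props.put("retries", "5");                // resends after retriable broker errors
    props.put("retry.backoff.ms", "100");     // pause between those resends
    props.put("request.timeout.ms", "30000"); // in 0.9/0.10, also bounds batch expiry in the accumulator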

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
I just meant: is there an existing tool which does that? Basically I tell it "Listen to all X streams and write them to S3/HDFS at Y path as JSON". I know Spark Streaming can be used and there is Flume, but I am not sure about their scalability/reliability. That's why I thought to initiate a di

Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Asaf Mesika
Why not re-use the same cluster? You can assign topics to live only within a specific set of brokers. Thus you have one "bus" for messages, simplifying your applications' code and configurations. On Mon, Dec 5, 2016 at 9:43 PM Yifan Ying wrote: > Hi, > > Initially, we have only one Kafka cluster sh

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sudev A C
Hi Aseem, You can run Apache Flume to consume messages from Kafka and write them to S3/HDFS in a (micro-batch) streaming fashion. Writes to S3/HDFS should be in micro batches; you could do it for every message (not even sure if S3 supports append) but it won't be performant. https://flume.apache.o

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder
What do you mean by a streaming way? The logic to push to S3 will be in your consumer, so it totally depends on how you want to read and store. I think that's an easier way to do what you want, instead of trying to back up Kafka and then read messages from there. Not even sure that's possible. On

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
I get that we can read them and store them in batches but is there some streaming way? On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal wrote: > Because we need to do exploratory data analysis and machine learning. We > need to backup the messages somewhere so that the data scientists can > query/lo

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
Because we need to do exploratory data analysis and machine learning. We need to backup the messages somewhere so that the data scientists can query/load them. So we need something like a router that just opens up a new consumer group which just keeps on storing them to S3. On Tue, Dec 6, 2016 at

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder Khera
Why not just have a parallel consumer read all messages from whichever topics you're interested in and store them wherever you want to? You don't need to "backup" Kafka messages. From: Aseem Bansal Sent: Tuesday, December 6, 2016 4:55 PM Subject: S

Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Aseem Bansal
Hi, has anyone stored Kafka JSON messages in deep storage like S3? We are looking to back up all of our raw Kafka JSON messages for exploration. S3, HDFS, and MongoDB come to mind initially. I know that they can be stored in Kafka itself, but keeping them in Kafka does not seem like a g
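A sketch of the dedicated-consumer-group approach discussed in the replies above (all names are assumptions, and uploadBatch is a placeholder for the actual S3/HDFS write, e.g. an AWS SDK upload):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "s3-archiver"); // its own group, so other consumers are unaffected
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singletonList("raw-json-topic"));
        StringBuilder batch = new StringBuilder();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> r : records)
                batch.append(r.value()).append('\n');
            if (batch.length() > 5 * 1024 * 1024) { // flush in ~5 MB chunks; S3 has no append
                uploadBatch(batch.toString());      // placeholder, see lead-in
                batch.setLength(0);
            }
        }
    }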

Re: Processing older records Kafka Consumer

2016-12-06 Thread Asaf Mesika
Seek will do the trick. Just make sure that when you run it, it only runs on partitions the current reader is assigned (call assignment() and filter only the ones assigned to you now). On Tue, Dec 6, 2016 at 12:30 PM Amit K wrote: > Sorry for not providing complete information. > > I use the aut
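In code, the suggestion is roughly this (assumes an already-running consumer; desiredOffset stands in for whatever position you captured earlier):

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // seek only on the partitions this instance currently owns
    for (TopicPartition tp : consumer.assignment()) {
        consumer.seek(tp, desiredOffset); // e.g. an offset recorded two days ago
    }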

Re: Processing older records Kafka Consumer

2016-12-06 Thread Amit K
Sorry for not providing complete information. I use auto-commit. Most of the other properties are more or less the defaults. Actually, further analysis revealed that the records were consumed by the consumer but some dependent component was down (unfortunately it went completely undetected :( ).

Re: Detecting when all the retries are expired for a message

2016-12-06 Thread Asaf Mesika
Vatsal: I don't think they merged the fix for this bug (retries don't work) from 0.9.x to 0.10.0.1: https://github.com/apache/kafka/pull/1547 On Tue, Dec 6, 2016 at 10:19 AM Mevada, Vatsal wrote: > Hello, > > Bumping up this thread in case any of you has any input on this issue. > > Regards,

Re: Processing older records Kafka Consumer

2016-12-06 Thread Asaf Mesika
Do you use auto-commit or do you commit yourself? I'm trying to figure out how the offset moved if it was stuck. On Tue, Dec 6, 2016 at 10:28 AM Amit K wrote: > Hi, > > Is there any way to re-consume the older records from the Kafka broker with > the Kafka consumer? > > I am using Kafka 0.9.0.0. In one of

Processing older records Kafka Consumer

2016-12-06 Thread Amit K
Hi, Is there any way to re-consume older records from the Kafka broker with the Kafka consumer? I am using Kafka 0.9.0.0. In one scenario, I saw that records from the past 2 days were not consumed because the consumer was stuck. When the consumer restarted, it started processing records from today but older

RE: Detecting when all the retries are expired for a message

2016-12-06 Thread Mevada, Vatsal
Hello, Bumping up this thread in case any of you has any input on this issue. Regards, Vatsal -Original Message- From: Mevada, Vatsal Sent: 02 December 2016 16:16 To: Kafka Users Subject: RE: Detecting when all the retries are expired for a message I executed the same producer code