Re: Measure latency from Source to Sink

2018-07-20 Thread Thakrar, Jayesh
…-implementing-custom-interceptors/ https://medium.com/@bkvarda/building-a-custom-flume-interceptor-8c7a55070038 From: antonio saldivar Date: Friday, July 20, 2018 at 9:57 AM To: "Thakrar, Jayesh" Cc: "users@kafka.apache.org" Subject: Re: Measure latency from Source to Sink Hi A…

Re: Measure latency from Source to Sink

2018-07-20 Thread Thakrar, Jayesh
See if you can use custom interceptors for this. The only fuzzy thing is that the clocks would be different, so I would be a little skeptical of its accuracy. I have heard of some companies who have a special topic in which they insert test msgs and then read them back - using the same machine f…
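The timestamp-embedding idea above can be sketched without any Kafka dependency (field names are illustrative): the producer stamps each probe message with its wall-clock send time, and the sink subtracts that stamp on arrival. As the reply notes, the result is only trustworthy when both clocks agree or when the same machine produces and consumes the probes.

```python
import json
import time

def stamp(payload):
    # Producer side: embed the send time (epoch millis) before serializing.
    payload["sent_at_ms"] = int(time.time() * 1000)
    return json.dumps(payload).encode("utf-8")

def latency_ms(raw, now_ms=None):
    # Sink side: local clock minus the embedded stamp. Only meaningful
    # when the two clocks agree (or producer and sink share a host).
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - json.loads(raw)["sent_at_ms"]

print(latency_ms(stamp({"id": 1})))  # near 0 on a single host
```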

Re: If timeout is short, the first poll does not return records

2018-07-18 Thread Thakrar, Jayesh
While this does not answer your question, I believe during the first call a lot of things happen - e.g. getting admin and metadata info about the cluster, etc. That takes "some time" and hence the poll interval that is acceptable/the norm for regular processing may not be sufficient for initialization. A…
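A common way around this is to poll in a loop against an overall deadline instead of relying on a single short poll. A minimal sketch in plain Python (the `poll` callable stands in for `KafkaConsumer.poll`, which may return nothing while the client is still bootstrapping):

```python
import time

def poll_with_deadline(poll, deadline_s, interval_s=0.1):
    # Keep polling until records arrive or the overall deadline expires.
    # The first few polls may come back empty while the client fetches
    # cluster metadata, joins the group, and gets partitions assigned.
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        records = poll(interval_s)
        if records:
            return records
    return []

# Simulate a consumer whose first two polls return nothing.
attempts = iter([[], [], ["record-1"]])
print(poll_with_deadline(lambda timeout: next(attempts), deadline_s=5))
```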

Re: Facing Duplication Issue in Kafka

2018-05-29 Thread Thakrar, Jayesh
For more details, see https://www.slideshare.net/JayeshThakrar/kafka-68540012 While this is based on Kafka 0.9, the fundamental concepts and reasons are still valid. On 5/28/18, 12:20 PM, "Hans Jespersen" wrote: Are you seeing 1) duplicate messages stored in a Kafka topic partition or 2) …

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-05 Thread Thakrar, Jayesh
…producers and consumers run at their full potential (kind of, but not exactly async push and pull of data). It might even be worthwhile to start off without Kafka and, once you understand things better, introduce Kafka later on. From: Matt Daum Date: Monday, March 5, 2018 at 4:33 PM To: "Thakr…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-05 Thread Thakrar, Jayesh
…experiments to balance between system throughput, record size, batch size and potential batching delay for a given rate of incoming requests. From: Matt Daum Date: Monday, March 5, 2018 at 1:59 PM To: "Thakrar, Jayesh" Cc: "users@kafka.apache.org" Subject: Re: Kafka Setup for…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-05 Thread Thakrar, Jayesh
…Monday, March 5, 2018 at 5:54 AM To: "Thakrar, Jayesh" Cc: "users@kafka.apache.org" Subject: Re: Kafka Setup for Daily counts on wide array of keys Thanks for the suggestions! It does look like it's using local RocksDB stores for the state info by default. Will look into using…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
BTW - I did not mean to rule out Aerospike as a possible datastore. It's just that I am not familiar with it, but it surely looks like a good candidate to store the raw and/or aggregated data, given that it also has a Kafka Connect module. From: "Thakrar, Jayesh" Date: Sunday, March 4,…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
…Sunday, March 4, 2018 at 2:39 PM To: "Thakrar, Jayesh" Cc: "users@kafka.apache.org" Subject: Re: Kafka Setup for Daily counts on wide array of keys Thanks! For the counts I'd need to use a global table to make sure it's across all the data right? Also having milli…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
…just keep in mind that while you do your batching, the Kafka producer also tries to batch msgs to Kafka, and you will need to ensure you have enough buffer memory. However, that's all configurable. Finally, ensure you have the latest Java updates and have Kafka 0.10.2 or higher…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-03 Thread Thakrar, Jayesh
Matt, If I understand correctly, you have an 8-node Kafka cluster and need to support about 1 million requests/sec into the cluster from source servers and expect to consume that for aggregation. How big are your msgs? I would suggest looking into batching multiple requests per single Kafka message…
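The suggestion above - packing several small requests into one Kafka message - can be sketched without any Kafka dependency. Here the batch is simply a JSON array; one would publish `pack(...)`'s output as a single message value and `unpack` it on the consumer side (the encoding is illustrative, not a recommendation):

```python
import json

def pack(requests):
    # Combine many small request dicts into one message value,
    # so each Kafka message carries a batch instead of one request.
    return json.dumps(requests).encode("utf-8")

def unpack(value):
    # Consumer side: recover the individual requests from one message.
    return json.loads(value)

batch = pack([{"id": i} for i in range(3)])
print(unpack(batch))  # [{'id': 0}, {'id': 1}, {'id': 2}]
```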

Re: Lost messages and messed up offsets

2017-11-30 Thread Thakrar, Jayesh
Can you also check if you have partition leaders flapping or changing rapidly? Also, look at the following settings on your client configs: max.partition.fetch.bytes, fetch.max.bytes, receive.buffer.bytes. We had a similar situation in our environment when the brokers were flooded with data. The sy…
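For reference, the three settings named above live in the consumer configuration. A sketch of where they go (the values shown are the shipped defaults of that era, not tuning advice):

```properties
# Maximum data returned per partition per fetch (default 1 MB).
max.partition.fetch.bytes=1048576
# Maximum data returned per fetch request across all partitions (default 50 MB).
fetch.max.bytes=52428800
# TCP receive buffer size; -1 means use the OS default.
receive.buffer.bytes=65536
```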

Re: Offsets in Kafka producer start with -1 for new topic

2017-10-30 Thread Thakrar, Jayesh
…can enable client debug logs to check any errors. On Mon, Oct 30, 2017 at 7:25 AM, Thakrar, Jayesh < jthak...@conversantmedia.com> wrote: > I created a new Kafka topic with 1 partition and then sent 10 messages > using the KafkaProducer API using the async callback…

Offsets in Kafka producer start with -1 for new topic

2017-10-29 Thread Thakrar, Jayesh
…Thanks, Jayesh

Re: do i need to restart the brokers if I changed the retention time for a specific topic

2017-08-07 Thread Thakrar, Jayesh
Just to make it clear Haitao, in your case you do not have to restart brokers (since you are changing at the topic level). On 8/6/17, 11:37 PM, "Kaufman Ng" wrote: Hi Haitao, The retention time (retention.ms) configuration can exist as a broker-level and/or topic-level config.

Re: Limit of simultaneous consumers/clients?

2017-07-31 Thread Thakrar, Jayesh
You may want to look at the Kafka REST API instead of having so many direct client connections. https://github.com/confluentinc/kafka-rest On 7/31/17, 1:29 AM, "Dr. Sven Abels" wrote: Hi guys, does anyone have an idea about the possible limits of concurrent users? …

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Thakrar, Jayesh
Hi Dmitri, This presentation might help you understand and take appropriate actions to deal with data duplication (and data loss): https://www.slideshare.net/JayeshThakrar/kafka-68540012 Regards, Jayesh On 4/13/17, 10:05 AM, "Vincent Dautremont" wrote: One of the cases where…
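Since Kafka of that era offers at-least-once delivery at best, a common mitigation is consumer-side de-duplication keyed on something stable, such as the (partition, offset) pair or an application-level message id. A minimal sketch (the in-memory set is illustrative; production code would use a bounded or persistent store):

```python
def dedupe(records, seen=None):
    # Drop records whose (partition, offset) was already processed.
    # `records` is a list of (partition, offset, value) tuples.
    if seen is None:
        seen = set()
    out = []
    for partition, offset, value in records:
        if (partition, offset) not in seen:
            seen.add((partition, offset))
            out.append(value)
    return out

print(dedupe([(0, 1, "a"), (0, 1, "a"), (0, 2, "b")]))  # ['a', 'b']
```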

Re: programmatic way to check for topic existence?

2016-10-25 Thread Thakrar, Jayesh
Have a look at the Cluster class, which has a "topics" method to get a set of all the topics. https://kafka.apache.org/0100/javadoc/org/apache/kafka/common/Cluster.html In version 0.8/0.9 there was also ZKUtils, but the desire is to have clients not interrogate ZK directly. On 10/24/16, 4:32 PM…

Re: Question: Data Loss and Data Duplication in Kafka

2016-09-05 Thread Jayesh Thakrar
…below? From: R Krishna To: users@kafka.apache.org; Jayesh Thakrar Sent: Tuesday, August 30, 2016 2:02 AM Subject: Re: Question: Data Loss and Data Duplication in Kafka Experimenting with Kafka myself, and found timeouts/batch expiry (valid and invalid configurations), and…

Question: Data Loss and Data Duplication in Kafka

2016-08-28 Thread Jayesh Thakrar
…data loss as described in KAFKA-3924 (resolved in 0.10.0.1). And data duplication can be attributed primarily to consumer offset management, which is done at batch/periodic intervals. Can anyone think or know of any other scenarios? Thanks, Jayesh
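The duplication window from periodic offset commits can be illustrated with a toy replay: if the consumer commits every N records and crashes mid-interval, restart resumes from the last committed offset, so the uncommitted tail is processed twice (the simulation below is a sketch, not Kafka API code):

```python
def consume(log, commit_every, crash_at):
    # Process records, committing the offset only every `commit_every`
    # records; simulate a crash at index `crash_at`, then resume from
    # the last committed offset.
    processed, committed = [], 0
    for i, rec in enumerate(log):
        if i == crash_at:
            break
        processed.append(rec)
        if (i + 1) % commit_every == 0:
            committed = i + 1
    # Restart: everything after `committed` is redelivered.
    processed.extend(log[committed:])
    return processed

out = consume(list("abcdef"), commit_every=3, crash_at=5)
print(out)  # 'd' and 'e' appear twice: processed, crashed, then redelivered
```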

Mismatch in the number of messages processed

2016-08-16 Thread Jayesh
Hello, I have a very basic doubt. I created a Kafka topic and produced 10 messages using the kafka-console-producer utility. When I consume messages from this topic, it consumes 10 messages - fine. However, it shows that I have processed a total of 11 messages. This number is +1 the total number…

Re: Too Many Open Files

2016-08-01 Thread Thakrar, Jayesh
What are the producers/consumers for the Kafka cluster? Remember that it's not just files but also sockets that add to the count. I had seen issues when we had a network switch problem and had Storm consumers. The switch would cause issues in connectivity between Kafka brokers, zookeepers and clients…

RE: Last offset in all partitions

2016-07-06 Thread Thakrar, Jayesh
Check out the Consumer API http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html and search for the method "seekToEnd". Here's the "text" from the API doc - seekToEnd: public void seekToEnd(Collection<TopicPartition> partitions) - Seek to the last offset for each of…

RE: Intermittent runtime exception: broker already registered

2016-06-17 Thread Thakrar, Jayesh
My guess is that say your broker went down and you restarted it. That time interval between shutdown/crash and the restart was shorter than the ZK node's ephemeral timeout value. Once that time is over, your node disappears from Zookeeper, the broker is able to recreate the znode and hence the s…

RE: Problematic messages in Kafka

2016-06-02 Thread Thakrar, Jayesh
…fetch.size in the consumer config accordingly. On Thu, Jun 2, 2016 at 3:41 PM, Thakrar, Jayesh < jthak...@conversantmedia.com> wrote: > Wondering if anyone has encountered similar issues. > > Using Kafka 0.8.2.1. > > Occasionally, we encounter a situation in which a consumer…

Problematic messages in Kafka

2016-06-02 Thread Thakrar, Jayesh
Wondering if anyone has encountered similar issues. Using Kafka 0.8.2.1. Occasionally, we encounter a situation in which a consumer (including kafka-console-consumer.sh) just hangs. If I increment the offset to skip the offending message, things work fine again. I have been able to identify the…

RE: Rebalancing issue while Kafka scaling

2016-06-01 Thread Thakrar, Jayesh
…partitions that result in total storage for the partition to be around 20-50 GB. There are a couple of good articles out there on Kafka cluster design - e.g. http://morebigdata.blogspot.com/2015/10/tips-for-successful-kafka-deployments.html Hope that helps. Jayesh …

Get "Error processing append operation on partition" in bursts (non deterministically)

2016-05-31 Thread Jayesh
Hello Everyone, I have a load of ~10k messages/sec. As the load increases, I see a burst of following error in Kafka (before everything starts working fine again): *Error*: ERROR kafka.server.ReplicaManager: [Replica Manager on Broker 22]: Error processing append operation on partition _topic_name…

RE: Rebalancing issue while Kafka scaling

2016-05-31 Thread Thakrar, Jayesh
…cases one needs to take care of - e.g. scaling up by 5% vs. scaling up by 50% in, say, a 20-node cluster. Furthermore, to be really effective, one needs to be cognizant of the partition sizes, and with rack-awareness the task becomes even more involved. Regards, Jayesh …

RE: [DISCUSS] KIP-59 - Proposal for a kafka broker command - kafka-brokers.sh

2016-05-11 Thread Thakrar, Jayesh
Thanks Gwen - yes, I agree - let me work on it, make it available on github and then I guess we can go from there. Thanks, Jayesh -Original Message- From: Gwen Shapira [mailto:g...@confluent.io] Sent: Wednesday, May 11, 2016 12:26 PM To: d...@kafka.apache.org; Jayesh Thakrar Cc…

[DISCUSS] KIP-59 - Proposal for a kafka broker command - kafka-brokers.sh

2016-05-09 Thread Jayesh Thakrar
…cluster. Thank you, Jayesh Thakrar

Newbie Developer question

2015-06-04 Thread Jayesh Thakrar
…error shown below. This is a Windows laptop with 4 GB memory and about 3 GB free RAM. I have also tried increasing the heap size (-Xmx) to 1500m, but that did not help either. I am sure this is something very basic, but I can't seem to be able to figure it out. Thanks, Jayesh C:\Users\jthakrar\Jayesh\Downlo…

KafkaConsumer Class Usage in Kafka 0.8.2 Beta

2015-02-12 Thread Jayesh Thakrar
Hi, I am trying to write a consumer using the KafkaConsumer class from https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java. My code is pretty simple, with the snippet shown below. However, what I am seeing is that I am not seeing any…

Re: Backups

2015-01-20 Thread Jayesh Thakrar
Another option is to copy data from each topic (of interest/concern) to a "flat file on a periodic basis". E.g. say you had a queue that only contained "textual data". Periodically I would run the bundled console-consumer to read data from the queue, dump it to a file/directory and then back it up…

Re: Delete topic

2015-01-14 Thread Jayesh Thakrar
Does one also need to set the config parameter "delete.topic.enable" to true? I am using 0.8.2 beta and I had to set it to true to enable topic deletion. From: Armando Martinez Briones To: users@kafka.apache.org Sent: Wednesday, January 14, 2015 11:33 AM Subject: Re: Delete topic than…

Re: latency - how to reduce?

2015-01-08 Thread Jayesh Thakrar
I do see the Windows-based scripts in the tar file - but haven't tried them, though. You should find them under bin/windows. Also, you can always use other Windows stress-testing tools/suites to check your local I/O performance. From: Shlomi Hazan To: users@kafka.apache.org; Jayesh Th…

Re: Zookeeper Connection When Using High Level Sample Consumer Code from Wiki

2015-01-06 Thread Jayesh Thakrar
…the wiki page I was referring to - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example From: Jayesh Thakrar To: "users@kafka.apache.org" Sent: Tuesday, January 6, 2015 11:09 AM Subject: Zookeeper Connection When Using High Level Sample Consumer Code…

Zookeeper Connection When Using High Level Sample Consumer Code from Wiki

2015-01-06 Thread Jayesh Thakrar
When I try running the Java Consumer example at https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example I get the following zookeeper connection error. I have verified zookeeper connectivity using a variety of means (using the Zookeeper built-in client, sending 4-letter commands to…

Re: latency - how to reduce?

2015-01-06 Thread Jayesh Thakrar
Have you tried using the built-in stress test scripts? bin/kafka-producer-perf-test.sh bin/kafka-consumer-perf-test.sh Here's how I stress tested them - nohup ${KAFKA_HOME}/bin/kafka-producer-perf-test.sh --broker-list ${KAFKA_SERVERS} --topic ${TOPIC_NAME} --new-producer --threads 16 --messages…

Re: Kafka getMetadata api

2015-01-02 Thread Jayesh Thakrar
Just wondering, Mukesh - the reason you want this feature is because your value payload is not small (tens of KB). Don't know if that is the right usage of Kafka. It might be worthwhile to store the Avro files in a filesystem (regular, cluster FS, HDFS or even HBase) and the value in your Kafka message…

Re: Max. storage for Kafka and impact

2014-12-19 Thread Jayesh Thakrar
…storage, I don't think it should be an issue with sufficient spindles, servers and higher-than-default memory configuration. Jayesh From: Achanta Vamsi Subhash To: "users@kafka.apache.org" Sent: Friday, December 19, 2014 9:00 AM Subject: Re: Max. storage for Kafka and imp…

Re: Kafka design pattern question - multiple user ids

2014-12-15 Thread Jayesh Thakrar
Some more things to think about: What is the data volume you are dealing with? Do you need to have multiple partitions to support the data/throughput? Are you looking at each partition to be dedicated to a single user or a group of users? Is the data balanced across all your users or is it skewed? How…
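One concrete check before settling on user-id keys is how evenly the keys spread across partitions. A quick skew check in plain Python (Kafka's real Java client hashes key bytes with murmur2; the CRC32 here is only a deterministic stand-in, and the key format is illustrative):

```python
from collections import Counter
from zlib import crc32

def partition_for(key, num_partitions):
    # Deterministic stand-in for Kafka's default key partitioner
    # (the real Java client uses murmur2 on the key bytes).
    return crc32(key.encode("utf-8")) % num_partitions

# Count how many of 10,000 synthetic user ids land on each of 8 partitions.
counts = Counter(partition_for(f"user-{i}", 8) for i in range(10_000))
print(dict(sorted(counts.items())))  # roughly even counts across partitions
```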