-implementing-custom-interceptors/
https://medium.com/@bkvarda/building-a-custom-flume-interceptor-8c7a55070038
From: antonio saldivar
Date: Friday, July 20, 2018 at 9:57 AM
To: "Thakrar, Jayesh"
Cc: "users@kafka.apache.org"
Subject: Re: Measure latency from Source to Sink
Hi
See if you can use a custom interceptor for this.
The only fuzzy thing is that the clocks would be different, so I would be a
little skeptical of its accuracy.
I have heard of some companies who have a special topic in which they insert
test messages and then read them back - using the same machine for both producing and consuming.
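As a rough illustration of the interceptor idea, a producer interceptor could stamp each record with a send-time header that the consumer later compares against its own clock. This is only a sketch - the header name ("send-ts-ms") and the String key/value types are my own assumptions, not anything from the thread - and it is subject to the same clock-skew caveat mentioned above.

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Hypothetical interceptor: adds a "send-ts-ms" header so a consumer can
// estimate end-to-end latency (only meaningful if the clocks are in sync).
public class SendTimestampInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        byte[] ts = ByteBuffer.allocate(Long.BYTES).putLong(System.currentTimeMillis()).array();
        record.headers().add("send-ts-ms", ts);
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // no-op; broker-ack latency could be recorded here as well
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

The class would then be enabled on the producer via the interceptor.classes config property.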
While this does not answer your question, I believe a lot of things happen
during the first call - e.g. getting admin and metadata info about the cluster, etc.
That takes "some time", and hence the poll interval that is acceptable/normal for
regular processing may not be sufficient for initialization.
For more details, see https://www.slideshare.net/JayeshThakrar/kafka-68540012
While this is based on Kafka 0.9, the fundamental concepts and reasons are
still valid.
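One simple way to accommodate that slower first call, sketched below with arbitrary timeouts and placeholder broker/topic names, is to give only the very first poll a much longer timeout than the steady-state polls:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FirstPollSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "first-poll-demo");         // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            // The first poll also joins the group and fetches metadata, so be generous.
            ConsumerRecords<String, String> first = consumer.poll(Duration.ofSeconds(30));
            System.out.println("first poll returned " + first.count() + " records");
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    }
}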
On 5/28/18, 12:20 PM, "Hans Jespersen" wrote:
Are you seeing 1) duplicate messages stored in a Kafka topic partition or
2)
producers and consumers run at their full
potential (kind of, but not exactly async push and pull of data).
It might even be worthwhile to start off without Kafka and once you understand
things better introduce Kafka later on.
From: Matt Daum
Date: Monday, March 5, 2018 at 4:33 PM
To: "Thakr
experiments to balance between system throughput,
record size, batch size and potential batching delay for a given rate of
incoming requests.
From: Matt Daum
Date: Monday, March 5, 2018 at 1:59 PM
To: "Thakrar, Jayesh"
Cc: "users@kafka.apache.org"
Subject: Re: Kafka Setup for
, March 5, 2018 at 5:54 AM
To: "Thakrar, Jayesh"
Cc: "users@kafka.apache.org"
Subject: Re: Kafka Setup for Daily counts on wide array of keys
Thanks for the suggestions! It does look like it's using local RocksDB stores
for the state info by default. Will look into using
BTW - I did not mean to rule out Aerospike as a possible datastore.
It's just that I am not familiar with it, but it surely looks like a good candidate
to store the raw and/or aggregated data, given that it also has a Kafka Connect
module.
From: "Thakrar, Jayesh"
Date: Sunday, March 4,
nday, March 4, 2018 at 2:39 PM
To: "Thakrar, Jayesh"
Cc: "users@kafka.apache.org"
Subject: Re: Kafka Setup for Daily counts on wide array of keys
Thanks! For the counts I'd need to use a global table to make sure it's across
all the data, right? Also having milli
Just keep in mind that while you do your batching, the Kafka producer also
tries to batch messages to Kafka, and you will need to ensure you have enough
buffer memory. However, that's all configurable.
Finally, ensure you have the latest Java updates and have Kafka 0.10.2 or higher.
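For reference, the producer-side batching knobs mentioned above look roughly like this; the values are placeholders, not recommendations, and should come out of your own testing:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);        // bytes per in-flight partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);             // wait up to 5 ms to fill a batch
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864L); // 64 MB of producer-side buffer
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // ... send records here ...
        producer.close();
    }
}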
Matt,
If I understand correctly, you have an 8-node Kafka cluster and need to support
about 1 million requests/sec into the cluster from source servers, and expect
to consume that for aggregation.
How big are your messages?
I would suggest looking into batching multiple requests per single Kafka message.
Can you also check if you have partition leaders flapping or changing rapidly?
Also, look at the following settings on your client configs:
max.partition.fetch.bytes
fetch.max.bytes
receive.buffer.bytes
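A minimal sketch of where those three settings go on the consumer side; the sizes are illustrative only:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-tuning-demo");        // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 2097152);   // 2 MB per partition
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52428800);            // 50 MB per fetch response
        props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 1048576);              // 1 MB TCP receive buffer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll here ...
        consumer.close();
    }
}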
We had a similar situation in our environment when the brokers were flooded
with data.
The sy
can enable client debug logs to check any errors.
On Mon, Oct 30, 2017 at 7:25 AM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:
> I created a new Kafka topic with 1 partition and then sent 10 messages
> using the KafkaProducer API with the async callback
Thanks,
Jayesh
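For anyone following along, an async send with a callback looks roughly like the sketch below; the topic name and the callback body are illustrative, not the exact code from the original message:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AsyncSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("test-topic", "key-" + i, "message-" + i);
                // async send: the callback fires when the broker acks (or the send fails)
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("offset=%d partition=%d%n",
                                metadata.offset(), metadata.partition());
                    }
                });
            }
            producer.flush(); // make sure all callbacks fire before closing
        }
    }
}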
Just to make it clear, Haitao: in your case you do not have to restart brokers
(since you are changing the setting at the topic level).
On 8/6/17, 11:37 PM, "Kaufman Ng" wrote:
Hi Haitao,
The retention time (retention.ms) configuration can exist as a broker-level
and/or topic-level config.
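If you prefer making the topic-level change programmatically rather than via the CLI, a sketch using the admin client is below. This assumes a reasonably recent client with incrementalAlterConfigs (roughly Kafka 2.3+); the topic name and retention value are placeholders:

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "259200000"),  // 3 days, illustrative
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> update =
                    Collections.singletonMap(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(update).all().get(); // wait for the change to apply
        }
    }
}

No broker restart is needed; the new retention takes effect for that topic only.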
You may want to look at the Kafka REST API instead of having so many direct
client connections.
https://github.com/confluentinc/kafka-rest
On 7/31/17, 1:29 AM, "Dr. Sven Abels" wrote:
Hi guys,
does anyone have an idea about the possible limits of concurrent users?
-
Hi Dmitri,
This presentation might help you understand and take appropriate actions to
deal with data duplication (and data loss)
https://www.slideshare.net/JayeshThakrar/kafka-68540012
Regards,
Jayesh
On 4/13/17, 10:05 AM, "Vincent Dautremont"
wrote:
One of the cases where
Have a look at the Cluster class, which has a "topics()" method to get a set of all
the topics.
https://kafka.apache.org/0100/javadoc/org/apache/kafka/common/Cluster.html
In versions 0.8/0.9, there was also ZkUtils, but the desire is to have clients
not interrogate ZK directly.
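If you already have a consumer handy, its listTopics() method gives the same information without touching ZK; a minimal sketch (the broker address is a placeholder):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ListTopicsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // topic name -> partition metadata, fetched from the brokers (not from ZK)
            Map<String, List<PartitionInfo>> topics = consumer.listTopics();
            topics.keySet().forEach(System.out::println);
        }
    }
}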
On 10/24/16, 4:32 PM, "
below?
From: R Krishna
To: users@kafka.apache.org; Jayesh Thakrar
Sent: Tuesday, August 30, 2016 2:02 AM
Subject: Re: Question: Data Loss and Data Duplication in Kafka
Experimenting with kafka myself, and found timeouts/batch expiry (valid and
invalid configurations), and
a loss as described in KAFKA-3924 (resolved in 0.10.0.1).
And data duplication can be attributed primarily to consumer offset management,
which is done at batch/periodic intervals.
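For that second cause, the usual mitigation is to disable auto-commit and commit only after the records are actually processed; a rough sketch is below (still at-least-once semantics, so duplicates are reduced rather than eliminated - the group id, topic and process() helper are all hypothetical):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitAfterProcessSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "dedup-demo");              // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // no periodic auto-commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);              // hypothetical application logic
                }
                consumer.commitSync();            // commit only after the batch is processed
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}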
Can anyone think or know of any other scenarios?
Thanks,
Jayesh
Hello,
I have a very basic doubt.
I created a Kafka topic and produced 10 messages using the
kafka-console-producer utility. When I consume messages from this topic, it
consumes 10 messages - fine. However, it shows that I have processed a
total of 11 messages. This number is +1 the total number
What are the producers/consumers for the Kafka cluster?
Remember that it's not just files but also sockets that add to the count.
I had seen issues when we had a network switch problem and had Storm consumers.
The switch would cause issues in connectivity between Kafka brokers, ZooKeeper
and clients.
Check out the Consumer API
http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
and search for the method "seekToEnd".
Here's the "text" from the API doc -
seekToEnd
public void seekToEnd(Collection<TopicPartition> partitions)
Seek to the last offset for each of the given partitions.
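A minimal sketch of using it with manual partition assignment (topic, partition and broker address are illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SeekToEndSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToEnd(Collections.singletonList(tp)); // jump past existing messages
            long endOffset = consumer.position(tp);            // forces the seek to be evaluated
            System.out.println("Next offset to read: " + endOffset);
        }
    }
}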
My guess is that, say, your broker went down and you restarted it.
That time interval between the shutdown/crash and the restart was shorter than the
ZK node's ephemeral timeout value.
Once that time is over, your node disappears from ZooKeeper, the broker is able
to recreate the znode and hence the s
fetch.size in the consumer config accordingly.
On Thu, Jun 2, 2016 at 3:41 PM, Thakrar, Jayesh < jthak...@conversantmedia.com>
wrote:
> Wondering if anyone has encountered similar issues.
>
> Using Kafka 0.8.2.1.
>
> Occasionally, we encounter a situation in which a consum
Wondering if anyone has encountered similar issues.
Using Kafka 0.8.2.1.
Occasionally, we encounter a situation in which a consumer (including
kafka-console-consumer.sh) just hangs.
If I increment the offset to skip the offending message, things work fine again.
I have been able to identify the
tions that result in total storage for the partition to be
around 20-50 GB.
There are a couple of good articles out there on Kafka cluster design - e.g.
http://morebigdata.blogspot.com/2015/10/tips-for-successful-kafka-deployments.html
Hope that helps.
Jayesh
-----Original Message-----
Hello Everyone,
I have a load of ~10k messages/sec. As the load increases, I see a burst of the
following error in Kafka (before everything starts working fine again):
*Error*: ERROR kafka.server.ReplicaManager: [Replica Manager on Broker 22]:
Error processing append operation on partition _topic_name
cases one needs to
take care of - e.g. scaling up by 5% vs. scaling up by 50% in, say, a 20-node
cluster.
Furthermore, to be really effective, one needs to be cognizant of the partition
sizes, and with rack-awareness, the task becomes even more involved.
Regards,
Jayesh
-----Original Message-----
Thanks Gwen - yes, I agree - let me work on it, make it available on github and
then I guess we can go from there.
Thanks,
Jayesh
-----Original Message-----
From: Gwen Shapira [mailto:g...@confluent.io]
Sent: Wednesday, May 11, 2016 12:26 PM
To: d...@kafka.apache.org; Jayesh Thakrar
Cc
cluster.
Thank you,
Jayesh Thakrar
error shown below. This is a Windows laptop with 4 GB
memory and about 3 GB free RAM. I have also tried increasing the heap size
(-Xmx) to 1500m, but that did not help either.
I am sure this is something very basic, but I can't seem to be able to figure it
out. Thanks,
Jayesh
C:\Users\jthakrar\Jayesh\Downlo
Hi,
I am trying to write a consumer using the KafkaConsumer class from
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java.
My code is pretty simple, with the snippet shown below. However, what I am seeing
is that I am not seeing any
Another option is to copy data from each topic (of interest/concern) to a "flat
file" on a periodic basis. E.g. say you had a queue that only contained "textual
data". Periodically I would run the bundled console-consumer to read data from
the queue and dump it to a file/directory and then back it up
Does one also need to set the config parameter "delete.topic.enable" to true? I
am using the 0.8.2 beta and I had to set it to true to enable topic deletion.
From: Armando Martinez Briones
To: users@kafka.apache.org
Sent: Wednesday, January 14, 2015 11:33 AM
Subject: Re: Delete topic
than
I do see the Windows-based scripts in the tar file - but haven't tried them
myself. You should find them under bin/windows.
Also, you can always use other Windows stress-testing tools/suites to check your
local I/O performance.
From: Shlomi Hazan
To: users@kafka.apache.org; Jayesh Th
r the wiki page I was referring to -
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
From: Jayesh Thakrar
To: "users@kafka.apache.org"
Sent: Tuesday, January 6, 2015 11:09 AM
Subject: Zookeeper Connection When Using High Level Sample Consumer Code
When I try running the Java Consumer example at
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example I get
the following ZooKeeper connection error.
I have verified ZooKeeper connectivity using a variety of means (using the
ZooKeeper built-in client, sending 4-letter commands to
Have you tried using the built-in stress test scripts?
bin/kafka-producer-perf-test.sh
bin/kafka-consumer-perf-test.sh
Here's how I stress tested them -
nohup ${KAFKA_HOME}/bin/kafka-producer-perf-test.sh --broker-list
${KAFKA_SERVERS} --topic ${TOPIC_NAME} --new-producer --threads 16 --messages
Just wondering, Mukesh - the reason you want this feature is because your value
payload is not small (tens of KB). Don't know if that is the right usage of
Kafka. It might be worthwhile to store the Avro files in a filesystem (regular,
cluster FS, HDFS or even HBase) and the value in your Kafka message
storage, I don't think it should be an issue with sufficient spindles,
servers and a higher-than-default memory configuration.
Jayesh
From: Achanta Vamsi Subhash
To: "users@kafka.apache.org"
Sent: Friday, December 19, 2014 9:00 AM
Subject: Re: Max. storage for Kafka and imp
Some more things to think about:
What is the data volume you are dealing with?
Do you need to have multiple partitions to support the data/throughput?
Are you looking at each partition to be dedicated to a single user or a group of users?
Is the data balanced across all your users or is it skewed?
Ho