Re: Ensuring that the message is persisted after acknowledgement

2021-08-20 Thread Eric Azama
Rather than forcing writes to disk after each message, the usual method of ensuring durability is to configure the topic with a replication factor of 3 and min.insync.replicas=2, and have producers configure acks=all. This ensures that the record has been replicated to at least 2 brokers before an ack is sent to the producer.
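
As a rough illustration of that setup with the Java clients (the topic name "orders", the partition count, and the broker address are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DurableProducerSketch {
        public static void main(String[] args) throws Exception {
            // Topic with replication factor 3 and min.insync.replicas=2
            Properties adminProps = new Properties();
            adminProps.put("bootstrap.servers", "broker1:9092");
            try (AdminClient admin = AdminClient.create(adminProps)) {
                NewTopic topic = new NewTopic("orders", 6, (short) 3)
                        .configs(Collections.singletonMap("min.insync.replicas", "2"));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }

            // Producer that waits for the in-sync replicas before treating a send as successful
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("acks", "all"); // ack only after min.insync.replicas brokers have the record
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "key", "value")).get(); // block for the ack
            }
        }
    }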

Re: Kafka metrics to calculate number of messages in a topic

2021-08-18 Thread Eric Azama
That will get you a good approximation, but it's not guaranteed to be completely accurate. Offsets in Kafka are not guaranteed to be continuous. For topics with log compaction enabled, the removed records will leave (potentially very large) holes in the offsets. Even for topics without log compaction, gaps can still appear in the offsets.
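
A sketch of the approximation being discussed, summing endOffsets minus beginningOffsets per partition with the Java consumer; the topic name and broker address are placeholders, and the result is only an upper bound for the reasons above:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class ApproximateMessageCount {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.deserializer", ByteArrayDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                List<TopicPartition> partitions = consumer.partitionsFor("orders").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                Map<TopicPartition, Long> begin = consumer.beginningOffsets(partitions);
                Map<TopicPartition, Long> end = consumer.endOffsets(partitions);
                long approxCount = partitions.stream()
                        .mapToLong(tp -> end.get(tp) - begin.get(tp))
                        .sum();
                // Upper bound only: compaction, retention and transaction markers all leave offset gaps
                System.out.println("Approximate record count: " + approxCount);
            }
        }
    }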

Re: Trouble understanding tuning batching config

2020-03-20 Thread Eric Azama
Hi Ryan, If your end goal is just larger files on the server, you don't really need to mess with the batching configs. You could just write multiple polls' worth of data to a single file. On Fri, Mar 20, 2020 at 3:50 PM Liam Clarke wrote: > Hi Ryan, > > That'll be per poll. > > Kind regards, >
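
A rough sketch of the "multiple polls per file" idea, with placeholder topic, group, and file names; the polls-per-file count is arbitrary:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class MultiPollFileWriter {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("group.id", "file-writer");
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            int pollsPerFile = 10; // roll to a new file every N polls instead of tuning batch sizes
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                int fileIndex = 0;
                while (true) {
                    try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("events-" + fileIndex++ + ".txt"))) {
                        for (int i = 0; i < pollsPerFile; i++) {
                            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                            for (ConsumerRecord<String, String> record : records) {
                                writer.write(record.value());
                                writer.newLine();
                            }
                        }
                    }
                    consumer.commitSync(); // commit once per file so a crash re-reads at most one file's worth
                }
            }
        }
    }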

Re: Kafka Broker leader change without effect

2020-01-16 Thread Eric Azama
offsets.topic.replication.factor is set to 1, so the consumer is likely failing because some of the __consumer_offsets topic partitions are offline. On Thu, Jan 16, 2020 at 9:05 AM JOHN, BIBIN wrote: > Producer request will not fail. Producer will fail based on acks and > min.insync.replicas configuration
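
One way to confirm that diagnosis is to look for __consumer_offsets partitions with no usable leader. A sketch using the Java AdminClient (the broker address is a placeholder, and the exact null/empty-leader representation can vary by client version):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class CheckConsumerOffsetsTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription description = admin
                        .describeTopics(Collections.singletonList("__consumer_offsets"))
                        .all().get()
                        .get("__consumer_offsets");
                for (TopicPartitionInfo partition : description.partitions()) {
                    // A partition whose single replica is down has no usable leader
                    boolean offline = partition.leader() == null || partition.leader().isEmpty();
                    System.out.printf("partition %d leader=%s offline=%b%n",
                            partition.partition(), partition.leader(), offline);
                }
            }
        }
    }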

Re: Consumer hangs at poll() and never throw exception on client

2019-12-15 Thread Eric Azama
Consumers are unable to make any progress if the next records to be sent are part of an open transaction. Are your producers properly closing transactions? On Sun, Dec 15, 2019 at 7:49 PM Frank Zhou wrote: > More details, the first application's consumer has been closed and its > producers are r
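
For reference, the usual shape of a transactional producer that always closes its transactions looks roughly like this (topic, key/value, and transactional.id are placeholders). Leaving out the commit or abort is what leaves read_committed consumers stuck behind the open transaction:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.errors.AuthorizationException;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;
    import org.apache.kafka.common.errors.ProducerFencedException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("transactional.id", "my-transactional-producer"); // placeholder id

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                try {
                    producer.beginTransaction();
                    producer.send(new ProducerRecord<>("events", "key", "value"));
                    producer.commitTransaction(); // closes the transaction so consumers can advance
                } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                    throw e; // fatal: the producer cannot recover, close and exit
                } catch (KafkaException e) {
                    producer.abortTransaction(); // recoverable: abort so the transaction is not left open
                    throw e;
                }
            }
        }
    }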

Re: What is the node-id tag in kafka producer MetricName ?

2019-12-03 Thread Eric Azama
Nodes with a negative id refer to the bootstrap servers you configured the client with. There are also metrics that report for nodes with an extremely large node id. These are usually Integer.MAX_VALUE - (coordinator node id) On Tue, Dec 3, 2019 at 1:37 PM Rajkumar Natarajan wrote: > I've have
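
A small sketch that walks the producer's metrics and interprets the node-id tag along those lines; the "node-" prefix on the tag value and the large-id heuristic are assumptions that may vary by client version:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class NodeIdMetricsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
                    String nodeTag = entry.getKey().tags().get("node-id");
                    if (nodeTag == null) {
                        continue;
                    }
                    // Tag values typically look like "node--1" or "node-3"; strip the assumed prefix
                    long id = Long.parseLong(nodeTag.replace("node-", ""));
                    String kind = id < 0 ? "bootstrap server"
                            : id > Integer.MAX_VALUE / 2 ? "coordinator (Integer.MAX_VALUE - node id)"
                            : "broker " + id;
                    System.out.println(entry.getKey().name() + " -> " + nodeTag + " (" + kind + ")");
                }
            }
        }
    }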

Re: Attempt to prove Kafka transactions work

2019-11-20 Thread Eric Azama
Calls to KafkaConsumer#poll() are completely independent of commits. As such, they will always return the next set of records, even if the previous set has not been committed. This is how the consumer acts regardless of the exactly-once semantics. In order for the consumer to reset to the currently committed offsets, you have to seek back to them explicitly.
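
A sketch of that explicit reset, assuming a placeholder topic and group: the consumer polls once, then seeks each assigned partition back to its committed offset before polling again.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SeekToCommittedSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("group.id", "transactional-test");
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                ConsumerRecords<String, String> first = consumer.poll(Duration.ofSeconds(1));
                // poll() keeps advancing its in-memory position whether or not we commit,
                // so re-reading the uncommitted records requires an explicit seek
                for (TopicPartition tp : consumer.assignment()) {
                    OffsetAndMetadata committed = consumer.committed(tp);
                    consumer.seek(tp, committed == null ? 0L : committed.offset());
                }
                ConsumerRecords<String, String> again = consumer.poll(Duration.ofSeconds(1));
                System.out.println(first.count() + " records first, " + again.count() + " re-read");
            }
        }
    }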

Re: [External] Allow parallel processing

2019-11-18 Thread Eric Azama
I second Dave's suggestion. With regards to the consumers round-robining between topics, they usually round-robin in batches. So you'll probably see a consumer work on a large batch of records from TopicA before moving on to TopicB. Depending on the behavior of the producers this might appear the

Re: Kafka Broker do not recover after crash

2019-11-18 Thread Eric Azama
Hi Oliver, Your first line of log has a timestamp of 19:15:42 and the last few logs show that the container received a SIGTERM at 19:16:10. That looks suspiciously close to 30 seconds after kubernetes initiated the pod. Does your deployment have a timeout that terminates a container if it's not re

Re: poor producing performance with very low CPU utilization?

2019-10-03 Thread Eric Azama
Hi Eric, You've given hardware information about your brokers, but I don't think you've provided information about the machine your producer is running on. Have you verified that you're not reaching any caps on your producer's machine? I also think you might be hitting the limit of what a single producer can achieve.
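
If it does turn out to be a single-producer ceiling, the usual knobs to experiment with before running several producer instances are batching, lingering, and compression. The values below are purely illustrative and the topic name is a placeholder:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class ThroughputTuningSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", ByteArraySerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());
            // Illustrative values only; tune against your own payloads and latency budget
            props.put("batch.size", "131072");        // larger batches per partition
            props.put("linger.ms", "20");             // wait briefly so batches can fill
            props.put("compression.type", "lz4");     // less network/disk at some CPU cost
            props.put("buffer.memory", "67108864");   // room for in-flight batches

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                byte[] payload = new byte[1024];
                for (int i = 0; i < 1_000_000; i++) {
                    producer.send(new ProducerRecord<>("load-test", payload)); // async; don't block per record
                }
                producer.flush();
            }
        }
    }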

Re: Re: Re: Doubts in Kafka

2019-01-14 Thread Eric Azama
The mapping of ProducerRecord keys to TopicPartitions is configurable within the producer. As such, you wouldn't be able to ask the cluster for a mapping, since it doesn't know how the mapping was generated. To determine the TopicPartition a given key would map to, you will need to run it through the same partitioner the producer is configured with.
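
A sketch of "running it through the partitioner", assuming the producer uses the DefaultPartitioner and a String key serializer; Utils is an internal kafka-clients class, so treat this as illustrative rather than a supported API. Topic name, key, and broker address are placeholders:

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.common.utils.Utils;

    public class KeyToPartitionSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                int numPartitions = producer.partitionsFor("orders").size();
                // Serialize the key exactly as the producer would
                byte[] keyBytes = "customer-42".getBytes(StandardCharsets.UTF_8);
                // DefaultPartitioner for keyed records: murmur2 hash of the key bytes, modulo partition count
                int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
                System.out.println("customer-42 -> partition " + partition);
            }
        }
    }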

Re: Programmatic method of setting consumer groups offsets

2019-01-03 Thread Eric Azama
Adding on to Ryanne's point, if subscribe() isn't giving your consumer all of the partitions for a topic, that implies there are still active consumers running for that group. The consumer groups CLI command does not allow you to modify offsets for consumer groups that have active consumers. I beli
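
A minimal sketch of setting a group's offsets programmatically with a plain consumer (group, topic, and partitions are placeholders, and as noted above the group must have no active members): assign the partitions directly and commit the desired offsets.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class SetGroupOffsetsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("group.id", "my-group"); // group should have no active members
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", ByteArrayDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // assign() bypasses the group rebalance, so it works regardless of how many
                // partitions subscribe() would have handed this instance
                List<TopicPartition> partitions = Arrays.asList(
                        new TopicPartition("events", 0),
                        new TopicPartition("events", 1));
                consumer.assign(partitions);

                Map<TopicPartition, OffsetAndMetadata> newOffsets = new HashMap<>();
                for (TopicPartition tp : partitions) {
                    newOffsets.put(tp, new OffsetAndMetadata(0L)); // e.g. rewind to the beginning
                }
                consumer.commitSync(newOffsets); // persists the offsets for group "my-group"
            }
        }
    }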

Re: kafka.consumers Java Package

2018-11-09 Thread Eric Azama
The kafka.consumer package contained the old Scala consumer. The upgrade notes indicate that it was deprecated in 0.11 and removed in 2.0. On Fri, Nov 9, 2018 at 8:00 AM Harsha Chintalapani wrote: > Chris, > You are upgrading from 0.10.2.2 t

Re: Performance Impact with Apache Kafka Security

2018-08-24 Thread Eric Azama
I saw a similar 30-40% performance hit when testing a move from Plaintext to SASL_SSL. It seemed to be due to the additional traffic generated by replication between brokers. Enabling SSL only between the client and brokers and leaving inter-broker traffic on Plaintext only introduced a ~10% performance hit.

Re: Maintain "consumer has all the messages for a key" delivery guarantee with two topics?

2018-08-24 Thread Eric Azama
The default partition assignment for the consumer should guarantee this as long as the number of partitions in each topic is the same. The term used for this in the Streams docs is co-partitioning. The link below refers specifically to the KStreams join method, but the concept is similar to what you're describing.
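
A small sanity check for that co-partitioning requirement, using the AdminClient to compare partition counts of two placeholder topics before relying on the default assignor lining them up:

    import java.util.Arrays;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class CoPartitioningCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                Map<String, TopicDescription> topics = admin
                        .describeTopics(Arrays.asList("orders", "payments"))
                        .all().get();
                int ordersPartitions = topics.get("orders").partitions().size();
                int paymentsPartitions = topics.get("payments").partitions().size();
                // With the default (range) assignor, partition N of both topics lands on the
                // same consumer only if the partition counts match
                if (ordersPartitions != paymentsPartitions) {
                    throw new IllegalStateException("Topics are not co-partitioned: "
                            + ordersPartitions + " vs " + paymentsPartitions);
                }
            }
        }
    }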

Re: how to detect kafka protocol

2018-08-15 Thread Eric Azama
ZooKeeper definitely has the information about endpoints and protocol. The /brokers/ids/ paths in ZooKeeper contain the endpoints that are open on the broker. I have doubts that you'll be able to make this change without any downtime though. To my knowledge, clients are only capable of using one security protocol at a time.
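
For reference, a sketch of reading the broker registrations directly with the ZooKeeper client (connect string and session timeout are placeholders); each znode's JSON payload includes the advertised endpoints and, on newer brokers, the listener-to-protocol map:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class BrokerEndpointsFromZk {
        public static void main(String[] args) throws Exception {
            // No watcher logic is needed for a one-off read
            ZooKeeper zk = new ZooKeeper("zookeeper1:2181", 10_000, event -> { });
            try {
                List<String> brokerIds = zk.getChildren("/brokers/ids", false);
                for (String id : brokerIds) {
                    byte[] data = zk.getData("/brokers/ids/" + id, false, null);
                    // JSON includes "endpoints" and "listener_security_protocol_map"
                    System.out.println("broker " + id + ": " + new String(data, StandardCharsets.UTF_8));
                }
            } finally {
                zk.close();
            }
        }
    }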

Re: reg Kafka Node num

2017-10-20 Thread Eric Azama
Kafka logs traffic to the Controller node separately from the normal traffic to the node. In order to differentiate, it subtracts the broker id from 2147483647 (max int) and uses the result as the "node id" for the controller. On a related note, logs and metrics related to the bootstrap process see
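
As a worked example of that arithmetic (broker id 3 is purely hypothetical): traffic to that broker in its controller role would be logged and reported under node id 2147483647 - 3 = 2147483644.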