Re: Consuming an entire partition with control messages

2023-07-27 Thread Vincent Maurin
…consumer (i.e., the Java one) does the exact same thing. -Matthias On 5/30/23 8:41 AM, Vincent Maurin wrote: Hello! I am working on an exactly-once stream processor in Python, using the aiokafka client library. My program stores a state in memory that is recovered from a changelog topic, like in Kafka Streams…
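
As an aside for readers: below is a minimal Java sketch of the behavior Matthias refers to (broker address and topic name are placeholders). Control records (transaction markers) occupy offsets but are never returned by poll(), so "the whole partition is consumed" must be detected by comparing the consumer's position with the end offset, not by the offset of the last record received.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DrainPartition {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("isolation.level", "read_committed");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition("changelog", 0);   // placeholder topic
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            consumer.seekToBeginning(List.of(tp));
            long end = consumer.endOffsets(List.of(tp)).get(tp);

            // The position advances past control records even though they are
            // never delivered, so this loop terminates cleanly at the end offset.
            while (consumer.position(tp) < end) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("%d -> %s%n", r.offset(), r.value());
                }
            }
        }
    }
}
```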

Consuming an entire partition with control messages

2023-05-30 Thread Vincent Maurin
Hello! I am working on an exactly-once stream processor in Python, using the aiokafka client library. My program stores a state in memory that is recovered from a changelog topic, like in Kafka Streams. On each processing loop, I am consuming messages, producing messages to output topics and to…
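
A hedged Java sketch of the processing loop described here, using transactions so that consuming, producing, and the offset commit happen atomically (topic names, group id, and the toUpperCase "processing" are placeholders; consumer.groupMetadata() needs a 2.5+ client, and abort/retry handling is omitted for brevity):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceLoop {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "eos-processor");
        cp.put("enable.auto.commit", "false");
        cp.put("isolation.level", "read_committed");
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("transactional.id", "eos-processor-1");   // enables idempotence + transactions
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Map<String, String> state = new HashMap<>();   // in-memory state, rebuilt from "changelog"

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            producer.initTransactions();
            consumer.subscribe(List.of("input"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    state.put(r.key(), r.value());
                    producer.send(new ProducerRecord<>("output", r.key(), r.value().toUpperCase()));
                    producer.send(new ProducerRecord<>("changelog", r.key(), r.value()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                }
                // Offsets commit atomically with the produced messages.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            }
        }
    }
}
```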

Re: GlobalKTable usage in a multi-microservice application

2021-11-03 Thread Vincent Maurin
Hello The idea behind a GlobalKTable is to materialize data (a Kafka topic) close to where it is used. Each task/worker will materialize the full GlobalKTable in order to use it. So in your scenario, what should be shared between your services is ideally the Kafka topic used to build…
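
A minimal Kafka Streams sketch of that idea (topic names are placeholders): the "prices" topic is materialized in full on every instance, and a stream can be joined against it without repartitioning.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class GlobalTableJoin {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Each instance materializes the full "prices" topic locally:
        // lookups are cheap, but every instance pays the storage cost.
        GlobalKTable<String, String> prices = builder.globalTable("prices");
        KStream<String, String> orders = builder.stream("orders");

        orders.join(prices,
                    (orderKey, orderValue) -> orderKey,             // select lookup key
                    (orderValue, price) -> orderValue + "@" + price)
              .to("priced-orders");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "global-table-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}
```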

Re: Customers are getting the same emails roughly 30-40 times

2019-05-24 Thread Vincent Maurin
It also seems you are using an "at least once" strategy (maybe with auto-commit, or committing after sending the email). Maybe "at most once" could be a valid business strategy here? - at least once (you will deliver all the emails, but you could deliver duplicates): consumeMessages, sendEmails, commitOffsets…
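
The two orderings, sketched against the plain Java consumer API (sendEmail stands in for the real mail call; the only difference is where commitSync sits relative to the side effect):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EmailDelivery {
    // At-least-once: commit AFTER the side effect. A crash between sending
    // and committing replays the batch, so customers can get duplicates.
    static void atLeastOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> r : records) sendEmail(r.value());
        consumer.commitSync();
    }

    // At-most-once: commit BEFORE the side effect. A crash between committing
    // and sending loses the batch, so emails can be skipped but never repeated.
    static void atMostOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        consumer.commitSync();
        for (ConsumerRecord<String, String> r : records) sendEmail(r.value());
    }

    static void sendEmail(String body) { /* hypothetical mail call */ }
}
```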

Re: log is not getting segmented for 1 partition only

2019-05-21 Thread Vincent Maurin
Hi Rajat I am not 100% sure, but I think the roll logic is based on incoming messages. When a message is received, the broker compares the timestamp of the first message in the segment with the timestamp of the incoming message. If the difference is greater than log.roll.ms, it rolls the segment. In your scenario,…
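
If a topic needs to roll more aggressively than the broker default, segment.ms (the per-topic counterpart of log.roll.ms) can be lowered; a sketch using the AdminClient, with the topic name and the one-hour value as placeholders:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetSegmentMs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Roll a segment at most one hour after its first message.
            AlterConfigOp setRoll = new AlterConfigOp(
                    new ConfigEntry("segment.ms", "3600000"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRoll))).all().get();
        }
    }
}
```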

Re: Kafka broker impact with large number of consumer groups in parallel

2019-04-23 Thread Vincent Maurin
Hi Thangaram Offsets are stored in the __consumer_offsets topic. So having a lot of offsets to store (around 100k in your case) will require accesses and writes to this topic. Usually it is totally fine, especially when the commits on the consumer side are not too frequent. Otherwise, in some situations, it could…
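
One way to keep commits infrequent, sketched in Java (the 30-second interval is an arbitrary assumption; the trade-off is more re-processing after a crash, which at-least-once semantics already allow):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class InfrequentCommits {
    static void pollLoop(KafkaConsumer<String, String> consumer) {
        long commitIntervalMs = 30_000;   // assumed acceptable replay window
        long lastCommit = System.currentTimeMillis();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> { /* process r */ });
            // Commit at most every commitIntervalMs to limit writes
            // to __consumer_offsets.
            if (System.currentTimeMillis() - lastCommit >= commitIntervalMs) {
                consumer.commitAsync();
                lastCommit = System.currentTimeMillis();
            }
        }
    }
}
```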

Re: Method to check if the log-cleaner of a Kafka broker is running or not

2019-04-15 Thread Vincent Maurin
Hi There are a couple of metrics and logs produced by the log cleaner: https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/log/LogCleaner.scala You can try to monitor those (metrics can be fetched with JMX, and for the logs you can tweak the log4j properties). Don't forget that the log…
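
A sketch of polling one of those metrics over JMX (host, port, and the exact MBean name are assumptions; check the LogCleaner source linked above for the metric names in your version):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogCleanerCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector jmx = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = jmx.getMBeanServerConnection();
            Object v = conn.getAttribute(
                    new ObjectName("kafka.log:type=LogCleaner,name=max-clean-time-secs"),
                    "Value");
            System.out.println("max-clean-time-secs = " + v);
        }
    }
}
```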

Re: __consumer_offsets topic, some partitions with the last offset equals to zero

2019-04-09 Thread Vincent Maurin
Hello Jonathan, In __consumer_offsets, the messages have a (group, topic, partition) key and the last consumed offset as value. If you have fewer than 50 group/topic/partition combinations in your cluster (the topic has 50 partitions by default, and each key hashes to one of them), it could make sense to have __consumer_offsets partitions without any messages. Best r…
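
Rather than reading __consumer_offsets directly (its message format is internal), the committed offsets of a group can be inspected with the AdminClient; a sketch, with the group id as a placeholder:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ShowCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("my-group")
                         .partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) -> System.out.printf("%s -> %d%n", tp, om.offset()));
        }
    }
}
```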

Re: Something like a unique key to prevent same record from being inserted twice?

2019-04-03 Thread Vincent Maurin
Hi, The idempotence flag guarantees that the message is produced exactly once on the topic, i.e., that running your command a single time will produce a single message. It is not a uniqueness enforcement on the message key; there is no such thing in Kafka. In Kafka, a topic containing the "history" of…
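
A short sketch of what the flag does and does not do (topic, key, and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProduce {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.idempotence", "true");   // dedupes broker-side retries only
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Running this program twice appends TWO records with key "order-42":
            // idempotence prevents duplicates from internal retries within a
            // producer session, not a second produce of the same key.
            producer.send(new ProducerRecord<>("orders", "order-42", "payload"));
        }
    }
}
```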

Re: leader none, with only one replica and no ISR

2019-04-02 Thread Vincent Maurin
Hello Adrien Could you give more details about your topic configuration (how many partitions, what replication factor)? Usually, partition operations are performed by the broker that has been assigned as the "controller". You should be able to check the controller broker id with this command…
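
The command itself is cut off above; as one alternative, recent Java clients can ask the cluster for the controller directly (broker address is a placeholder, and this is not necessarily the command the original reply named):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.Node;

public class FindController {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Node controller = admin.describeCluster().controller().get();
            System.out.println("controller broker id = " + controller.id());
        }
    }
}
```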

Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Vincent Maurin
Maybe you can find some literature about messaging patterns? Usually, a single Kafka topic is used for a pub/sub pattern, i.e., decoupling producers and consumers. In your case, it seems that the situation is quite coupled, i.e., you need to generate and send 20k price lists to 20k specific consumers…

Re: Offsets of deleted consumer groups do not get deleted correctly

2019-03-29 Thread Vincent Maurin
Hi, You should keep the compact policy for the topic __consumer_offsets. This topic stores, for each group/topic/partition, the last consumed offset. As only the latest message for a group/topic/partition is relevant, the compact policy will keep only this message. When you delete a group, actually it will…
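
The deletion mechanism alluded to here is a tombstone: a record with a null value, which compaction eventually uses to drop the key entirely. A generic sketch on an arbitrary compacted topic (names are placeholders; group deletion writes equivalent tombstones into __consumer_offsets itself):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Tombstone {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value marks the key for removal; the log cleaner later
            // drops both the old values and, eventually, the tombstone itself.
            producer.send(new ProducerRecord<>("my-compacted-topic", "stale-key", null));
        }
    }
}
```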

Re: Partition Count Dilemma

2019-03-21 Thread Vincent Maurin
Hi 100 partitions is not a high number for this cluster. The downsides of having more partitions are: - more file descriptors open; check that the limits for the user running Kafka are high enough - more work for the brokers and more memory used to keep the metadata about the…

Re: how to aggregate sessions

2018-07-17 Thread Vincent Maurin
…(OUTPUT) Am I on the right path here? Do I need to detect elapsed session windows manually somehow? On 2018. 07. 17. at 10:31 AM, "Vincent Maurin" wrote: Hi, Kafka Streams sounds like a good solution there. The first step is to properly partition your…

Re: how to aggregate sessions

2018-07-17 Thread Vincent Maurin
Hi Kafka Streams sounds like a good solution there. The first step is to properly partition your event topics based on the session key, so all events for the same session will go to the same partition. Then you could build your Kafka Streams application, which will maintain a state (manually managed…
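
For reference, a hedged sketch of such an application using Kafka Streams session windows instead of a hand-rolled state (topic names, the 30-minute inactivity gap, and the string-concatenation aggregate are placeholder choices; SessionWindows.with matches clients of that era, newer ones spell it SessionWindows.ofInactivityGapWithNoGrace):

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.SessionWindows;

public class SessionAggregate {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("events")   // events keyed by session key
               .groupByKey()
               .windowedBy(SessionWindows.with(Duration.ofMinutes(30)))
               .aggregate(
                       () -> "",                                  // new session
                       (key, event, agg) -> agg + "|" + event,    // add event
                       (key, agg1, agg2) -> agg1 + agg2)          // merge sessions
               .toStream((windowedKey, agg) ->
                       windowedKey.key() + "@" + windowedKey.window().start())
               .to("sessions");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}
```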

Re: Reliable way to purge data from Kafka topics

2018-05-25 Thread Vincent Maurin
…primary DC. That's when we start reverse replication. And to clean up data replicated from primary to backup (before the switch happened), we have to purge topics on the backup Kafka cluster. And that is the challenge. On Fri, May 25, 2018 at 12:40 PM Vincent Maurin wrote:…

Re: Reliable way to purge data from Kafka topics

2018-05-25 Thread Vincent Maurin
Hi Shantanu I am not sure the scenario you are describing is the best approach. I would rather consider the problem in terms of producers and consumers of the data. It is usually good practice to put your producers local to your Kafka cluster, so in your case, I would suggest you have producers in the main…
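
For the purge step itself, the AdminClient can delete records up to a given offset per partition; a sketch (topic, partition, and offset are placeholders; in practice you would fetch the live end offsets first and pass those):

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class PurgeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            TopicPartition tp = new TopicPartition("replicated-topic", 0);
            long endOffset = 1_000L;   // placeholder: use the partition's real end offset
            // Everything before the given offset becomes eligible for deletion,
            // which with the end offset effectively purges the partition.
            admin.deleteRecords(Map.of(tp, RecordsToDelete.beforeOffset(endOffset)))
                 .all().get();
        }
    }
}
```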

Re: Frequent consumer rebalance, auto commit failures

2018-05-24 Thread Vincent Maurin
…tes, max.poll.interval.ms to 10 minutes. But it is not helping still. On Thu, May 24, 2018 at 6:15 PM Vincent Maurin <vincent.mau...@glispa.com> wrote: Hello Shantanu, It is also important to conside…

Re: Frequent consumer rebalance, auto commit failures

2018-05-24 Thread Vincent Maurin
Hello Shantanu, It is also important to consider your consumer code. You should not spend too much time between two calls to the "poll" method. Otherwise, a consumer not calling poll will be considered dead by the group, triggering a rebalance. Best On Thu, May 24, 2018 at 1:45 PM M. Manna wrote:…
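
A sketch of consumer settings that give the poll loop more headroom (the values are illustrative, not recommendations; shrinking max.poll.records is usually the most effective lever, since each smaller batch finishes well within max.poll.interval.ms):

```java
import java.util.Properties;

public class RebalanceFriendlyConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "email-workers");        // placeholder group
        // Fewer records per poll means less work between two poll() calls.
        props.put("max.poll.records", "50");
        // Budget for processing one batch before the consumer is considered dead.
        props.put("max.poll.interval.ms", "300000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```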

Does the key for a state need to be the key of the incoming message

2018-01-24 Thread Vincent Maurin
Hello, I am building a Kafka Streams application consuming a log-compacted topic with 12 partitions. Each message has a String key and a JSON body, and the JSON body contains a date field. I have made a custom Transformer in my topology that consumes this stream and immediately forwards the document where th…
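
A hedged sketch of such a Transformer (store name, JSON handling, and date extraction are placeholders): the state key may differ from the incoming message key, but each task's store only sees its own partitions, so if every record for a given date must land in the same store, the stream has to be repartitioned by that date first.

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class DateIndexTransformer
        implements Transformer<String, String, KeyValue<String, String>> {

    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<String, String>) context.getStateStore("by-date");
    }

    @Override
    public KeyValue<String, String> transform(String key, String json) {
        String date = extractDate(json);   // placeholder JSON parsing
        store.put(date, json);             // state keyed by date, not message key
        return KeyValue.pair(key, json);   // forward the record unchanged
    }

    private String extractDate(String json) { return json; /* placeholder */ }

    @Override
    public void close() {}
}
```

It would be wired in with something like stream.transform(() -> new DateIndexTransformer(), "by-date") after registering a store named "by-date" on the topology.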