nsumer (i.e., the Java one) does the exact same thing.
-Matthias
On 5/30/23 8:41 AM, Vincent Maurin wrote:
Hello !
I am working on an exactly-once stream processor in Python, using the
aiokafka client library. My program stores a state in memory that is
recovered from a changelog topic, like in kaf
Hello !
I am working on an exactly-once stream processor in Python, using the
aiokafka client library. My program stores a state in memory that is
recovered from a changelog topic, like in Kafka Streams.
On each processing loop, I am consuming messages, producing messages
to an output topic and to
Hello
The idea behind a GlobalKTable is to materialize data (a Kafka topic)
close to where it is used. Actually, each task/worker will materialize
the full GlobalKTable in order to use it. So in your scenario, what
should be shared between your services is ideally the Kafka topic used
to bu
It also seems you are using an "at least once" strategy (maybe with
auto-commit, or committing after sending the email).
Maybe an "at most once" strategy could be a valid business choice here?
- at least once (you will deliver all the emails, but you could deliver
duplicates)
consumeMessages
sendEmails
commitO
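The difference between the two guarantees comes down to whether the commit happens before or after the side effect. Here is a minimal sketch in pure Python (no real Kafka client involved; the function names are made up for illustration):

```python
# Illustrative only: the position of commit() relative to the side effect
# (sending the email) determines the delivery guarantee on a crash.

def at_least_once(consume, send_email, commit):
    # Commit AFTER the side effect: a crash between send and commit means
    # the message is re-delivered on restart -> possible duplicate emails.
    msg = consume()
    send_email(msg)
    commit()

def at_most_once(consume, send_email, commit):
    # Commit BEFORE the side effect: a crash between commit and send means
    # the message is lost -> no duplicates, but an email may never go out.
    msg = consume()
    commit()
    send_email(msg)
```

Note that auto-commit behaves roughly like the second ordering from the application's point of view, since offsets can be committed before your handler has finished its work.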
Hi Rajat
I am not 100% sure, but I think the roll logic is based on incoming
messages.
When a message is received, the broker compares the timestamp of the first
message in the active segment with the timestamp of the incoming message.
If the difference is greater than log.roll.ms, it will roll the segment.
In your scenario,
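If that reading of the roll logic is right, the decision could be sketched like this (a hypothetical simplification, not the actual broker code):

```python
# Hypothetical sketch of the roll check described above: the broker only
# evaluates log.roll.ms when a new message arrives, comparing its timestamp
# against the first message in the active segment.

def should_roll(first_msg_ts_ms: int, incoming_msg_ts_ms: int,
                log_roll_ms: int) -> bool:
    """Return True if appending the incoming message should roll the segment."""
    return incoming_msg_ts_ms - first_msg_ts_ms > log_roll_ms
```

One consequence of this message-driven check: a partition that receives no traffic never triggers the comparison, so an old segment can stay active past log.roll.ms until the next message arrives.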
Hi Thangaram
Offsets are stored in the __consumer_offsets topic, so having a lot of
offsets to store (around 100k in your case) will require reads and writes
to this topic. Usually this is totally fine, especially when the commits on
the consumer side are not too frequent.
Otherwise, in some situations, it coul
Hi
There are a couple of metrics and logs produced by the log cleaner:
https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/log/LogCleaner.scala
You can try to monitor those (the metrics can be fetched with JMX, and for
the logs you can tweak the log4j properties).
Don't forget that the log
Hello Jonathan,
In __consumer_offsets, the messages have a key of (group, topic,
partition) and the last consumed offset as the value.
If you have fewer than 50 group/topic/partition combinations in your
cluster, it could make sense to have __consumer_offsets partitions without
any message
Best r
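A way to see why some of the 50 default partitions can stay empty: all commits for a consumer group land on a single partition, chosen from the group id's Java hash code. The sketch below re-implements Java's `String.hashCode()` in Python for illustration; the exact clamping formula is my assumption about how Kafka maps the hash to a non-negative partition index.

```python
def java_string_hashcode(s: str) -> int:
    """Python re-implementation of Java's String.hashCode() (signed 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # emulate 32-bit int overflow
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    # The broker maps every commit for a group to one partition of
    # __consumer_offsets, roughly: abs(groupId.hashCode()) % numPartitions.
    # The bit mask clamps the hash to a non-negative value (assumption:
    # this mirrors Kafka's utility abs, which drops the sign bit).
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions
```

With only a handful of groups, only a handful of the 50 partitions ever receive a message, so empty __consumer_offsets partitions are expected.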
Hi,
The idempotence flag guarantees that the message is produced exactly one
time on the topic, i.e., that running your command a single time will
produce a single message.
It is not a uniqueness enforcement on the message key; there is no such
thing in Kafka.
In Kafka, a topic containing the "history" o
Hello Adrien
Could you give more details about your topic configuration (how many
partitions, what is the replication factor)?
Usually, partition operations are performed by the broker that has been
assigned as the "controller". You should be able to check the controller
broker id with this comma
Maybe you can find some literature about messaging patterns?
Usually, a single Kafka topic is used for the PubSub pattern, i.e.,
decoupling producers and consumers.
In your case, the situation seems quite coupled, i.e., you need to
generate and send 20k price lists to 20k specific consumers.
Hi,
You should keep the compact policy for the __consumer_offsets topic.
This topic stores the consumed offset for each group/topic/partition. As
only the latest message for a group/topic/partition is relevant, the
compact policy will keep only this message. When you delete a group, it
actually wil
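What compaction retains can be simulated in a few lines. This is an illustrative model, not the broker's cleaner: it keeps only the latest value per key, and a null value (a tombstone, which is what a group deletion effectively writes for that group's keys) removes the key.

```python
def compact(records):
    """Simulate log compaction: keep only the latest value per key.
    A None value acts as a tombstone that deletes the key entirely."""
    latest = {}
    for key, value in records:  # records in log (offset) order
        if value is None:
            latest.pop(key, None)  # tombstone: key disappears after cleaning
        else:
            latest[key] = value    # newer value shadows older ones
    return latest
```

In the real broker the tombstone itself is retained for a while (delete.retention.ms) before being removed, so consumers that are catching up still see the deletion.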
Hi
100 partitions is not a high number for this cluster.
The downsides of having more partitions are:
- more open file descriptors; check that the limit for the user running
Kafka is high enough
- more work to perform for the brokers and more memory used for keeping
the metadata about the
>
> Am I on the right path here? Do I need to detect elapsed session
> windows manually somehow?
>
> On 2018. 07. 17. at 10:31 AM, "Vincent Maurin" wrote:
> Hi
>
> Kafka streams sounds like a good solution there.
> The first step is to properly partition your
Hi
Kafka Streams sounds like a good solution there.
The first step is to properly partition your event topics based on the
session key, so all events for the same session will go to the same
partition.
Then you could build your Kafka Streams application, which will maintain a
state (manually man
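The property that partitioning buys you can be shown with a toy partitioner. This is a simplified stand-in (Kafka's default Java partitioner actually hashes the serialized key with murmur2; md5 here is just an arbitrary deterministic hash for illustration):

```python
import hashlib

def partition_for(session_key: str, num_partitions: int) -> int:
    # Any deterministic hash of the key preserves the invariant that
    # matters here: all events for one session key land on the same
    # partition, so a single stream task sees the whole session.
    digest = hashlib.md5(session_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is stable, the task owning that partition can accumulate per-session state locally without ever coordinating with other tasks.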
primary DC. That's when we start reverse replication. And to clean up data
> replicated from primary to backup (before switch happened), we have to
> purge topics on backup Kafka cluster. And that is the challenge.
>
> On Fri, May 25, 2018 at 12:40 PM Vincent Maurin >
> wrote:
>
Hi Shantanu
I am not sure the scenario you are describing is the best approach. I
would rather consider the problem in terms of producers and consumers of
the data. Usually it is good practice to put your producers local to your
Kafka cluster, so in your case, I would suggest you have producers in the
main
tes, max.poll.interval.ms to 10 minutes. But
> > it is not helping still.
> >
> > On Thu, May 24, 2018 at 6:15 PM Vincent Maurin <
> vincent.mau...@glispa.com>
> > wrote:
> >
> >> Hello Shantanu,
> >>
> >> It is also important to conside
Hello Shantanu,
It is also important to consider your consumer code. You should not spend
too much time between two calls to the "poll" method. Otherwise, the
consumer not calling poll will be considered dead by the group, triggering
a rebalancing.
Best
On Thu, May 24, 2018 at 1:45 PM M. Manna wr
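One common way to keep the gap between poll() calls bounded is to give each batch a time budget and carry leftover work into the next iteration. A minimal sketch, assuming a hypothetical handle() callback (no real Kafka client here):

```python
import time

def process_batch(records, handle, poll_budget_s):
    """Process as many records as fit in the time budget between two poll()
    calls; return the unprocessed remainder for the next iteration.
    Keeping this budget well below max.poll.interval.ms avoids the group
    considering the consumer dead and triggering a rebalance."""
    deadline = time.monotonic() + poll_budget_s
    for i, rec in enumerate(records):
        if time.monotonic() >= deadline:
            return records[i:]  # leftover work; poll again before continuing
        handle(rec)
    return []
```

The alternative, if single records are inherently slow, is to pause the partitions, keep calling poll() while the work runs elsewhere, then resume; the budget approach above is the simpler option when per-record work is cheap but batches are large.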
Hello,
I am building a Kafka Streams application consuming a log-compacted topic
of 12 partitions. Each message has a String key and a JSON body; the JSON
body contains a date field.
I have made a custom Transformer in my topology that consumes this stream
and immediately forwards the documents where th