I might be missing something, but it sounds like you want to keep the ordering of events that belong to the same id, rather than needing a hard-coded partition number for writing.
If so, would it be enough to create a keyed PCollection with id as the key +
use KafkaIO.writeRecords()? I am concluding this only from a quick check of
the javadoc, and will try it out when I have time.

https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html#writeRecords--
https://www.javadoc.io/doc/org.apache.kafka/kafka-clients/2.4.1/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-K-V-
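
Something like this is what I had in mind. Completely untested, written only
from a quick look at the javadoc above; the broker address, topic name,
class name and the extractId() helper are all placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.ProducerRecordCoder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedWriteSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(Create.of("{\"Id\": \"123456\", \"Event\": \"Update\"}"))
        // Key every record by its id. Kafka's default partitioner hashes
        // the key, so all events of one id land on the same partition of
        // the target topic, which keeps their relative order.
        .apply(MapElements
            .into(new TypeDescriptor<ProducerRecord<String, String>>() {})
            .via((String json) ->
                new ProducerRecord<>("target-topic", extractId(json), json)))
        .setCoder(ProducerRecordCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))
        .apply(KafkaIO.<String, String>writeRecords()
            .withBootstrapServers("broker1:9092")
            .withTopic("target-topic")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }

  // Placeholder - parse the id out of the message in whatever way fits
  // your actual format.
  private static String extractId(String json) {
    return "123456";
  }
}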

Best Regards,
Pavel Solomin
Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
<https://www.linkedin.com/in/pavelsolomin>

On Thu, 13 Apr 2023 at 20:07, John Casey via user <[email protected]> wrote:

> Unfortunately, we don't have a feature to publish to a specific partition.
> We tried to design with Kafka conventions in mind, and I don't believe we
> plan to add this functionality.
>
> On Thu, Apr 13, 2023 at 3:03 PM Juan Romero <[email protected]> wrote:
>
>> Hi John. Thanks for your response!
>>
>> Point 2 is clear to me now. I was reading a lot of documentation about it
>> and only wanted to confirm with you.
>>
>> Regarding point 1, I know the drawbacks: we have to avoid hot partitions
>> (for this purpose we can define a hash function that evenly distributes
>> the messages across the three partitions). However, in our specific case
>> we are using a tool that ingests the data from transactional databases
>> (this tool allows us to define the partitioner function), and we want to
>> persist the last state of a specific database row in a target OLAP
>> database. For example, we could have the following events, which indicate
>> that a specific row was updated:
>>
>> ("Id": "123456 ", "name": "Juan", "last_name": "Romero", "timestamp":
>> "13-04-2023 13:44:00.100", "Event": "Update")
>> ("Id": "123456 ", "name": "Juan", "last_name": "Romero2", "timestamp":
>> "13-04-2023 13:44:00.200", "Event": "Update")
>> ("Id": "123456 ", "name": "Juan", "last_name": "Romero3", "timestamp":
>> "13-04-2023 13:44:00.300", "Event": "Update")
>>
>> Suppose all these events are being processed by different workers (in
>> this case a worker is a consumer and a producer at the same time, and at
>> the end it pushes the data into another target topic). In the source we
>> can guarantee that updates with the same id go to the same partition, so
>> up to that point the messages are ordered. The problem is that we have
>> many consumers and producers in the middle: we can read the messages in
>> order from the first topic in the chain (because the same key always goes
>> to the same partition), but after processing we have to push each message
>> to another topic that has 3 partitions (like I said before). There the
>> producer will spread the row updates across the different partitions, and
>> we can potentially lose the original message order, because the updates
>> referring to one specific row end up across multiple partitions.
>>
>> Let me know if my case is clear. If you want, I can give you a diagram to
>> explain it.
>>
>> The general idea is that we need to push the last valid event from the
>> transactional database to the target, and in the middle we have different
>> Beam pipelines which act as consumers and producers.
>>
>> Thanks!!
>>
>> On Thu, Apr 13, 2023 at 13:23, John Casey via user
>> (<[email protected]>) wrote:
>>
>>> Hi Juan,
>>>
>>> Under normal usage, Kafka will maintain ordering within a partition
>>> without any extra work by you.
>>>
>>> For 2, you can use .commitOffsetsInFinalize to only commit back to the
>>> source topic once the pipeline has persisted the message, at which point
>>> it may not be fully processed, but it is guaranteed that it will be
>>> processed.
>>>
>>> For 1, I would generally recommend against trying to write to a specific
>>> partition. This undermines the load balancing that Kafka enables with
>>> partitioning. I'd recommend separate topics instead. We don't support
>>> writing to specific partitions in the current IO.
>>>
>>> John
>>>
>>> On Thu, Apr 13, 2023 at 2:15 PM Juan Romero <[email protected]> wrote:
>>>
>>>> Hi Apache Beam team. I have been working on a POC for the company I'm
>>>> working for, and I'm using the Apache Beam Kafka connector to read from
>>>> a Kafka topic and write into another Kafka topic. The source and target
>>>> topics have 3 partitions, and it is compulsory to keep ordering by
>>>> certain message keys. Regarding this I have two questions:
>>>>
>>>> 1. How can I write a specific message to a specific Kafka partition?
>>>> 2. How can we commit the message back to the source topic only once the
>>>> pipeline has processed it?
>>>>
>>>> I look forward to your reply and hope you can help me with these
>>>> doubts.
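
P.S. For your question 2, John's .commitOffsetsInFinalize suggestion would
look roughly like this on the read side. Again untested and only from the
javadoc; the group id, topic and broker address are placeholders, and note
that commitOffsetsInFinalize() requires a group.id in the consumer config:

import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommittedReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("broker1:9092")   // placeholder
        .withTopic("source-topic")              // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // commitOffsetsInFinalize() needs a consumer group id to commit to.
        .withConsumerConfigUpdates(Collections.<String, Object>singletonMap(
            ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group"))
        // Offsets are committed back to Kafka only after the runner has
        // checkpointed the records: a message may not be fully processed
        // yet at commit time, but it is guaranteed that it will be.
        .commitOffsetsInFinalize()
        .withoutMetadata());
    // ... process and write to the target topic here ...

    p.run().waitUntilFinish();
  }
}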
