Hi Raghu,I monitored what gets sent from Kafka to a Kafka Consumer instance. The data is out of sequence comparing to whats in the source file I read from line by line and send it to Kafka.Here is how I created the topic: one partition only so EVERYTHING goes in that 1 partition & according to Kafka docs, it guaranteed ordered as it was sent. ./kafka-topics.sh --zookeeper kafhost:2181 --create --topic lrroad --partitions 1 --replication-factor 1
This is a good link:https://kafka.apache.org/08/introduction.html Pls see the following statement. It translates to me to create a topic with only 1 partition like you see the command line above.Is there anything else I should do besides creating the topic with 1 partition to get total order in that 1 partition. At the moment, I dont care about parallelism. "Kafka only provides a total order over messages within a partition. This combined with the ability to partition data by key is sufficient for the vast majority of applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process." And I have only one consumer: my FlinkRunner app running in a 2-nodes Flink Cluster.Thanks for your response Raghu...Cheers From: Raghu Angadi <[email protected]> To: [email protected]; amir bahmanyari <[email protected]> Cc: Thomas Groh <[email protected]> Sent: Wednesday, August 17, 2016 6:23 PM Subject: Re: Is Beam pipeline runtime behavior inconsistent? Thanks Amir for digging deeper into this. On Wed, Aug 17, 2016 at 5:21 PM, amir bahmanyari <[email protected]> wrote: Ok. I see its Kafka that doesn't send records to KafkaIO() in the same exact order as its being sent to it.I proved it with a stand alone consumer several times and it shows.As per Kafka docs suggestions, I recreated a new topic with number of partitions=1 which Kafka docs say that guarantees exact order in a single partition.It still doesn't send them in the right order even with the number of replications being just 1 i.e. no parallelism at all. I would be very surprised if this is the case with Kafka. Are you publishing in single thread?
