Hi Raghu,I monitored what gets sent from Kafka to a Kafka Consumer instance. 
The data is out of sequence comparing to whats in the source file I read from 
line by line and send it to Kafka.Here is how I created the topic: one 
partition only so EVERYTHING goes in that 1 partition & according to Kafka 
docs, it guaranteed ordered as it was sent.
./kafka-topics.sh --zookeeper kafhost:2181 --create --topic lrroad --partitions 
1 --replication-factor 1

This is a good link:https://kafka.apache.org/08/introduction.html
Pls see the following statement. It translates to me to create a topic with 
only 1 partition like you see the command line above.Is there anything else I 
should do besides creating the topic with 1 partition to get total order in 
that 1 partition. At the moment, I dont care about parallelism.
"Kafka only provides a total order over messages within a partition. This 
combined with the ability to partition data by key is sufficient for the vast 
majority of applications. However, if you require a total order over messages 
this can be achieved with a topic that has only one partition, though this will 
mean only one consumer process."
And I have only one consumer: my FlinkRunner app running in a 2-nodes Flink 
Cluster.Thanks for your response Raghu...Cheers
      From: Raghu Angadi <[email protected]>
 To: [email protected]; amir bahmanyari <[email protected]> 
Cc: Thomas Groh <[email protected]>
 Sent: Wednesday, August 17, 2016 6:23 PM
 Subject: Re: Is Beam pipeline runtime behavior inconsistent?
   
Thanks Amir for digging deeper into this.
On Wed, Aug 17, 2016 at 5:21 PM, amir bahmanyari <[email protected]> wrote:

Ok. I see its Kafka that doesn't send records to KafkaIO() in the same exact 
order as its being sent to it.I proved it with a stand alone consumer several 
times and it shows.As per Kafka docs suggestions, I recreated a new topic with 
number of partitions=1 which Kafka docs say that guarantees exact order in a 
single partition.It still doesn't send them in the right order even with the 
number of replications being just 1 i.e. no parallelism at all.

I would be very surprised if this is the case with Kafka. Are you publishing in 
single thread? 



   

Reply via email to