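If you create a topic with one partition, the messages will be consumed in the order they were produced. Alternatively, if you publish every message with the same key, they will all go to the same partition and stay in order, even if your topic has more than one partition. Kafka only guarantees ordering within a partition, not across partitions.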
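To make the partition behaviour concrete, here is a toy model in plain Python. It is only a sketch and not the Kafka client: the names choose_partition and publish are hypothetical, crc32 stands in for the hash the real default partitioner applies to the key, and unkeyed messages are simply dealt out round-robin. The sample lines are taken from the question.

```python
import zlib
from collections import defaultdict
from itertools import count


def choose_partition(key, num_partitions, counter):
    """Toy partitioner: hash the key if there is one, otherwise round-robin."""
    if key is None:
        return next(counter) % num_partitions
    # crc32 is a stand-in; the real client uses a different hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(lines, num_partitions, key=None):
    """Append each line to a per-partition log and return all the logs."""
    partitions = defaultdict(list)
    counter = count()
    for line in lines:
        partition = choose_partition(key, num_partitions, counter)
        partitions[partition].append(line)
    return dict(partitions)


lines = ["1, abc", "2, def", "8, vwx", "9, zzz"]

# One partition: everything lands in the same log, in send order.
single = publish(lines, num_partitions=1)

# Three partitions, same key on every message: still one log, still in order.
keyed = publish(lines, num_partitions=3, key="test.csv")

# Three partitions, no key: lines are scattered, so a consumer reading
# all partitions has no single order to follow.
unkeyed = publish(lines, num_partitions=3)

print(single)
print(keyed)
print(unkeyed)
```

In the first two cases there is a single log holding every line in its original order. In the unkeyed case the lines are split across three logs, which is the kind of interleaving the consumer output in the question shows.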
Either way above will work for Kafka.

-hans

> On May 25, 2018, at 8:56 PM, Raymond Xie <xie3208...@gmail.com> wrote:
>
> Hello,
>
> I just started learning Kafka and have the environment setup on my
> hortonworks sandbox at home vmware.
>
> test.csv is what I want the producer to send out:
>
> more test1.csv ./kafka-console-producer.sh --broker-list
> sandbox.hortonworks.com:6667 --topic kafka-topic2
>
> 1, abc
> 2, def
> ...
> 8, vwx
> 9, zzz
>
> What I received are all the content of test.csv, however, not in their
> original order;
>
> kafka-console-consumer.sh --zookeeper 192.168.112.129:2181 --topic
> kafka-topic2
>
> 2, def
> 1, abc
> ...
> 9, zzz
> 8, vwx
>
>
> I read from google that partition could be the feasible solution, however,
> my questions are:
>
> 1. for small files like this one, shall I really do the partitioning? how
> small a partition would be acceptable to ensure the sequence?
> 2. for big files, each partition could still contain multiple lines, how to
> ensure all the lines in each partition won't get messed up on consumer side?
>
>
> I also want to know what is the best practice to process large volume of
> data through kafka? There should be better way other than console command.
>
> Thank you very much.
>
>
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*