Thanks for reporting your findings back. I don't know many details of your app, but I strongly suggest doing as many of the aggregations as possible in Dataflow itself rather than in external storage. That will be a lot more scalable and, more importantly, much simpler to manage and to change in the future.
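For example, a per-key count done inside the pipeline might look something like the sketch below. This is only a minimal sketch against the Beam Java SDK's KafkaIO; the broker address, topic name, String key/value types, and the one-minute windowing are placeholder assumptions, not details of your app:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class CountInPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read key/value records from Kafka; withoutMetadata() drops
    // partition/offset info and keeps just the KV pairs.
    PCollection<KV<String, String>> records = p.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")   // placeholder broker
            .withTopic("my_topic")                    // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata());

    // Aggregate inside the pipeline: window the unbounded stream, then
    // count per key, instead of incrementing a shared Redis map from
    // many parallel workers.
    PCollection<KV<String, Long>> counts = records
        .apply(Window.<KV<String, String>>into(
            FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Count.perKey());

    p.run();
  }
}

The runner handles the parallelism and merges the partial counts for you, so there is no shared mutable map for concurrent workers to race on.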
Raghu.

On Wed, Jul 27, 2016 at 10:06 AM, amir bahmanyari <[email protected]> wrote:

> Hi Raghu,
> Kafka is OK sending one record at a time; I proved it.
> The issue is maintaining concurrency in Redis maps that are being
> accessed and modified by parallel pipelines at runtime.
> The variability of the apparent duplication is due to the unpredictable
> number of parallel pipelines accessing a shared Redis map during each
> separate run.
> That's my challenge at the moment.
> Thanks for your help.
> Amir
>
> ------------------------------
> *From:* Raghu Angadi <[email protected]>
> *To:* [email protected]; amir bahmanyari <[email protected]>
> *Sent:* Tuesday, July 26, 2016 2:34 PM
> *Subject:* Re: Avoid reading duplicates from KafkaIO!!!
>
> On Tue, Jul 26, 2016 at 2:29 PM, amir bahmanyari <[email protected]>
> wrote:
>
> I know this is not a Kafka forum, but could Kafka be sending redundant
> records?
>
> No, equally unlikely. You could verify whether you pushed duplicates to
> Kafka by reading directly with the Kafka console consumer, e.g.:
>
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic
> my_topic --from-beginning | grep ....
