Ok. The state persistence concurrency issue I mentioned below in the Flink
cluster is finally resolved. I tried everything with Redis; there is a
concurrency/locking issue that I couldn't figure out. I replaced Redis with
plain Java 7 ConcurrentHashMaps, only for the key "state" values... magic!
And you get a bonus as well: it gets REALLY fast.
Thanks + regards,
Amir
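For anyone hitting the same thing, a minimal sketch of the kind of swap being
described; the class name and the key/value types are assumptions, since the
actual code isn't shown in the thread. The important detail is sticking to
ConcurrentHashMap's atomic per-key operations instead of a separate get then
put, which would reintroduce the race (compute/merge only arrived in Java 8,
so this uses the Java 7-compatible compare-and-swap idiom):

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical in-memory replacement for the shared Redis "state" map.
    public class StateStore {
        private final ConcurrentHashMap<String, Long> state =
                new ConcurrentHashMap<String, Long>();

        // Atomically adds delta to the value stored under key.
        public long add(String key, long delta) {
            while (true) {
                Long old = state.putIfAbsent(key, delta);
                if (old == null) {
                    return delta;                   // first writer for this key
                }
                long updated = old + delta;
                if (state.replace(key, old, updated)) {
                    return updated;                 // CAS on the value succeeded
                }
                // Another worker updated the key concurrently; retry.
            }
        }

        public Long get(String key) {
            return state.get(key);
        }
    }

One caveat: this only works while all the parallel pipelines share a single
JVM. Across separate Flink task managers each JVM would get its own map, which
is presumably what Redis was there to avoid in the first place.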
From: Raghu Angadi <[email protected]>
To: amir bahmanyari <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Wednesday, July 27, 2016 10:16 AM
Subject: Re: Avoid reading duplicates from KafkaIO!!!
Thanks for reporting your findings back.
I don't know many details of your app, but I strongly suggest doing as many
aggregations as possible in Dataflow itself rather than in external storage.
That will be a lot more scalable and, more importantly, much simpler to manage
and to change in the future.
Raghu.
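As an illustration of the suggestion above, here is a minimal sketch of a
per-key, windowed count done entirely inside the pipeline with Beam's built-in
transforms, so the runner's own state machinery replaces the external map. The
topic name, servers, key/value types, and the one-minute windows are
assumptions, and the package names follow recent Beam Java SDKs (the mid-2016
incubating releases differed slightly):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.joda.time.Duration;

    public class InPipelineAggregation {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create(
                    PipelineOptionsFactory.fromArgs(args).create());

            // Read key/value records from Kafka; topic and servers are
            // placeholders.
            PCollection<KV<String, String>> records = p
                .apply(KafkaIO.<String, String>read()
                    .withBootstrapServers("localhost:9092")
                    .withTopic("my_topic")
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withoutMetadata());

            // Aggregate inside the pipeline: windowed per-key counts handled
            // by the runner's own state/shuffle machinery -- no external map.
            PCollection<KV<String, Long>> counts = records
                .apply(Window.<KV<String, String>>into(
                    FixedWindows.of(Duration.standardMinutes(1))))
                .apply(Count.<String, String>perKey());

            // counts can then be written to a sink (e.g., back to Kafka).
            p.run();
        }
    }

Count.perKey() is a combining aggregation under the hood, so the same pattern
covers sums, maxes, and other associative per-key aggregations.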
On Wed, Jul 27, 2016 at 10:06 AM, amir bahmanyari <[email protected]> wrote:
Hi Raghu,
Kafka is fine sending one record at a time; I verified that. The issue is
maintaining concurrency in the Redis maps that are accessed and modified by the
parallel pipelines at runtime. The variability of the apparent duplication
comes from the unpredictable number of parallel pipelines hitting a shared
Redis map during each separate run. That's my challenge at the moment. Thanks
for your help.
Amir
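To make the race concrete: when each parallel pipeline does a separate GET and
SET on the shared map, two workers can read the same old value and one update
is silently lost, so counts drift unpredictably from run to run. Below is a
hedged sketch of both the racy pattern and an optimistic-locking fix using
WATCH/MULTI/EXEC; the Jedis client and the key name are assumptions, since the
thread doesn't say which Redis client is in use:

    import java.util.List;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;

    public class RedisCounter {
        // Racy read-modify-write: two workers can both read "7", both
        // compute "8", and both write "8" -- one increment is lost.
        static void racyIncrement(Jedis jedis, String key) {
            String old = jedis.get(key);
            long next = (old == null ? 0L : Long.parseLong(old)) + 1;
            jedis.set(key, Long.toString(next));
        }

        // Optimistic locking: WATCH the key; EXEC returns null if another
        // client touched it in the meantime, in which case we retry.
        static void safeIncrement(Jedis jedis, String key) {
            while (true) {
                jedis.watch(key);
                String old = jedis.get(key);
                long next = (old == null ? 0L : Long.parseLong(old)) + 1;
                Transaction tx = jedis.multi();
                tx.set(key, Long.toString(next));
                List<Object> result = tx.exec();
                if (result != null) {
                    return;
                }
            }
        }
    }

For a plain counter, Redis's atomic INCR command would sidestep the race
entirely; the pattern above matters when the per-key update is more than an
increment.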
From: Raghu Angadi <[email protected]>
To: [email protected]; amir bahmanyari <[email protected]>
Sent: Tuesday, July 26, 2016 2:34 PM
Subject: Re: Avoid reading duplicates from KafkaIO!!!
On Tue, Jul 26, 2016 at 2:29 PM, amir bahmanyari <[email protected]> wrote:
I know this is not a Kafka forum, but could Kafka be sending redundant records?
No, that is equally unlikely. You could verify whether you pushed duplicates to
Kafka by reading directly with the Kafka console consumer.
e.g.: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic
my_topic --from-beginning | grep ....
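If the records are expected to be unique, replacing the grep with something
like | sort | uniq -d prints only lines that appear more than once, which makes
any duplicates in the topic easy to spot.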