Are you using all grouping for the deserialization bolt as well? My original point is that you should not be using this grouping anywhere in your topology. All grouping replicates every tuple to every task of the receiving bolt, so it will cause the duplication you are seeing unless the parallelism of your bolts is set to one. See the section on groupings at https://storm.incubator.apache.org/documentation/Concepts.html and this page about Storm parallelism: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/ . It would also help if you shared how you are building your topology.
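To make the difference concrete, here is a small stand-alone Java sketch. It does not use the Storm API; it is a simplified model of how many copies of each tuple one subscribing bolt receives under the two groupings. The class and method names (GroupingDemo, deliver, total) are made up for illustration, and the uniform random pick is only an approximation of shuffle grouping. In a real topology the corresponding calls on the bolt declarer are shuffleGrouping(...) and allGrouping(...).

```java
import java.util.Random;

// Simplified model, not Storm code: counts tuples received per task of one bolt.
public class GroupingDemo {

    // Returns how many tuples each task of a single subscribing bolt receives.
    public static int[] deliver(int tuples, int tasks, boolean allGrouping, Random rng) {
        int[] received = new int[tasks];
        for (int t = 0; t < tuples; t++) {
            if (allGrouping) {
                // All grouping: every task gets its own copy of the tuple.
                for (int task = 0; task < tasks; task++) {
                    received[task]++;
                }
            } else {
                // Shuffle grouping (approximated): exactly one task gets the tuple.
                received[rng.nextInt(tasks)]++;
            }
        }
        return received;
    }

    // Sums the per-task counts, i.e. how many times the bolt's code runs in total.
    public static int total(int[] received) {
        int sum = 0;
        for (int n : received) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int tuples = 10;
        int tasks = 3; // hypothetical parallelism of the receiving bolt

        System.out.println("shuffle: " + total(deliver(tuples, tasks, false, rng))); // 10
        System.out.println("all:     " + total(deliver(tuples, tasks, true, rng)));  // 30
    }
}
```

With 10 tuples and 3 tasks, shuffle grouping processes 10 tuples in total while all grouping processes 30, which matches the kind of duplication you describe. If the goal is to send each message to both HDFS and Kafka, you do not need all grouping for that: when two bolts each subscribe to the same upstream component, each bolt gets its own copy of every tuple, and the grouping only decides which task inside each bolt handles it.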
-Nathan

On Jul 7, 2014 8:06 PM, "Max Evers" <mcev...@gmail.com> wrote:

> The AllGrouping was intended to duplicate the messages to HDFS and the
> outbound Kafka. What we are seeing is the duplication of messages from
> the Kafka spout, so the same message is seen multiple times in HDFS AND in
> the outbound Kafka, as well as in the logs of the intermediate bolt handling
> the deserialization. Presently there is not much more going on in Storm
> than the deserialization. I wanted to get the skeleton of the flow working
> before I introduced new complications inside of Storm.