Hey Aaron,

We've also written a similar bolt at Groupon, though we aren't super satisfied with the implementation. :) We are begrudgingly using it because there is no partitioning support in the OSS storm-hdfs bolt.
One thing I do like about our implementation, though, is the ability to define your own "Partitioner" in each topology to do various types of partitioning (date-based, message ID-based, topic-based, whatever). It would be great if your implementation had such logic too. For example, when deciding where a tuple's data goes, the bolt calls the Partitioner to determine the HDFS path. The Partitioner can take the Tuple object and an opaque key/value Configuration hash that passes items like a Kafka topic name to be included in the HDFS path.

- Erik

On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <aaron.doss...@target.com> wrote:

> Hi,
>
> My team was exploring changes to the HDFS bolts that would allow for
> partitioning the output, for example into directories corresponding to
> day. This is different than the existing functionality to rotate files
> based on a set length of time. For unrelated reasons, we are probably not
> going to pursue this further. However, I have some code changes that
> implement most of this functionality for at least some partitioning use
> cases. If there is interest from the user or developer community for this
> feature, I could get it in shape for a PR to get feedback about our
> implementation approach.
>
> Any feedback on this idea is welcome. Thanks! -Aaron
>
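
To make the Partitioner idea described above a bit more concrete, here is a rough sketch of what such a hook could look like. This is not our actual code and not the storm-hdfs API: the `Partitioner` interface, its `partitionPath` method, the `TopicDatePartitioner` class, the config keys `basePath` and `kafkaTopic`, and the `timestampMs` field are all made up for illustration, and a plain `Map` stands in for Storm's Tuple so the example compiles without any Storm dependency.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

public class PartitionerSketch {

    /** Hypothetical hook: maps one record plus an opaque config to an HDFS directory. */
    interface Partitioner {
        // 'fields' stands in for a Storm Tuple; 'conf' is the opaque key/value hash.
        String partitionPath(Map<String, Object> fields, Map<String, String> conf);
    }

    /** Date-based example that yields <basePath>/<kafkaTopic>/yyyy/MM/dd in UTC. */
    static class TopicDatePartitioner implements Partitioner {
        private static final DateTimeFormatter DAY =
                DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

        @Override
        public String partitionPath(Map<String, Object> fields, Map<String, String> conf) {
            // Config keys are assumptions; defaults keep the sketch runnable.
            String base = conf.getOrDefault("basePath", "/data");
            String topic = conf.getOrDefault("kafkaTopic", "unknown");
            // Assumes the record carries an epoch-millisecond timestamp field.
            long ts = ((Number) fields.get("timestampMs")).longValue();
            return base + "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(ts));
        }
    }

    public static void main(String[] args) {
        Partitioner p = new TopicDatePartitioner();
        Map<String, Object> fields = Map.of("timestampMs", 1451373120000L); // 2015-12-29T07:12:00Z
        Map<String, String> conf = Map.of("basePath", "/data", "kafkaTopic", "events");
        System.out.println(p.partitionPath(fields, conf)); // prints /data/events/2015/12/29
    }
}
```

A bolt built along these lines would call `partitionPath` once per tuple and write to a file under the returned directory, so a message ID-based or topic-based scheme would just be another implementation of the same interface.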