Thanks, Erik. Your "Partitioner" is exactly what I had in mind and even what I named my stubbed out interface :-) Since Target has decided against this approach for other reasons, it will have to be a side project for me for now.
Best, Aaron

From: Erik Weathers <eweath...@groupon.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Wednesday, January 6, 2016 at 5:48 PM
To: "user@storm.apache.org" <user@storm.apache.org>
Cc: "d...@storm.apache.org" <d...@storm.apache.org>
Subject: Re: HDFS Bolts -- partitioning output

hey Aaron,

We've also written a similar bolt at Groupon; we aren't super satisfied with the implementation though. :) We are begrudgingly using it because there is no partitioning support in the OSS storm-hdfs bolt. Though one thing I do like about our implementation is having the ability to define your own "Partitioner" in each topology to do various types of partitioning (date-based, message ID-based, topic-based, whatever). It would be great if your implementation had such logic too.

e.g., when deciding the HDFS path for a tuple's data, the Partitioner is called to determine the HDFS path. For example, it can take the Tuple object and an opaque key/value Configuration hash that can pass items like a Kafka topic name to be included in the HDFS path.

- Erik

On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <aaron.doss...@target.com> wrote:

Hi,

My team was exploring changes to the HDFS bolts that would allow for partitioning the output, for example into directories corresponding to day. This is different from the existing functionality to rotate files based on a set length of time.

For unrelated reasons, we are probably not going to pursue this further. However, I have some code changes that implement most of this functionality for at least some partitioning use cases. If there is interest from the user or developer community for this feature, I could get it in shape for a PR to get feedback about our implementation approach.
Any feedback on this idea is welcome. Thanks! -Aaron
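
As a rough illustration of the pluggable "Partitioner" Erik describes (given a tuple and an opaque key/value configuration, return the HDFS path), here is a minimal Java sketch. Everything named in it is an assumption made for illustration: the interface and method names, the plain Map standing in for Storm's Tuple, the `basePath` and `topic` config keys, and the `timestampMs` field. It is not the Groupon implementation or the storm-hdfs API.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface: maps one record plus per-topology config to an HDFS directory.
// A real bolt would likely accept Storm's Tuple; a Map of field values stands in for it here.
interface Partitioner {
    String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf);
}

// Date-based example producing <basePath>/<topic>/<yyyy/MM/dd> in UTC.
// Assumes the record carries an epoch-millisecond field named "timestampMs".
final class DatePartitioner implements Partitioner {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf) {
        // Opaque config supplies items such as the Kafka topic name (keys are assumptions).
        String base = conf.getOrDefault("basePath", "/data");
        String topic = conf.getOrDefault("topic", "unknown");
        long ts = ((Number) tupleFields.get("timestampMs")).longValue();
        return base + "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(ts));
    }
}
```

A message ID-based or topic-based scheme would be another class implementing the same interface, so each topology could choose its own partitioning without changes to the bolt itself.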