Thanks, Erik.  Your "Partitioner" is exactly what I had in mind and even what I 
named my stubbed out interface :-)  Since Target has decided against this 
approach for other reasons, it will have to be a side project for me for now.

Best, Aaron

From: Erik Weathers <eweath...@groupon.com>
Reply-To: user@storm.apache.org
Date: Wednesday, January 6, 2016 at 5:48 PM
To: user@storm.apache.org
Cc: d...@storm.apache.org
Subject: Re: HDFS Bolts -- partitioning output

hey Aaron,

We've also written a similar bolt at Groupon, though we aren't super satisfied 
with the implementation. :)  We are begrudgingly using it because there is no 
partitioning support in the OSS storm-hdfs bolt.

One thing I do like about our implementation, though, is the ability to 
define your own "Partitioner" in each topology to do various types of 
partitioning (date-based, message ID-based, topic-based, whatever).  It would 
be great if your implementation had such logic too.  That is, the bolt calls 
the Partitioner whenever it needs to decide the HDFS path for a tuple's data.  
For example, the Partitioner can take the Tuple object and an opaque key/value 
Configuration hash that passes in items like a Kafka topic name to be included 
in the HDFS path.
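
A rough sketch of what such a Partitioner could look like, purely for 
illustration.  Everything named here is hypothetical and is not part of 
storm-hdfs: the Partitioner interface, the TopicDatePartitioner class, the 
"kafka.topic" config key, and the "timestamp" field.  A plain Map stands in 
for Storm's Tuple so the snippet compiles without Storm on the classpath.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface (an assumption, not the storm-hdfs API).
// The Map<String, Object> parameter is a stand-in for Storm's Tuple.
interface Partitioner {
    // Returns the partition sub-path for one tuple's data.
    String partitionPath(Map<String, Object> tuple, Map<String, String> conf);
}

// Topic- and date-based partitioning: <topic>/<yyyy>/<MM>/<dd>, in UTC.
final class TopicDatePartitioner implements Partitioner {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> tuple, Map<String, String> conf) {
        // "kafka.topic" is an assumed key in the opaque configuration hash.
        String topic = conf.getOrDefault("kafka.topic", "unknown");
        // "timestamp" is an assumed tuple field holding epoch milliseconds.
        long millis = ((Number) tuple.get("timestamp")).longValue();
        return topic + "/" + DAY.format(Instant.ofEpochMilli(millis));
    }
}
```

A bolt built this way would presumably prepend its own base directory to the 
returned sub-path and keep one open file per distinct partition, but that 
part depends on the bolt's internals and is left out here.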

- Erik

On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett 
<aaron.doss...@target.com> wrote:
Hi,

My team was exploring changes to the HDFS bolts that would allow for 
partitioning the output, for example into directories corresponding to day.  
This is different from the existing functionality to rotate files based on a 
set length of time.  For unrelated reasons, we are probably not going to pursue 
this further.  However, I have some code changes that implement most of this 
functionality for at least some partitioning use cases.  If there is interest 
from the user or developer community for this feature, I could get it in shape for 
a PR to get feedback about our implementation approach.

Any feedback on this idea is welcome.  Thanks! -Aaron
