Re: HDFS Bolts -- partitioning output

马哲超 Mon, 14 Mar 2016 23:55:42 -0700

I'm also looking forward to this partitioning function. The issue title
has been changed to STORM-1464.


2016-01-26 1:38 GMT+08:00 Aaron.Dossett <aaron.doss...@target.com>:

> Erik, it turned out that we did need this in production after all.  I updated
> STORM-1494 to include partitioning and I will have an initial PR soon for
> review.
>
> From: Erik Weathers <eweath...@groupon.com>
> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
> Date: Monday, January 11, 2016 at 6:00 PM
> To: "user@storm.apache.org" <user@storm.apache.org>
> Cc: "d...@storm.apache.org" <d...@storm.apache.org>
> Subject: Re: HDFS Bolts -- partitioning output
>
> Awesome Aaron, I can send you what we have done offline!
>
> - Erik
>
> On Thu, Jan 7, 2016 at 11:12 AM, Aaron.Dossett <aaron.doss...@target.com>
> wrote:
>
>> Thanks, Erik.  Your “Partitioner” is exactly what I had in mind and even
>> what I named my stubbed out interface :-)  Since Target has decided against
>> this approach for other reasons, it will have to be a side project for me
>> for now.
>>
>> Best, Aaron
>>
>> From: Erik Weathers <eweath...@groupon.com>
>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>> Date: Wednesday, January 6, 2016 at 5:48 PM
>> To: "user@storm.apache.org" <user@storm.apache.org>
>> Cc: "d...@storm.apache.org" <d...@storm.apache.org>
>> Subject: Re: HDFS Bolts -- partitioning output
>>
>> hey Aaron,
>>
>> We've also written a similar bolt at Groupon, but we aren't super satisfied
>> with the implementation. :)  We are begrudgingly using it because
>> there is no partitioning support in the OSS storm-hdfs bolt.
>>
>> Though one thing I do like about our implementation is having the ability
>> to define your own "Partitioner" in each topology to do various types of
>> partitioning (date-based, message ID-based, topic-based, whatever).  It
>> would be great if your implementation had such logic too.  For example, the
>> Partitioner is called to determine the HDFS path for each tuple's data.  It
>> can take the Tuple object and an opaque key/value Configuration hash that
>> passes items like a Kafka topic name to be included in the HDFS path.
>>
>> - Erik
>>
>> On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <aaron.doss...@target.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My team was exploring changes to the HDFS bolts that would allow for
>>> partitioning the output, for example into directories corresponding to
>>> day.  This is different from the existing functionality to rotate files
>>> based on a set length of time.  For unrelated reasons, we are probably not
>>> going to pursue this further.  However, I have some code changes that
>>> implement most of this functionality for at least some partitioning use
>>> cases.  If there is interest from the user or developer community for this
>>> feature, I could get it in shape for a PR to get feedback about our
>>> implementation approach.
>>>
>>> Any feedback on this idea is welcome.  Thanks! -Aaron
>>>
>>
>>
>
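
For anyone skimming the thread, the pluggable "Partitioner" that Erik describes above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the Groupon code and not the storm-hdfs API: the `Partitioner` interface, the `TopicDatePartitioner` class, the `kafka.topic` config key, and the `timestamp` field are all hypothetical names, and a plain `Map` stands in for Storm's Tuple so the example compiles without any Storm dependency.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface (not part of storm-hdfs): maps one tuple to an HDFS directory.
interface Partitioner {
    // "fields" stands in for Storm's Tuple; "conf" is the opaque key/value configuration.
    String partitionPath(Map<String, Object> fields, Map<String, String> conf);
}

// Date-based partitioning, with the Kafka topic name taken from the configuration.
final class TopicDatePartitioner implements Partitioner {
    // One directory per UTC day, e.g. 2016/01/07.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> fields, Map<String, String> conf) {
        // Assumed config key; falls back to "unknown" when no topic is configured.
        String topic = conf.getOrDefault("kafka.topic", "unknown");
        // Assumed tuple field holding the event time as epoch milliseconds.
        long epochMillis = ((Number) fields.get("timestamp")).longValue();
        return "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

With a topic of "clicks" and a timestamp of 1452124800000 (midnight UTC on 2016-01-07), this returns /clicks/2016/01/07. Unlike time-based file rotation, the directory depends on the tuple's own data, so a message ID-based or topic-only scheme would just be another implementation of the same interface.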
