Hi Aaron/Erik,

We need this approach as well. Could you please share the implementation or its design?

On Mon, Mar 14, 2016 at 11:55 PM, 马哲超 <mazhechaomaill...@gmail.com> wrote:

> I'm also looking forward to this partitioning function. The issue title
> has been changed to STORM-1464.
>
> 2016-01-26 1:38 GMT+08:00 Aaron.Dossett <aaron.doss...@target.com>:
>
>> Erik, it turned out that we did need this in production after all. I
>> updated STORM-1494 to include partitioning and I will have an initial PR
>> soon for review.
>>
>> From: Erik Weathers <eweath...@groupon.com>
>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>> Date: Monday, January 11, 2016 at 6:00 PM
>> To: "user@storm.apache.org" <user@storm.apache.org>
>> Cc: "d...@storm.apache.org" <d...@storm.apache.org>
>> Subject: Re: HDFS Bolts -- partitioning output
>>
>> Awesome Aaron, I can send you what we have done offline!
>>
>> - Erik
>>
>> On Thu, Jan 7, 2016 at 11:12 AM, Aaron.Dossett <aaron.doss...@target.com>
>> wrote:
>>
>>> Thanks, Erik. Your “Partitioner” is exactly what I had in mind and even
>>> what I named my stubbed out interface :-) Since Target has decided against
>>> this approach for other reasons, it will have to be a side project for me
>>> for now.
>>>
>>> Best, Aaron
>>>
>>> From: Erik Weathers <eweath...@groupon.com>
>>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>>> Date: Wednesday, January 6, 2016 at 5:48 PM
>>> To: "user@storm.apache.org" <user@storm.apache.org>
>>> Cc: "d...@storm.apache.org" <d...@storm.apache.org>
>>> Subject: Re: HDFS Bolts -- partitioning output
>>>
>>> hey Aaron,
>>>
>>> We've also written a similar bolt at Groupon, but we aren't super
>>> satisfied with the implementation. :) We are begrudgingly using it because
>>> there is no partitioning support in the OSS storm-hdfs bolt.
>>>
>>> Though one thing I do like about our implementation is having the
>>> ability to define your own "Partitioner" in each topology to do various
>>> types of partitioning (date-based, message ID-based, topic-based,
>>> whatever). It would be great if your implementation had such logic too.
>>> That is, when deciding the HDFS path for a tuple's data, the Partitioner
>>> is called to determine that path. For example, it can take the Tuple
>>> object and an opaque key/value Configuration hash that can pass items like
>>> a Kafka topic name to be included in the HDFS path.
>>>
>>> - Erik
>>>
>>> On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <aaron.doss...@target.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> My team was exploring changes to the HDFS bolts that would allow for
>>>> partitioning the output, for example into directories corresponding to
>>>> day. This is different from the existing functionality to rotate files
>>>> based on a set length of time. For unrelated reasons, we are probably not
>>>> going to pursue this further. However, I have some code changes that
>>>> implement most of this functionality for at least some partitioning use
>>>> cases. If there is interest from the user or developer community for this
>>>> feature, I could get it in shape for a PR to get feedback about our
>>>> implementation approach.
>>>>
>>>> Any feedback on this idea is welcome. Thanks! -Aaron

--
Thanks & Regards
Rajasekhar
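
P.S. To make the "Partitioner" idea from the quoted thread concrete, here is a minimal sketch of how I understand it. It is only an assumption on my side, not the Groupon code and not the storm-hdfs API: the interface name, the partitionPath method, the "timestamp" field and the "topic" config key are all hypothetical, and a plain Map stands in for Storm's Tuple so the example runs without Storm on the classpath.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface (an assumption, not part of storm-hdfs):
// given a tuple's fields and an opaque key/value config, return the
// HDFS directory the tuple's data should be written under.
interface Partitioner {
    String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf);
}

// Date-based example: /<topic>/yyyy/MM/dd, with the day computed in UTC.
final class DatePartitioner implements Partitioner {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf) {
        // "timestamp" is an assumed field holding epoch milliseconds.
        long millis = ((Number) tupleFields.get("timestamp")).longValue();
        // "topic" is an assumed config key, e.g. a Kafka topic name.
        String topic = conf.getOrDefault("topic", "unknown");
        return "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(millis));
    }
}
```

A bolt using something like this would presumably keep one open file per returned path and apply the existing rotation policy within each directory, but that part is a guess until the design is shared.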