Flume is not suited for file transfers as such. With that, please see my comments below:
> - support for variable transaction size that could be set by the source or
> interceptor

The transactions are already variable sized. The only configuration that
applies on top is the maximum size of a transaction. How is this different
from what you are proposing?

> - SpoolDir to support creation of one transaction per file

If the file is large, you would run out of heap space quickly. Also, how do
you recover from intermittent failures?

> - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class

You may be better off using something like an Oozie action to trigger a job
when the dataset is complete.

Regards,
Arvind

On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <[email protected]> wrote:

> Hi group,
>
> Manohar's requirements sound valid. Guess there are other cases such
> "completion notification" could come in handy.
>
> Thus, I would propose these distinct features that would make this
> possible via configuration:
> - support for variable transaction size that could be set by the source
> or interceptor
> - SpoolDir to support creation of one transaction per file
> - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class
>
> The one thing I'm not sure about until I look at the code, if HDFSSink
> will write flush cache to the HDFS once it encounters no more events in a
> transaction.
>
> What do you guys think ?
>
>
> On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <[email protected]>
> wrote:
>
>> Thanks Hari for your response.
>>
>> My requirement goes like this -
>>
>> 1) There are bunch of files coming in at regular intervals (hourly or
>> daily) in my spoolDir
>>
>> 2) I want them to be moved into HDFS via HDFS sink using reg-ex like
>> /target/%Y-%M%D so each day file gets into different destination HDFS
>>
>> 3) Now once this flume completes copying files , I want to kick off my MR
>> job.
>>
>> Thanks,
>>
>> Manohar
>> ------------------------------
>> *From:* Hari Shreedharan <[email protected]>
>> *Sent:* Saturday, December 6, 2014 7:16 AM
>> *To:* [email protected]
>> *Cc:* [email protected]
>> *Subject:* Re: Notification support from flume?
>>
>> Looking at .COMPLETED is not an indication that the data has been
>> written out to HDFS. As of now, unfortunately there is no way to tag an
>> event as coming from a specific file. I can’t think of a way to do this in
>> a fool-proof way off the top of my mind. What is your use-case, there might
>> be another way to do the same thing?
>>
>> Thanks,
>> Hari
>>
>>
>> On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> I wanted to know if there is a way of notification mechanism or some way
>>> of finding out if flume has finished transfer of certain file from spoolDir
>>> to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can
>>> assume its completed but wanted to know if there is more reliable way of
>>> call back mechanism ?
>>>
>>> Thanks,
>>>
>>> Manohar.
>>>
>>>
>>> Please consider the environment before printing this e-mail
>>>
>>> Disclaimer: This communication is for the exclusive use of the intended
>>> recipient(s) and shall not attach any liability on the originator or ITC
>>> Infotech India Ltd./its Holding company/ its Subsidiaries/ its Group
>>> Companies. If you are the addressee, the contents of this e-mail are
>>> intended for your use only and it shall not be forwarded to any third
>>> party, without first obtaining written authorization from the originator or
>>> ITC Infotech India Ltd./ its Holding company/its Subsidiaries/ its Group
>>> Companies. It may contain information which is confidential and legally
>>> privileged and the same shall not be used or dealt with by any third
>>> party in any manner whatsoever without the specific consent of ITC
>>> Infotech India Ltd./ its Holding company/ its Subsidiaries/ its Group
>>> Companies.
>>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail : [email protected]
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. This email contains confidential information. It should
> not be copied, disclosed to, retained or used by, any party other than the
> intended recipient. Any unauthorised distribution, dissemination or copying
> of this E-mail or its attachments, and/or any use of any information
> contained in them, is strictly prohibited and may be illegal. If you are
> not an intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender directly via email. Any
> emails that you send to us may be monitored by systems or persons other
> than the named communicant for the purposes of ascertaining whether the
> communication complies with the law and company policies.
>
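
For the use case in this thread (start an MR job once a day's files have
landed in HDFS), one workaround is to poll the target directory instead of
relying on .COMPLETED markers in the spool directory. Below is a minimal
sketch in Python, not a definitive solution. It assumes the HDFS sink keeps
its default in-use suffix (".tmp", set by hdfs.inUseSuffix) and that the
directory listing is fetched by some other means. The function names and the
sample file names are hypothetical. The check is only a heuristic: a
directory with no in-use files may still receive new files later, so it is
best combined with a quiet period or a scheduler-side check such as the Oozie
approach suggested above.

```python
from typing import Iterable, List

# Assumption: the sink uses Flume's default value for hdfs.inUseSuffix.
IN_USE_SUFFIX = ".tmp"


def in_use_files(names: Iterable[str], suffix: str = IN_USE_SUFFIX) -> List[str]:
    """Return the file names that still carry the in-use suffix."""
    return [name for name in names if name.endswith(suffix)]


def looks_complete(names: Iterable[str], suffix: str = IN_USE_SUFFIX) -> bool:
    """True if the listing is non-empty and no file is still being written.

    This is a heuristic only: it cannot tell whether more files will
    be opened in the same directory afterwards.
    """
    listing = list(names)
    return bool(listing) and not in_use_files(listing, suffix)


if __name__ == "__main__":
    # Hypothetical listing of one day's target directory.
    sample = ["FlumeData.1417852800000", "FlumeData.1417852800001.tmp"]
    print("still being written:", in_use_files(sample))
    print("looks complete:", looks_complete(sample))
```

A caller would run looks_complete() on each day's directory listing and
submit the MR job only after it has returned True for some agreed interval.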
