Mohit, No problem, but Juhani did all the work. :) The behavior is that you can configure an HDFS sink to close a file if it hasn't gotten any writes in some time. After it's been idle for 5 minutes or something, it gets closed. If you get a "late" event that goes to the same path after the file is closed, it will just create a new file in the same path as usual.
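For reference, the idle-close behavior plus the hdfs.rollInterval workaround mentioned below can be sketched in an agent config like this (a minimal sketch — the agent name "a1", sink name "k1", path, and timeout values are made up for illustration; check the 1.3.0 user guide for exact property names and defaults):

```
# Hypothetical agent "a1" with one HDFS sink "k1"
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y/%m/%d/%H

# Close the file (renaming it away from .tmp) after 300 seconds
# with no writes -- the FLUME-1660 behavior, available from 1.3.0
a1.sinks.k1.hdfs.idleTimeout = 300

# On 1.2.0, where idleTimeout is not available, time-based rolling
# (roll every hour regardless of writes) is the suggested workaround
a1.sinks.k1.hdfs.rollInterval = 3600
```

With both set, whichever condition triggers first closes the file; a late event arriving afterward simply opens a new file in the same path, as described above.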
Regards,
Mike

On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <br...@cloudera.com> wrote:
> We are currently voting on a 1.3.0 RC on the dev@ list:
>
> http://s.apache.org/OQ0W
>
> You don't have to be a committer to vote! :)
>
> Brock
>
> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> > Thanks a lot!! Now with this what should be the expected behaviour? After
> > file is closed a new file is created for writes that come after closing the
> > file?
> >
> > Thanks again for committing this change. Do you know when 1.3.0 is out? I am
> > currently using the snapshot version of 1.3.0
> >
> > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <mpe...@apache.org> wrote:
> >>
> >> Mohit,
> >> FLUME-1660 is now committed and it will be in 1.3.0. In the case where you
> >> are using 1.2.0, I suggest running with hdfs.rollInterval set so the files
> >> will roll normally.
> >>
> >> Regards,
> >> Mike
> >>
> >>
> >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly
> >> <juhani_conno...@cyberagent.co.jp> wrote:
> >>>
> >>> I am actually working on a patch for exactly this, refer to FLUME-1660
> >>>
> >>> The patch is on review board right now. I fixed a corner case issue that
> >>> came up with unit testing, but the implementation is not really to my
> >>> satisfaction. If you are interested please have a look and add your opinion.
> >>>
> >>> https://issues.apache.org/jira/browse/FLUME-1660
> >>> https://reviews.apache.org/r/7659/
> >>>
> >>>
> >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
> >>>
> >>> Another question I had was about rollover. What's the best way to
> >>> rollover files in a reasonable timeframe? For instance, our path is YY/MM/DD/HH,
> >>> so every hour there is a new file, and the previous hour's file just sits there
> >>> as .tmp — it sometimes takes as long as an hour before the .tmp is closed and
> >>> renamed to .snappy. In this situation, is there a way to tell Flume to roll
> >>> files over sooner based on some idle time limit?
> >>>
> >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <mohitanch...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Thanks Mike, it makes sense. Any way I can help?
> >>>>
> >>>>
> >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <mpe...@apache.org> wrote:
> >>>>>
> >>>>> Hi Mohit, this is a complicated issue. I've filed
> >>>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
> >>>>>
> >>>>> In short, it would require a non-trivial amount of work to implement
> >>>>> this, and it would need to be done carefully. I agree that it would be
> >>>>> better if Flume handled this case more gracefully than it does today. Today,
> >>>>> Flume assumes that you have some job that would go and clean up the .tmp
> >>>>> files as needed, and that you understand that they could be partially
> >>>>> written if a crash occurred.
> >>>>>
> >>>>> Regards,
> >>>>> Mike
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <mohitanch...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> What we are seeing is that if Flume gets killed, either because of
> >>>>>> server failure or other reasons, it keeps the .tmp file around. Sometimes,
> >>>>>> for whatever reason, the .tmp file is not readable. Is there a way to roll
> >>>>>> over the .tmp file more gracefully?
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/