Re: .tmp in hdfs sink

Juhani Connolly Wed, 28 Nov 2012 21:20:43 -0800

The changes are in both the 1.3 RC5 and in the 1.4 trunk


On 11/29/2012 01:26 PM, Mohit Anchlia wrote:

If I grab the last snapshot would I get these changes?

On Tue, Nov 20, 2012 at 3:24 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:


    that's awesome!


    On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy <mpe...@apache.org> wrote:

        Mohit,
        No problem, but Juhani did all the work. :)

        The behavior is that you can configure an HDFS sink to close a
        file if it hasn't gotten any writes in some time. After it's
        been idle for 5 minutes or something, it gets closed. If you
        get a "late" event that goes to the same path after the file
        is closed, it will just create a new file in the same path as
        usual.

        Regards,
        Mike
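
        For reference, a minimal sketch of the idle-close behavior
        described above, written as an agent configuration. It assumes
        the property is named hdfs.idleTimeout (in seconds), as in the
        Flume user guide. The agent name "agent1", the sink name
        "hdfsSink", the path, and the values are hypothetical
        placeholders, not settings taken from this thread.

        ```properties
        # Hypothetical agent and sink names; adjust to your own setup.
        agent1.sinks.hdfsSink.type = hdfs
        # Hourly buckets, similar to the YY/MM/DD/HH layout discussed in this thread.
        agent1.sinks.hdfsSink.hdfs.path = /flume/events/%y/%m/%d/%H
        # Close a file that has received no writes for 300 seconds (5 minutes).
        # A value of 0 leaves idle closing disabled.
        agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
        # Time-based roll, in seconds. This is the option suggested in this
        # thread for 1.2.0, which does not have the idle-close change.
        agent1.sinks.hdfsSink.hdfs.rollInterval = 600
        ```

        With settings like these, a late event for an already closed
        bucket would simply open a new file in the same path.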


        On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland <br...@cloudera.com> wrote:

            We are currently voting on a 1.3.0 RC on the dev@ list:

            http://s.apache.org/OQ0W

            You don't have to be a committer to vote! :)

            Brock

            On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
            > Thanks a lot!! Now, with this, what should the expected behaviour be? After
            > a file is closed, is a new file created for writes that come after closing the
            > file?
            >
            > Thanks again for committing this change. Do you know when 1.3.0 will be out? I am
            > currently using the snapshot version of 1.3.0.
            >
            > On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy <mpe...@apache.org> wrote:
            >>
            >> Mohit,
            >> FLUME-1660 is now committed and it will be in 1.3.0. In the case where you
            >> are using 1.2.0, I suggest running with hdfs.rollInterval set so the files
            >> will roll normally.
            >>
            >> Regards,
            >> Mike
            >>
            >>
            >> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly
            >> <juhani_conno...@cyberagent.co.jp> wrote:
            >>>
            >>> I am actually working on a patch for exactly this; refer to FLUME-1660.
            >>>
            >>> The patch is on Review Board right now. I fixed a corner-case issue that
            >>> came up in unit testing, but the implementation is not really to my
            >>> satisfaction. If you are interested, please have a look and add your opinion.
            >>>
            >>> https://issues.apache.org/jira/browse/FLUME-1660
            >>> https://reviews.apache.org/r/7659/
            >>>
            >>>
            >>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
            >>>
            >>> Another question I had was about rollover. What's the best way to
            >>> roll over files in a reasonable timeframe? For instance, our path is
            >>> YY/MM/DD/HH, so every hour there is a new file, and the previous hour's
            >>> file is just sitting there as .tmp. Sometimes it takes as long as an hour
            >>> before the .tmp file is closed and renamed to .snappy.
            >>> In this situation, is there a way to tell Flume to roll over files sooner
            >>> based on some idle time limit?
            >>>
            >>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
            >>>>
            >>>> Thanks Mike, it makes sense. Any way I can help?
            >>>>
            >>>>
            >>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <mpe...@apache.org> wrote:
            >>>>>
            >>>>> Hi Mohit, this is a complicated issue. I've filed
            >>>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
            >>>>>
            >>>>> In short, it would require a non-trivial amount of work to implement
            >>>>> this, and it would need to be done carefully. I agree that it would be
            >>>>> better if Flume handled this case more gracefully than it does today. Today,
            >>>>> Flume assumes that you have some job that would go and clean up the .tmp
            >>>>> files as needed, and that you understand that they could be partially
            >>>>> written if a crash occurred.
            >>>>>
            >>>>> Regards,
            >>>>> Mike
            >>>>>
            >>>>>
            >>>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
            >>>>>>
            >>>>>> What we are seeing is that if Flume gets killed, either because of a
            >>>>>> server failure or for other reasons, it leaves the .tmp file around. Sometimes,
            >>>>>> for whatever reason, the .tmp file is not readable. Is there a way to roll over
            >>>>>> the .tmp file more gracefully?
            >>>>>
            >>>>>
            >>>>
            >>>
            >>>
            >>
            >



            --
            Apache MRUnit - Unit testing MapReduce -
            http://incubator.apache.org/mrunit/
