Hi, thanks for clarifying.

On 07/10/2012 06:36 PM, Arvind Prabhakar wrote:
Hi,

On Sun, Jul 8, 2012 at 11:14 PM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:

    Another matter that I'm curious about is whether or not we actually
    need separate files for the data and checkpoints...


The data file and checkpoint files serve different purposes. The checkpoint resides in memory and mirrors the channel's queue. The only difference is that it does not store the data in the queue itself, but rather pointers to the data residing in the log files. As a result, the memory footprint of the checkpoint is very small regardless of how big each event payload is; its size depends only on the capacity of the channel and nothing else.
This is more or less what I expected. Am I correct in believing that each commit has to seek back and forth between two different files? That would make all access on a single disk non-sequential.
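
For reference, here is a rough sketch of the structure as I understand it from your description (purely illustrative; the class and field names are mine, not the actual FileChannel code): the checkpoint is essentially a queue of fixed-size pointers into the data files, which is why its size depends only on the configured capacity.

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only, not the real FileChannel code.
final class EventPointer {
    final int logFileId;  // which data/log file holds the event
    final long offset;    // byte offset of the event within that file

    EventPointer(int logFileId, long offset) {
        this.logFileId = logFileId;
        this.offset = offset;
    }
}

final class CheckpointQueue {
    private final Deque<EventPointer> queue = new ArrayDeque<EventPointer>();
    private final int capacity;

    CheckpointQueue(int capacity) {
        this.capacity = capacity;
    }

    // put() stores only a pointer; the event payload stays in the data
    // file, so memory use scales with capacity, not with event size.
    boolean put(EventPointer pointer) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(pointer);
        return true;
    }

    EventPointer take() {
        return queue.pollFirst();
    }
}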

    Can we not add a magic header before each type of entry to
    differentiate, and thus guarantee significantly more sequential
    access?


In the general case, access will be sequential. In the best case, the channel will have moved the writes to new log files and will continue doing reads from old (rolled) files, which reduces seek contention. From what I know, I don't think it will be trivial to effect your suggested change without significantly impacting the entire logic of the channel.

I'm not understanding how that reduces seek contention if the files are all on the same disk. I don't think the reads are that painful; a lot of them are hopefully taken care of by the OS cache...

Implementation would likely be difficult, yes. I've only had a high-level look at the code and haven't tried to make the change because of this. As you suggest, it might be better to have a separate implementation.

    What is killing performance on a single disk right now is the
    constant seeking. The problem with this, though, would be putting
    together a file format that allows quick seeking to the correct
    position, and rolling would be a lot harder. I think this is a lot
    more difficult and might be more of a long-term target.


Perhaps what you are describing is a different type of persistent channel that is optimized for high latency IO systems. I would encourage you to take your idea one step further and see if that can be drafted as yet another channel that serves this particular use-case.


I'd like to do this, though it seems quite involved. Hopefully I can find some time to figure it out further down the road. Jarcec's spillable channel should also help on this front.
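
For anyone following along, my mental model of the spillable idea from FLUME-1227 is roughly the following (just an illustrative sketch, not Jarcec's actual design): puts go to a bounded in-memory queue first and only overflow to a disk-backed store when that queue is full.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only, not the actual SpillableChannel design.
final class SpillableBuffer<E> {

    interface DiskStore<T> {           // hypothetical disk-backed overflow store
        void append(T event) throws Exception;
        T next() throws Exception;     // returns null when empty
    }

    private final BlockingQueue<E> memory;
    private final DiskStore<E> disk;

    SpillableBuffer(int memoryCapacity, DiskStore<E> disk) {
        this.memory = new ArrayBlockingQueue<E>(memoryCapacity);
        this.disk = disk;
    }

    void put(E event) throws Exception {
        if (!memory.offer(event)) {    // memory full, spill to disk
            disk.append(event);
        }
    }

    E take() throws Exception {
        E e = memory.poll();           // drain memory first, then disk
        return (e != null) ? e : disk.next();
    }
}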

For the time being, I've resolved the issue for us with a workaround that limits the number of commits (by making ExecSource commit multiple entries at a time).
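
The change amounts to something like the sketch below (simplified and illustrative rather than the exact patch; BATCH_SIZE and the reader setup are made up for the example): instead of handing the channel processor one event per line, events are accumulated and passed in batches, so each file channel commit covers many events.

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Event;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.event.EventBuilder;

public class BatchedExecReader {
    private static final int BATCH_SIZE = 100;  // illustrative value

    // Reads lines from the tailed process and hands them to the channel in
    // batches, so each transaction commit covers up to BATCH_SIZE events
    // instead of one.
    static void drain(BufferedReader reader, ChannelProcessor processor) throws Exception {
        List<Event> batch = new ArrayList<Event>(BATCH_SIZE);
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(EventBuilder.withBody(line.getBytes(StandardCharsets.UTF_8)));
            if (batch.size() >= BATCH_SIZE) {
                processor.processEventBatch(batch);  // one commit for the whole batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            processor.processEventBatch(batch);
        }
    }
}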

My concern is that FileChannel is represented by a number of people as having good performance, when at the moment that only holds if one of two things is true: multiple disks are available, or transactions are batched.

Thanks,
 Juhani Connolly

Regards,
Arvind Prabhakar



    Juhani


    Regards,
    Arvind Prabhakar


    On Wed, Jul 4, 2012 at 3:33 AM, Juhani Connolly
    <juhani_conno...@cyberagent.co.jp> wrote:

        It looks good to me as it provides a nice balance between
        reliability and throughput.

        It's certainly one possible solution to the issue, though I
        do believe that the current one could be made more friendly
        towards single disk access (e.g. batching writes to the disk
        may well be doable, and I would be curious what someone with
        more familiarity with the implementation thinks).


        On 07/04/2012 06:36 PM, Jarek Jarcec Cecho wrote:

            We had a related discussion about this "SpillableChannel"
            (working name) on FLUME-1045, and I believe the consensus
            is that we will create something like that. In fact, I'm
            planning to do it myself in the near future - I just need
            to prioritize my todo list first.

            Jarcec

            On Wed, Jul 04, 2012 at 06:13:43PM +0900, Juhani Connolly
            wrote:

                Yes... I was actually poking around for that issue as
                I remembered seeing it before. I had also previously
                suggested a compound channel that would have worked
                like the buffer store in scribe, but the general
                opinion was that it allowed too many mixed
                configurations, which could make testing and verifying
                correctness difficult.

                On 07/04/2012 04:33 PM, Jarek Jarcec Cecho wrote:

                    Hi Juhani,
                    a while ago I filed the jira FLUME-1227 where I
                    suggested creating some sort of SpillableChannel
                    that would behave similarly to scribe. It would
                    normally act as a memory channel and would start
                    spilling data to disk in case it gets full (my
                    primary goal here was to solve the issue of the
                    remote going down, for example in case of HDFS
                    maintenance). Would it be helpful for your case?

                    Jarcec

                    On Wed, Jul 04, 2012 at 04:07:48PM +0900, Juhani
                    Connolly wrote:

                        Evaluating flume on some of our servers, the
                        file channel seems very slow, likely because,
                        like most typical web servers, ours have a
                        single raided disk available for writing to.

                        Quoted below is a suggestion from a previous
                        issue where our poor throughput came up; it
                        turns out that with multiple disks, file
                        channel performance is great.

                        On 06/27/2012 11:01 AM, Mike Percy wrote:

                            We are able to push > 8000 events/sec
                            (2KB per event) through a single file
                            channel if you put the checkpoint on one
                            disk and use 2 other disks for data dirs.
                            Not sure what the limit is. This is using
                            the latest trunk code. Another limitation
                            may be that you need to add additional
                            sinks to your channel to drain it faster.
                            This is because sinks are single threaded
                            and sources are multithreaded.

                            Mike

                        For the case where the disks happen to be
                        available on the server,
                        that's fantastic, but I suspect that most use
                        cases are going to be
                        similar to ours, where multiple disks are not
                        available. Our use
                        case isn't unusual as it's primarily
                        aggregating logs from various
                        services.

                        We originally ran our log servers with an
                        exec(tail)->file->avro setup where throughput
                        was very bad (80MB in an hour). We then
                        switched this to a memory channel, which was
                        fine (the peak-time 500MB worth of hourly logs
                        went through). Afterwards we switched back to
                        the file channel, but with 5 identical avro
                        sinks. This did not improve throughput (still
                        80MB). RecoverableMemoryChannel showed very
                        similar characteristics.

                        I presume this is due to the writes going to
                        two separate places, further compounded by
                        also writing out and tailing the normal web
                        logs: checking top and iostat, we could
                        confirm that we have significant iowait time,
                        far more than we see during typical operation.

                        As it is, we seem to be more or less
                        guaranteeing no loss of logs with the file
                        channel. Perhaps we could look into batching
                        puts/takes for those who do not need 100%
                        data retention but want more reliability than
                        with the MemoryChannel, which can potentially
                        lose its entire capacity on a restart?
                        Another possibility is writing an
                        implementation that writes primarily
                        sequentially. I've been meaning to take a
                        deeper look at the implementation itself to
                        give more informed commentary on its contents,
                        but unfortunately I don't have the cycles
                        right now; hopefully someone with a better
                        understanding of the current implementation
                        (along with its interaction with the OS file
                        cache) can comment on this.









