Hi, thanks for clarifying.
On 07/10/2012 06:36 PM, Arvind Prabhakar wrote:
Hi,
On Sun, Jul 8, 2012 at 11:14 PM, Juhani Connolly
<juhani_conno...@cyberagent.co.jp> wrote:
Another matter I'm curious about is whether we actually need
separate files for the data and checkpoints...
The data file and checkpoint files serve different purposes. The
checkpoint resides in memory and simulates the channel. The only
difference is that it does not store the data in the queue itself, but
pointers to data that resides in the log files. As a result, the memory
footprint of the checkpoint is very small regardless of how big each
event payload is; its size depends only on the capacity of the channel.
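To make that concrete, here is a minimal sketch of the pointer-queue
idea (hypothetical classes, not FileChannel's actual internals): the
in-memory queue holds only fixed-size pointers into the on-disk logs,
so memory use scales with channel capacity rather than payload size.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical illustration only: the in-memory "checkpoint" queue
    // stores fixed-size pointers into on-disk log files, never payloads.
    public class PointerQueue {

      /** A fixed-size pointer to an event stored in a data log file. */
      static final class EventPointer {
        final int logFileId;  // which rolled log file holds the event
        final long offset;    // byte offset of the event in that file

        EventPointer(int logFileId, long offset) {
          this.logFileId = logFileId;
          this.offset = offset;
        }
      }

      // Memory use grows with the number of queued pointers (channel
      // capacity), not with the size of each event payload.
      private final Deque<EventPointer> queue =
          new ArrayDeque<EventPointer>();

      void put(int logFileId, long offset) {
        queue.addLast(new EventPointer(logFileId, offset));
      }

      EventPointer take() {
        return queue.pollFirst(); // null when the channel is empty
      }
    }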
This is more or less what I expected. Am I correct in believing that
each commit has to seek back and forth between two different files?
That would make all access on a single disk non-sequential.
Can we not add a magic header before each type of entry to
differentiate, and thus guarantee significantly more sequential
access?
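For illustration, a rough sketch of what such a type-tagged,
single-file record format might look like (names and layout are
hypothetical, not Flume's actual on-disk format):

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical single-file layout: every record starts with a
    // one-byte type tag, so data and checkpoint entries can share one
    // append-only log and writes stay sequential.
    public class TaggedLogWriter {
      static final byte TYPE_DATA = 0x01;        // event payload record
      static final byte TYPE_CHECKPOINT = 0x02;  // checkpoint record

      private final DataOutputStream out;

      public TaggedLogWriter(DataOutputStream out) {
        this.out = out;
      }

      public void appendData(byte[] payload) throws IOException {
        out.writeByte(TYPE_DATA);
        out.writeInt(payload.length); // length prefix for the reader
        out.write(payload);
      }

      public void appendCheckpoint(byte[] snapshot) throws IOException {
        out.writeByte(TYPE_CHECKPOINT);
        out.writeInt(snapshot.length);
        out.write(snapshot);
      }
    }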
In the general case access will be sequential. In the best case, the
channel will have moved the writes to new log files and will continue
to do reads from old (rolled) files, which reduces seek contention.
From what I know, I don't think it will be trivial to effect your
suggested change without significantly impacting the entire logic of
the channel.
I don't understand how that reduces seek contention if the files are
all on the same disk. I don't think the reads are that painful; a lot
of them are hopefully taken care of by the OS cache...
Implementation would likely be difficult, yes. I've only taken a
high-level look at the code, and haven't attempted the change for that
reason. As you suggest, it might be better as a separate implementation.
What is killing performance on a single disk right now is the
constant seeking. The difficulty, though, would be putting together a
file format that allows seeking quickly to the correct position, and
rolling would be a lot harder. I think this is considerably more
difficult and might be more of a long-term target.
Perhaps what you are describing is a different type of persistent
channel that is optimized for high latency IO systems. I would
encourage you to take your idea one step further and see if that can
be drafted as yet another channel that serves this particular use-case.
I'd like to do this, though it seems quite involved. Hopefully I can
find some time to figure it out further down the road. Jarcec's
spillable channel should also help on this front.
For the time being, I've resolved the issue for us with a workaround:
limiting the number of commits (by making ExecSource commit multiple
entries at a time).
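For reference, the batching idea boils down to grouping many puts into
one channel transaction so the file channel syncs once per batch
instead of once per event. A minimal sketch using Flume's public
Channel/Transaction API (the helper class itself is hypothetical):

    import java.util.List;
    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;

    // Hypothetical helper: commit a whole batch of events in a single
    // transaction so the channel commits (and syncs) once per batch.
    public final class BatchedPut {
      private BatchedPut() {}

      public static void putBatch(Channel channel, List<Event> batch) {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          for (Event event : batch) {
            channel.put(event); // queued inside the open transaction
          }
          tx.commit();          // one commit covers the whole batch
        } catch (RuntimeException ex) {
          tx.rollback();        // none of the batch is persisted
          throw ex;
        } finally {
          tx.close();
        }
      }
    }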
My concern is that a number of people represent FileChannel as having
good performance, when at the moment that holds only under one of two
conditions: multiple disks, or batched transactions.
Thanks,
Juhani Connolly
Regards,
Arvind Prabhakar
Juhani
On Wed, Jul 4, 2012 at 3:33 AM, Juhani Connolly
<juhani_conno...@cyberagent.co.jp> wrote:
It looks good to me as it provides a nice balance between
reliability and throughput. It's certainly one possible solution
to the issue, though I do believe the current channel could be
made friendlier towards single-disk access (e.g. batching writes
to the disk may well be doable, and I would be curious what
someone more familiar with the implementation thinks).
On 07/04/2012 06:36 PM, Jarek Jarcec Cecho wrote:
We had a related discussion about this "SpillableChannel"
(working name) on FLUME-1045, and I believe the consensus is
that we will create something like that. In fact, I'm planning
to do it myself in the near future - I just need to prioritize
my todo list first.
Jarcec
On Wed, Jul 04, 2012 at 06:13:43PM +0900, Juhani Connolly wrote:
Yes... I was actually poking around for that issue as I
remembered seeing it before. I had also previously suggested a
compound channel that would have worked like the buffer store
in scribe, but the general opinion was that it allowed too many
mixed configurations, which could make testing and verifying
correctness difficult.
On 07/04/2012 04:33 PM, Jarek Jarcec Cecho wrote:
Hi Juhani,
A while ago I filed JIRA FLUME-1227, where I suggested
creating some sort of SpillableChannel that would behave
similarly to scribe. It would normally act as a memory
channel, and would start spilling data to disk in case it
got full (my primary goal here was to solve the issue of
the remote going down, for example in case of HDFS
maintenance). Would it be helpful for your case?
Jarcec
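As a sketch of that idea (hypothetical types, not an actual Flume
class): puts go to memory while there is room and nothing has spilled,
otherwise to disk, and takes drain memory before the spilled backlog
so FIFO order is preserved.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical sketch of the SpillableChannel idea: act as a
    // memory channel until full, then spill puts to a disk-backed
    // queue.
    public class SpillableQueue<E> {

      /** Hypothetical disk-backed queue; persistence omitted. */
      public interface DiskQueue<E> {
        void append(E event);
        E poll();          // null when empty
        boolean isEmpty();
      }

      private final int memoryCapacity;
      private final Deque<E> memory = new ArrayDeque<E>();
      private final DiskQueue<E> disk;

      public SpillableQueue(int memoryCapacity, DiskQueue<E> disk) {
        this.memoryCapacity = memoryCapacity;
        this.disk = disk;
      }

      public synchronized void put(E event) {
        if (memory.size() < memoryCapacity && disk.isEmpty()) {
          memory.addLast(event); // fast path: memory-channel behaviour
        } else {
          disk.append(event);    // spill: queue behind anything
        }                        // already on disk, preserving order
      }

      public synchronized E take() {
        E e = memory.pollFirst();             // drain memory first...
        return (e != null) ? e : disk.poll(); // ...then the spillover
      }
    }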
On Wed, Jul 04, 2012 at 04:07:48PM +0900, Juhani Connolly wrote:
Evaluating flume on some of our servers, the file channel
seems very slow, likely because, like most typical web
servers, ours have a single raided disk available for
writing to. Quoted below is a suggestion from a previous
issue where our poor throughput came up, where it turns out
that on multiple disks, file channel performance is great.
On 06/27/2012 11:01 AM, Mike Percy wrote:
We are able to push > 8000 events/sec (2KB per event)
through a single file channel if you put the checkpoint
on one disk and use 2 other disks for data dirs. Not
sure what the limit is. This is using the latest trunk
code. Another limitation may be that you need to add
additional sinks to your channel to drain it faster,
because sinks are single-threaded and sources are
multithreaded.
Mike
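For anyone wanting to reproduce that layout, the FileChannel
configuration would look roughly like this (paths and names are
placeholders; checkpointDir and dataDirs are the relevant channel
properties):

    # Checkpoint and data directories on separate physical disks.
    agent.channels.fc.type = file
    agent.channels.fc.checkpointDir = /disk1/flume/checkpoint
    agent.channels.fc.dataDirs = /disk2/flume/data,/disk3/flume/data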
For the case where the disks happen to be available on the
server, that's fantastic, but I suspect that most use cases
are going to be similar to ours, where multiple disks are
not available. Our use case isn't unusual, as it's primarily
aggregating logs from various services.
We originally ran our log servers with an
exec(tail)->file->avro setup, where throughput was very
bad (80MB in an hour). We then switched this to a memory
channel, which was fine (the peak-time 500MB worth of
hourly logs went through). Afterwards we switched back to
the file channel, but with 5 identical avro sinks. This
did not improve throughput (still 80MB).
RecoverableMemoryChannel showed very similar
characteristics.
I presume this is due to the writes going to two separate
places, compounded further by also writing out and tailing
the normal web logs: checking top and iostat, we could
confirm we have significant iowait time, far more than we
see during typical operation.
As it is, we seem to be more or less guaranteeing no loss
of logs with the file channel. Perhaps we could look into
batching puts/takes for those that do not need 100% data
retention but want more reliability than the MemoryChannel
offers, since it can potentially lose its entire capacity
on a restart? Another possibility is writing an
implementation that writes primarily sequentially. I've
been meaning to take a deeper look at the implementation
itself to give a more informed commentary, but
unfortunately I don't have the cycles right now; hopefully
someone with a better understanding of the current
implementation (along with its interaction with the OS
file cache) can comment on this.
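On the take side, the same batching trick applies: a sink can drain
many events per transaction so the channel commits once per batch. A
sketch against Flume's public API (the helper class is hypothetical;
a real sink would deliver the batch downstream before committing):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;

    // Hypothetical sink-side helper: take up to batchSize events in
    // one transaction so the channel commits once per batch.
    public final class BatchedTake {
      private BatchedTake() {}

      public static List<Event> takeBatch(Channel channel,
                                          int batchSize) {
        List<Event> batch = new ArrayList<Event>(batchSize);
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          for (int i = 0; i < batchSize; i++) {
            Event e = channel.take();
            if (e == null) {
              break;         // channel drained for now
            }
            batch.add(e);
          }
          // NOTE: a real sink would send the batch downstream here
          // and only commit on success, rolling back on failure.
          tx.commit();
          return batch;
        } catch (RuntimeException ex) {
          tx.rollback();     // events remain in the channel
          throw ex;
        } finally {
          tx.close();
        }
      }
    }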