On Thu, Jan 30, 2014 at 9:29 AM, Umesh Telang <[email protected]> wrote:

> Ah, ok. So 32 bytes is required for each pointer to an event.
>

Yep :)

> We'll amend our heap size accordingly. We may also be able to reduce our
> FileChannel size. We hadn't understood the implications of the capacity
> value of the FileChannel we have been using.
>
> Regarding the multiple data directories, I hadn't realised that that
> implied distinct disks. Just to confirm, you're saying that each data
> directory has to be on a distinct disk?
>

The recommendation is that you have two data directories per distinct disk.

> Is it that FileChannel can't utilise an entire disk from an IO
> perspective, regardless of how big the disk is?
>

Right, it has nothing to do with size and everything to do with IO
bandwidth. We could optimize this area (and will) but for now specifying
two data directories per disk is a good workaround.

> Or is this size-dependent? i.e. above a certain size, you need a second
> data directory? If the latter, could you let me know what that size is?
> If it's a general point, then I'll follow the earlier advice of 2 data
> dirs per file channel.
>

Doesn't relate to size.

>
> Apologies for all the questions!
>
> We had made an estimation of disk space (avg event size (~250 bytes) *
> channel size (150M)) and have provisioned disks that are significantly
> larger than the required space.
>

Perfect, great to hear!

>
> Thanks,
> Umesh
>
> ------------------------------
> *From:* Brock Noland [[email protected]]
> *Sent:* 30 January 2014 14:38
>
> *To:* [email protected]
> *Subject:* Re: checkpoint lifecycle
>
> On Thu, Jan 30, 2014 at 8:16 AM, Umesh Telang
> <[email protected]> wrote:
>
>> Hi Brock,
>>
>> Our heap size is 2GB.
>>
>
> That is not enough heap for 150M events. It's 150 million * 32 bytes =
> 4.5GB + say 100-500MB for the rest of Flume.
>
>
>>
>> Thanks for the advice on data directories. Could you please let me know
>> the heuristic for that? (e.g. 1 data directory per N-sized channel where
>> N is...)
>>
>
> File channel at present cannot utilize an entire disk from an IO
> perspective, that is why I suggest multiple disks. Of course you'll want to
> ensure that you have enough disk to support a full channel, but that is a
> different discussion (avg event size * channel size).
>
>
>>
>> Thanks also for suggesting backup checkpoints - are these something
>> that increases the integrity of Flume's execution in an automatic fashion,
>> or does it aid in some form of manual recovery?
>>
>
> Automatic. If Flume is killed or shut down during a checkpoint, that
> checkpoint is invalid and unless a backup checkpoint exists a full replay
> will have to take place. Furthermore, without FLUME-2155 full replays are
> very time consuming under certain conditions.
>
>
>>
>> Re: FLUME-2155, I've scanned through it, and will read it in more
>> detail. I'm not sure about the unit of measurement for some of the metrics
>> (milliseconds?), but is there any guidance as to at which order of
>> magnitude (10^4, 10^6 or 10^8 ?) the channel size causes the replay issue
>> to become apparent?
>>
>
> It's not purely about channel size. Specifically it's about:
>
> 1) Large channel size
> 2) Having a large number of events in your channel (queue depth)
> 3) Having run the channel for some time such that old WALs were cleaned
>    up (causing there to be removes for which no event exists)
> 4) Performing a full replay in these conditions
>
> Generally I wouldn't go over a 1M channel size without backup
> checkpoint, this change, or both. There are more details here:
>
>
> https://issues.apache.org/jira/browse/FLUME-2155?focusedCommentId=13841465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13841465
>
> Brock
>
>
>
> ----------------------------
>
>
> http://www.bbc.co.uk
> This e-mail (and any attachments) is confidential and may contain personal
> views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in
> reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
>
> ---------------------
>

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
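
As a rough check on the sizing figures in this thread, here is a minimal Python sketch of the arithmetic. It assumes the 32 bytes of heap per event pointer and the "avg event size * channel size" disk estimate quoted above. The function names and the overhead parameter are illustrative only; they are not part of Flume.

```python
# Back-of-the-envelope sizing for a file channel, using figures from this thread.
# Nothing here calls Flume; all names are hypothetical helpers.

BYTES_PER_EVENT_POINTER = 32  # heap cost per queued event, as stated in the thread


def heap_bytes(capacity, overhead_bytes=0):
    """Heap needed to hold `capacity` event pointers plus other overhead."""
    return capacity * BYTES_PER_EVENT_POINTER + overhead_bytes


def disk_bytes(capacity, avg_event_bytes):
    """Disk needed to hold a full channel: avg event size * channel size."""
    return capacity * avg_event_bytes


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    capacity = 150_000_000  # 150M events
    # 500 MB is the upper end of the "rest of Flume" allowance mentioned above.
    heap = heap_bytes(capacity, overhead_bytes=500 * 10**6)
    disk = disk_bytes(capacity, avg_event_bytes=250)
    print(f"heap: {to_gib(heap):.2f} GiB")
    print(f"disk: {to_gib(disk):.2f} GiB")
```

The pointer term alone comes to 4,800,000,000 bytes, which is about 4.5 GiB and matches the figure quoted above, so a 2GB heap falls well short for a 150M-event channel.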
