Is there a concept of a data file which is 'done'? Does Flume remove data files 
it no longer needs, or will these build up?



Regards,
Guy Needham | Data Discovery
Virgin Media | Enterprise Data, Design & Management
Bartley Wood Business Park, Hook, Hampshire RG27 9UP
D 01256 75 3362
I welcome VSRE emails. Learn more at http://vsre.info/



________________________________
From: Hari Shreedharan [mailto:[email protected]]
Sent: 10 November 2014 09:15
To: [email protected]
Cc: [email protected]
Subject: RE: File channels creating many large files

That value is in bytes. At 500k, you will likely end up with too many files. 
You should set it as high as you can.

Thanks, Hari



On Mon, Nov 10, 2014 at 1:05 AM, Needham, Guy <[email protected]> wrote:

Hari, Jeff,

Thanks for your replies. It's Flume 1.5.0. I'll use the maxFileSize parameter
to fix this. Is there any impact on channel performance from setting it to,
say, 500000?



Regards,
Guy Needham | Data Discovery



________________________________
From: Hari Shreedharan [mailto:[email protected]]
Sent: 07 November 2014 17:59
To: [email protected]
Cc: [email protected]
Subject: Re: File channels creating many large files

Flume will leave at least 2 files per data directory. Once you have enough
events to cause 2 files to be created, there will always be at least 2 per
directory. You can use the maxFileSize parameter to control the size of these
files.
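
For illustration, here is a minimal sketch of how the parameter might be added
to the channel configuration quoted further down in this thread. The agent and
channel names (a1, ch1) and the paths come from that original message; the
1 GiB value is only an assumed example, not a recommendation from anyone in
this thread. The value is in bytes, and the Flume user guide lists the default
as 2146435071 (just under 2 GiB), which is consistent with the 1.5G log-1 file
shown in the listing.

```properties
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000
# Assumed example value: 1 GiB, expressed in bytes.
# A very small value such as 500000 (about 0.5 MB) would roll files far more often.
a1.channels.ch1.maxFileSize = 1073741824
```

With a setting like this, each data directory would still be expected to hold
at least 2 log files, each capped at roughly the configured size.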

Thanks, Hari



On Fri, Nov 7, 2014 at 10:25 AM, Jeff Lord <[email protected]> wrote:

Guy,

What version of flume is this?

-Jeff

On Fri, Nov 7, 2014 at 1:19 AM, Needham, Guy <[email protected]> wrote:

Hi all,

I have a configuration with a file channel configured such that:

a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000

It's been running since October 28th with no issues, but when I looked today in 
/hadoop/user/flume/channels/data I saw that the file channel was building up 
large files which had been processed and not deleting them:

[rdd@hadoop-kn-p2-m01 flume]$ ls -lh channels/data/
total 1.6G
-rw-r----- 1 rdd rdd 1.5G Oct 28 16:10 log-1
-rw-r----- 1 rdd rdd   47 Oct 28 16:10 log-1.meta
-rw-r----- 1 rdd rdd  72M Oct 31 16:28 log-2
-rw-r----- 1 rdd rdd   47 Oct 31 16:29 log-2.meta

It seems like for each day that data landed (we're still in testing, so data is
not landing constantly) a data file has been created but not deleted when
reading was completed.

Is this expected behaviour? Is there a way to stop large files building up and
still use the file channel?

Regards,
Guy Needham | Data Discovery



--------------------------------------------------------------------
Save Paper - Do you really need to print this e-mail?

Visit www.virginmedia.com for more information, and more fun.

This email and any attachments are or may be confidential and legally
privileged and are sent solely for the attention of the addressee(s). If you
have received this email in error, please delete it from your system: its use,
disclosure or copying is unauthorised. Statements and opinions expressed in
this email may not represent those of Virgin Media. Any representations or
commitments in this email are subject to contract.

Registered office: Media House, Bartley Wood Business Park, Hook, Hampshire,
RG27 9UP
Registered in England and Wales with number 2591237


