1.  The number of open files is configurable; see hdfs.maxOpenFiles. If that limit 
is exceeded, the oldest file is closed, even if it has not reached hdfs.rollSize yet. 
It may be better to set up your Flume source in a way that lets the HDFS sink work 
on a smaller set of files.
  2.  If I recall correctly, it writes the events out immediately and doesn't 
buffer them. Some buffering surely happens in the HDFS client libs. Beyond 
that, memory use should mostly be bookkeeping (open file handles etc.) plus any 
memory used for compression (e.g., a block per open file, if using block compression). 
It's best to measure it with a test setup: see how the in-use memory consumption 
differs with 1 file open, 100 files open, and 1000 files open.
  3.  No, memory use does not grow with the total amount of data written. Assuming 
you are using the file channel, you can try starting with, say, 8GB as the max heap 
size for the agent and go from there. Memory consumption of the Memory/Spillable 
Memory channels depends on their memory capacity settings.
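To see why hdfs.maxOpenFiles matters for question 1, here is a rough Python model. It is not Flume's code: the class and its methods are hypothetical, and it interprets the user guide's "the oldest file is closed" as "the least recently written file is closed", with one open file per bucket.

```python
from collections import OrderedDict


class OpenFileTracker:
    """Toy model of an HDFS sink with a cap on open files (hypothetical class).

    Assumptions: one open file per bucket; once more than max_open_files
    buckets are open, the least recently written one is closed regardless
    of its size; a file that reaches roll_size is closed and its bucket holds
    no open file until its next event arrives.
    """

    def __init__(self, max_open_files, roll_size):
        self.max_open_files = max_open_files
        self.roll_size = roll_size
        self.open = OrderedDict()  # bucket -> bytes in its current file, LRU first
        self.closed = []           # (bucket, size_in_bytes) for every closed file

    def write(self, bucket, nbytes):
        # pop + re-insert moves the bucket to the most recently used end
        size = self.open.pop(bucket, 0) + nbytes
        if size >= self.roll_size:
            self.closed.append((bucket, size))  # size-based roll
        else:
            self.open[bucket] = size
        # Enforce the cap by closing least recently written files, whatever their size
        while len(self.open) > self.max_open_files:
            self.closed.append(self.open.popitem(last=False))


if __name__ == "__main__":
    # Hypothetical workload: 3,000 buckets receiving 1 KB events round-robin,
    # a 1,000-file cap and a 128 MB roll size.
    tracker = OpenFileTracker(max_open_files=1000, roll_size=128 * 1024 * 1024)
    for i in range(30_000):
        tracker.write(f"bucket-{i % 3000}", 1024)
    print(len(tracker.closed), max(size for _, size in tracker.closed))
    # prints: 29000 1024
```

With more active buckets than the cap and events arriving round-robin, every file in this model is closed by eviction at 1 KB, far below roll_size. That is the small-files outcome the size-only rolling was meant to avoid, which is why keeping the number of concurrently active buckets under hdfs.maxOpenFiles matters.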

-roshan

From: R P <[email protected]>
Reply-To: [email protected]
Date: Tuesday, February 9, 2016 at 11:17 AM
To: [email protected]
Subject: Flume HDFS sink memory requirement.


Hello All,

  Hope you are all having a great time. Thanks for reading my question; I 
appreciate any suggestions or replies.


I am evaluating Flume for writing to HDFS. We get sparse data that will be bucketed 
into thousands of different logs. Because this data arrives sporadically throughout 
the day, we run into the HDFS small-files problem.


One solution to this problem is to use file size as the only condition for closing 
a file, via hdfs.rollSize. Since we might have thousands of files open for hours, 
I have the following questions.
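Making size the only trigger also requires turning off the other two roll triggers. According to the Flume user guide, hdfs.rollInterval defaults to 30 seconds, hdfs.rollSize to 1024 bytes and hdfs.rollCount to 10 events, and 0 disables each of them. The sketch below is a simplified Python model of those three documented triggers, not Flume's implementation; the class name and the 128 MB target size are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    """Hypothetical model of the HDFS sink's three roll triggers.

    Defaults mirror the documented defaults (hdfs.rollInterval=30,
    hdfs.rollSize=1024, hdfs.rollCount=10); a value of 0 disables a trigger.
    """
    roll_interval_s: int = 30
    roll_size_bytes: int = 1024
    roll_count_events: int = 10

    def should_roll(self, seconds_open, bytes_written, events_written):
        # Any enabled trigger that has been reached closes the current file.
        if self.roll_interval_s > 0 and seconds_open >= self.roll_interval_s:
            return True
        if self.roll_size_bytes > 0 and bytes_written >= self.roll_size_bytes:
            return True
        if self.roll_count_events > 0 and events_written >= self.roll_count_events:
            return True
        return False


# Size-only rolling: time and count triggers explicitly disabled (0),
# with an illustrative 128 MB target file size.
SIZE_ONLY = RollPolicy(roll_interval_s=0,
                       roll_size_bytes=128 * 1024 * 1024,
                       roll_count_events=0)
```

With the defaults, a file holding only 10 small events already rolls; with SIZE_ONLY, a file can stay open for hours and still not roll until it reaches the target size, which is what keeps thousands of files open at once.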


1. Will Flume keep thousands of files open until the hdfs.rollSize condition is met?

2. How much memory does the HDFS sink use when thousands of files are open at the 
same time?

3. Is the memory used for the HDFS event buffer equal to the data written to HDFS? 
E.g., if the thousands of files to be written have a total size of 500 GB, will the 
Flume sink need 500 GB of memory?
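Question 2 can be answered empirically by following the reply's suggestion of comparing in-use memory with 1, 100 and 1000 files open. A minimal Python sketch, assuming memory grows roughly linearly with the number of open files: fit heap_used = baseline + per_file * open_files by ordinary least squares, then extrapolate. The function names are hypothetical, and the samples must be your own readings (for example, heap in use as reported over JMX or by jstat).

```python
def fit_per_file_overhead(samples):
    """Least-squares fit of heap_used = baseline + per_file * open_files.

    samples: list of (open_files, heap_used_bytes) pairs from a test agent.
    Returns (baseline_bytes, per_file_bytes).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("measurements need different open-file counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_file = sxy / sxx                   # slope: extra bytes per open file
    baseline = mean_y - per_file * mean_x  # intercept: memory with no files open
    return baseline, per_file


def projected_heap(samples, open_files):
    """Extrapolate the fitted line to a larger number of open files."""
    baseline, per_file = fit_per_file_overhead(samples)
    return baseline + per_file * open_files
```

The linear model is only a starting assumption; if the readings at 1, 100 and 1000 files do not fall close to a straight line, the extrapolation to thousands of files should not be trusted.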


Thanks again for your input.


-R
