Re: why lots of tmp files in hdfs

武汉_技术部_搜索与精准化_万毅 Thu, 04 Sep 2014 18:42:06 -0700

@ Anandkumar Lakshmanan

Thanks for your reply,


I originally thought idleTimeout was in milliseconds, like the callTimeout 
property.
I will try changing the idleTimeout value.





Best Regards

Wayne Wan


From: Anandkumar Lakshmanan [mailto:[email protected]]
Sent: September 4, 2014 12:59
To: [email protected]
Subject: Re: why lots of tmp files in hdfs

Hi,

You can control when files in HDFS are rolled, and therefore their size, with 
the following properties:

* hdfs.rollInterval ---> Number of seconds to wait before rolling the current 
file (0 = never roll based on time interval). The default value is 30 seconds.

* hdfs.rollSize ---> File size that triggers a roll, in bytes (0 = never roll 
based on file size). The default value is 1024 bytes.

* hdfs.rollCount ---> Number of events written to a file before it is rolled 
(0 = never roll based on number of events). The default value is 10.

You have to specify whether to roll based on file size, on the number of 
events in a file, or on the number of seconds to wait before rolling the file.

In your configuration you specified "rollInterval = 300", i.e. 300 seconds 
(5 minutes) to wait before rolling the current file.
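
For illustration, here is a sketch of rolling on a single trigger, reusing the agent and sink names (a1, sinks1) from the configuration quoted further down. The 128 MB size is only an assumed example value, not a recommendation from this thread. Setting the other two properties to 0 disables them.

```properties
# Roll on file size only; 134217728 bytes = 128 MB (illustrative value)
a1.sinks.sinks1.hdfs.rollSize = 134217728
# 0 disables the time-based and event-count-based triggers
a1.sinks.sinks1.hdfs.rollInterval = 0
a1.sinks.sinks1.hdfs.rollCount = 0
```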


* idleTimeout ---> Timeout, in seconds, after which inactive files get closed 
(0 = disable automatic closing of idle files).

Also, you specified "idleTimeout = 1800000". Because this value is in seconds, 
that is 30,000 minutes (about 20.8 days), so a file is closed for inactivity 
only after being idle that long. This is the reason why all your files stay in 
the .tmp state.
Reduce this value to 30 or 60 seconds and it should work well.
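
A minimal sketch of that change, assuming the rest of the posted configuration stays as it is. Note the differing units: idleTimeout is given in seconds, while callTimeout is in milliseconds.

```properties
# Close files that receive no writes for 60 seconds (value is in seconds)
a1.sinks.sinks1.hdfs.idleTimeout = 60
# Unchanged: callTimeout is in milliseconds (180000 ms = 3 minutes)
a1.sinks.sinks1.hdfs.callTimeout = 180000
```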

Thanks
Anand.



On 09/04/2014 09:09 AM, Wan Yi(武汉_技术部_搜索与精准化_万毅) wrote:
Hi all,
         I am using the HDFS sink to store logs, and I see lots of .tmp files 
(more than 10) in HDFS. Does anybody know why?

Below is my HDFS sink configuration.

Our Hadoop version is: Hadoop 2.3.0-cdh5.0.2
Our Flume version is: 1.4.0

a1.sinks.sinks1.type = hdfs
a1.sinks.sinks1.channel = ch1
a1.sinks.sinks1.hdfs.path = hdfs://xxxxxxx
a1.sinks.sinks1.hdfs.filePrefix = events
a1.sinks.sinks1.hdfs.batchSize = 1000
a1.sinks.sinks1.hdfs.rollCount = 0
a1.sinks.sinks1.hdfs.rollSize = 0
a1.sinks.sinks1.hdfs.rollInterval = 300
a1.sinks.sinks1.hdfs.idleTimeout = 1800000
a1.sinks.sinks1.hdfs.callTimeout = 180000
a1.sinks.sinks1.hdfs.threadsPoolSize = 250
a1.sinks.sinks1.hdfs.writeFormat = Text
a1.sinks.sinks1.hdfs.fileType = DataStream




Best Regards

Wayne Wan



Best Regards
万毅(Wayne Wan)
Dev @ Personalization & Wireless Department


________________________________

* Email: [email protected]

* Cell: +86.1387.1388.731

* Addr: 8/F, Building F6, Optics Valley Software Park, Guanshan Avenue, Wuhan, 
China. 430074

________________________________
