@Anandkumar Lakshmanan, thanks for your reply.

I originally thought idleTimeout was in milliseconds, like the callTimeout property. I will try changing idleTimeout.

Best Regards
Wayne Wan

From: Anandkumar Lakshmanan [mailto:[email protected]]
Sent: September 4, 2014 12:59
To: [email protected]
Subject: Re: why lots of tmp files in hdfs

Hi,

You can control the size of the files stored in HDFS with the following properties:

* hdfs.rollInterval ---> Number of seconds to wait before rolling the current file (0 = never roll based on time interval). Default value is 30 seconds.
* hdfs.rollSize ---> File size that triggers a roll, in bytes (0 = never roll based on file size). Default value is 1024 bytes.
* hdfs.rollCount ---> Number of events written to the file before it is rolled (0 = never roll based on number of events). Default value is 10.

You have to decide whether to roll based on file size, on the number of events in a file, or on the number of seconds to wait before rolling the file. In your configuration you specified "rollInterval = 300", i.e. wait 300 seconds (5 minutes) before rolling the current file.

* idleTimeout ---> Timeout, in seconds, after which inactive files get closed (0 = disable automatic closing of idle files).

You also specified "idleTimeout = 1800000". Because this value is in seconds, that is 30,000 minutes (about 20.8 days): a file is closed only after it has been inactive for that long. This is the reason why all your files stay in the .tmp state. Reduce this value to 30 or 60 seconds and it should work well.

Thanks,
Anand.

On 09/04/2014 09:09 AM, Wan Yi (Wuhan_Technology Dept_Search & Precision Targeting_Wan Yi) wrote:

Hi, all

I am using the HDFS sink to store logs, and I see lots of .tmp files (more than 10) in HDFS. Does anybody know why?
Below is my HDFS sink configuration.
Our Hadoop version is: Hadoop 2.3.0-cdh5.0.2
Our Flume version is: 1.4.0

a1.sinks.sinks1.type = hdfs
a1.sinks.sinks1.channel = ch1
a1.sinks.sinks1.hdfs.path = hdfs://xxxxxxx
a1.sinks.sinks1.hdfs.filePrefix = events
a1.sinks.sinks1.hdfs.batchSize = 1000
a1.sinks.sinks1.hdfs.rollCount = 0
a1.sinks.sinks1.hdfs.rollSize = 0
a1.sinks.sinks1.hdfs.rollInterval = 300
a1.sinks.sinks1.hdfs.idleTimeout = 1800000
a1.sinks.sinks1.hdfs.callTimeout = 180000
a1.sinks.sinks1.hdfs.threadsPoolSize = 250
a1.sinks.sinks1.hdfs.writeFormat = Text
a1.sinks.sinks1.hdfs.fileType = DataStream

Best Regards
Wan Yi (Wayne Wan)
Dev @ Personalization & Precision Targeting and Wireless Dept.
________________________________
* Email: [email protected]
* Cell: +86.1387.1388.731
* Addr: 8/F, Building F6, Optics Valley Software Park, Guanshan Avenue, Wuhan, China. 430074
________________________________
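For reference, the seconds-versus-milliseconds mix-up discussed in this thread can be checked with simple arithmetic. The following is a minimal Python sketch, not part of Flume: the helper to_minutes and the UNIT_DIVISORS table are hypothetical, and the units assume the Flume HDFS sink documentation (rollInterval and idleTimeout in seconds, callTimeout in milliseconds).

```python
# Hypothetical helper, not part of Flume. Each divisor converts a raw value
# to seconds, assuming the documented Flume HDFS sink units:
# rollInterval and idleTimeout are in seconds, callTimeout is in milliseconds.
UNIT_DIVISORS = {
    "hdfs.rollInterval": 1,     # seconds
    "hdfs.idleTimeout": 1,      # seconds
    "hdfs.callTimeout": 1000,   # milliseconds
}


def to_minutes(key: str, value: int) -> float:
    """Convert a raw HDFS sink timeout setting to minutes."""
    seconds = value / UNIT_DIVISORS[key]
    return seconds / 60


# Values from the configuration in the original question,
# plus the suggested idleTimeout of 60 seconds.
settings = [
    ("hdfs.rollInterval", 300),
    ("hdfs.idleTimeout", 1800000),
    ("hdfs.callTimeout", 180000),
    ("hdfs.idleTimeout", 60),
]

if __name__ == "__main__":
    for key, value in settings:
        print(f"{key} = {value} -> {to_minutes(key, value):g} minutes")
```

Read this way, 1800000 for idleTimeout is 30,000 minutes, whereas 180000 for callTimeout is only 3 minutes; writing idleTimeout as if it were in milliseconds inflates it by a factor of 1000.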
