Hi,

You can control the size of the files stored in HDFS with the following properties:

* hdfs.rollInterval ---> Number of seconds to wait before rolling the current file (0 = never roll based on time interval). Default value is 30 seconds.
* hdfs.rollSize ---> File size that triggers a roll, in bytes (0 = never roll based on file size). Default value is 1024 bytes.
* hdfs.rollCount ---> Number of events written to the file before it is rolled (0 = never roll based on number of events). Default value is 10.

You specify when a file is rolled by "file size", by "number of events in a file", or by "number of seconds to wait before rolling the file". In your configuration you specified "rollInterval = 300", i.e. 300 seconds (5 minutes) to wait before rolling the current file.

* hdfs.idleTimeout ---> Timeout, in seconds, after which inactive files get closed (0 = disable automatic closing of idle files).

You also specified "idleTimeout = 1800000". That is 1800000 seconds (30000 minutes, roughly 20.8 days), so an idle file will be closed only after 30000 minutes of inactivity. This is the reason why all of your files stay in the .tmp state. Reduce this value to 30 or 60 seconds and it should work well.

Thanks,
Anand

On 09/04/2014 09:09 AM, Wan Yi(武汉_技术部_搜索与精准化_万毅) wrote:
>
> Hi, all
>
> I am using the hdfs sink to store logs. I see lots of tmp files (more than 10) in hdfs. Does anybody know why?
>
> Below is my hdfs configuration.
>
> Our hadoop version is: Hadoop 2.3.0-cdh5.0.2
>
> Flume version is: 1.4.0
>
> a1.sinks.sinks1.type = hdfs
> a1.sinks.sinks1.channel = ch1
> a1.sinks.sinks1.hdfs.path = hdfs://xxxxxxx
> a1.sinks.sinks1.hdfs.filePrefix = events
> a1.sinks.sinks1.hdfs.batchSize = 1000
> a1.sinks.sinks1.hdfs.rollCount = 0
> a1.sinks.sinks1.hdfs.rollSize = 0
> a1.sinks.sinks1.hdfs.rollInterval = 300
> a1.sinks.sinks1.hdfs.idleTimeout = 1800000
> a1.sinks.sinks1.hdfs.callTimeout = 180000
> a1.sinks.sinks1.hdfs.threadsPoolSize = 250
> a1.sinks.sinks1.hdfs.writeFormat = Text
> a1.sinks.sinks1.hdfs.fileType = DataStream
>
> Best Regards
> 万毅 (Wayne Wan)
> Dev @ 个性精准化&无线部
>
> Email: [email protected]
> Cell: +86.1387.1388.731
> Addr: 8/F, Building F6, Optics Valley Software Park, Guanshan Avenue, Wuhan, China. 430074
>
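For reference, the roll-related lines of the quoted configuration could look roughly like this once the idle timeout is lowered as suggested in the reply. This is only a sketch: the agent and sink names (a1, sinks1) are copied from the quoted message, and the value 60 is just one assumed pick from the suggested 30 to 60 second range, not a tested setting.

```properties
# Sketch only; agent/sink names (a1, sinks1) come from the quoted configuration.
# Roll on time alone: count- and size-based rolling stay disabled (0).
a1.sinks.sinks1.hdfs.rollCount = 0
a1.sinks.sinks1.hdfs.rollSize = 0
# Roll the current file every 300 seconds (5 minutes).
a1.sinks.sinks1.hdfs.rollInterval = 300
# idleTimeout is in seconds. 60 is an assumed value; tune it for your traffic.
a1.sinks.sinks1.hdfs.idleTimeout = 60
```

Note that hdfs.callTimeout, unlike hdfs.idleTimeout, is given in milliseconds, so the two values should not be written on the same scale.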
