Hi, you could use tail -F, but this depends on the external source, which Flume has no control over. You can write your own script and include it.
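A rough sketch of what such a script could look like, in Python: it remembers the byte offset it has already read in a small checkpoint file, so a restart continues from there instead of resending the whole log. The function names and the checkpoint file are my own invention for illustration, not part of Flume, and how you hook the script's output into your agent depends on your setup.

```python
import os
import sys


def load_offset(state_path):
    """Return the saved byte offset, or 0 if no valid checkpoint exists."""
    try:
        with open(state_path) as f:
            return int(f.read().strip() or 0)
    except (OSError, ValueError):
        return 0


def save_offset(state_path, offset):
    """Write the offset to a temp file, then rename it over the checkpoint."""
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, state_path)


def read_new_lines(log_path, state_path):
    """Return complete lines appended since the last call and update the checkpoint."""
    offset = load_offset(state_path)
    if os.path.getsize(log_path) < offset:
        offset = 0  # file is shorter than the checkpoint: truncated or rotated
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.decode("utf-8", errors="replace"))
            offset += len(raw)
    save_offset(state_path, offset)
    return lines


def main(argv):
    # Hypothetical usage: script.py <log file> <checkpoint file>
    for line in read_new_lines(argv[1], argv[2]):
        sys.stdout.write(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Note that the checkpoint is saved before the lines are printed, so a crash in between would drop those lines; if you would rather resend than lose data, save the offset only after the output has been handed off.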
What's the content of the /tmp/flume/agent/agent*.*/ directories? Are sent and sending clean?

- Alex

On Jan 29, 2013, at 8:24 AM, 周梦想 <[email protected]> wrote:

> hello,
> 1. I want to tail a log source and write it to hdfs. below is configure:
> config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
> agentDFOSink("hadoop48",35853) ;]
> config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
> agentDFOSink("hadoop48",35853) ;]
> config [co1, collectorSource( 35853 ), [collectorSink(
> "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
> "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
>
>
> I found if I restart the agent node, it will resend the content of game.log
> to collector. There are some solutions to send logs from where I haven't
> sent before? Or I have to make a mark myself or remove the logs manually
> when restart the agent node?
>
> 2. I tested performance of flume, and found it's a bit slow.
> if I using configure as above, there are only 50MB/minute.
> I changed the configure to below:
> ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip
> agentDFOSink("hadoop48",35853);
>
> config [co1, collectorSource( 35853 ), [collectorSink(
> "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
> "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
>
> I sent 300MB log, it will spent about 3 minutes, so it's about 100MB/minute.
>
> while I send the log from ag1 to co1 via scp, It's about 30MB/second.
>
> someone give me any ideas?
>
> thanks!
>
> Andy

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
