Hi Alex,

You mean I should write a script to check the directories?

[zhouhh@Hadoop46 ag1]$ pwd
/tmp/flume-zhouhh/agent/ag1
[zhouhh@Hadoop46 ag1]$ ls
dfo_error  dfo_import  dfo_logged  dfo_sending  dfo_writing  done  error
import  logged  sending  sent  writing
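[Editor's note: not part of the original thread. A minimal sketch of such a check, assuming the agent directory shown in the listing above; the directory names are taken from that `ls` output, and `count_files` is a hypothetical helper.]

```shell
#!/bin/sh
# Report how many files remain in each Flume DFO state directory,
# so you can tell whether "sending" and "sent" are clean before
# restarting the agent. Adjust AGENT_DIR for your node.
AGENT_DIR="${AGENT_DIR:-/tmp/flume-zhouhh/agent/ag1}"

count_files() {
    # Count regular files directly inside $AGENT_DIR/$1.
    find "$AGENT_DIR/$1" -maxdepth 1 -type f 2>/dev/null | wc -l
}

for d in sending sent dfo_sending dfo_writing; do
    printf '%s: %s file(s)\n' "$d" "$(count_files "$d")"
done
```

A non-zero count in `sending` or `dfo_sending` would suggest events are still in flight and a restart may trigger a resend.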
How should I check these to avoid losing data, and how do I prevent already-sent data from being resent? Should I clean the sending dir?

Thanks!
Andy

2013/1/29 Alexander Alten-Lorenz <[email protected]>

> Hi,
>
> you could use tail -F, but this depends on the external source; Flume
> has no control over it. You can write your own script and include this.
>
> What's the content of the
> /tmp/flume/agent/agent*.*/ directories? Are sent and sending clean?
>
> - Alex
>
> On Jan 29, 2013, at 8:24 AM, 周梦想 <[email protected]> wrote:
>
> > Hello,
> > 1. I want to tail a log source and write it to HDFS. Below is the
> > configuration:
> >
> > config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
> > agentDFOSink("hadoop48",35853) ;]
> > config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
> > agentDFOSink("hadoop48",35853) ;]
> > config [co1, collectorSource( 35853 ), [collectorSink(
> > "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
> > "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
> >
> > I found that if I restart the agent node, it resends the content of
> > game.log to the collector. Is there a way to send only the log entries
> > that haven't been sent before? Or do I have to keep a mark myself, or
> > remove the logs manually, when restarting the agent node?
> >
> > 2. I tested the performance of Flume and found it a bit slow.
> > With the configuration above, I get only 50 MB/minute.
> > I changed the configuration to this:
> >
> > ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip
> > agentDFOSink("hadoop48",35853);
> >
> > config [co1, collectorSource( 35853 ), [collectorSink(
> > "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
> > "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
> >
> > Sending 300 MB of logs took about 3 minutes, so that's about
> > 100 MB/minute.
> >
> > By contrast, when I send the log from ag1 to co1 via scp, I get about
> > 30 MB/second.
> >
> > Can anyone give me some ideas?
> >
> > Thanks!
> >
> > Andy
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
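[Editor's note: not part of the original thread. The throughput figures quoted above (300 MB in about 3 minutes ≈ 100 MB/minute) follow from a simple rate calculation; a sketch, where `mb_per_min` is a hypothetical helper using integer arithmetic.]

```shell
#!/bin/sh
# Compute throughput in MB/minute from bytes shipped and elapsed seconds.
mb_per_min() {
    bytes=$1
    secs=$2
    # Integer arithmetic is fine for a rough rate: bytes/sec * 60, then
    # convert to MiB (1048576 bytes).
    echo $(( bytes * 60 / secs / 1048576 ))
}

# 300 MB shipped in 3 minutes (180 s):
mb_per_min $((300 * 1048576)) 180   # → 100
```

By the same formula, scp's 30 MB/second works out to 1800 MB/minute, which is the gap the original poster is asking about.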
