Hi Andy,

I meant more of an own program/script that parses the data (instead of tail -*), so you have some control over the contents. Note that when a Flume agent is restarted, the marker for tail is lost too. That comes from tail itself; Flume has no control over it.
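For illustration, a minimal sketch (untested; the log and checkpoint paths are just example placeholders) of a tail replacement that keeps its own marker:

#!/usr/bin/env python
# Sketch of a tail-style reader that keeps its own offset in a
# checkpoint file, so a restart resumes where it stopped instead of
# resending the whole file. Paths here are example placeholders.
import os
import sys
import time

LOG = "/home/zhouhh/game.log"
CHECKPOINT = "/var/tmp/game.log.offset"

def load_offset():
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    except (IOError, ValueError):
        return 0  # no checkpoint yet: start from the beginning

def save_offset(offset):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.rename(tmp, CHECKPOINT)  # atomic replace on POSIX

def follow():
    offset = load_offset()
    with open(LOG) as f:
        f.seek(0, os.SEEK_END)
        if offset > f.tell():  # file shrank: it was rotated/truncated
            offset = 0
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:  # caught up: checkpoint the position and wait
                save_offset(f.tell())
                time.sleep(1)
                continue
            sys.stdout.write(line)  # hand the line to your sink here
            offset = f.tell()

if __name__ == "__main__":
    follow()

The important bit is that the offset lives outside the process, so a restart of the agent or of the script itself picks up where it stopped.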
- Alex

On Feb 4, 2013, at 8:33 AM, 周梦想 <[email protected]> wrote:

> Hi Alex,
>
> You mean I should write a script to check the directories?
>
> [zhouhh@Hadoop46 ag1]$ pwd
> /tmp/flume-zhouhh/agent/ag1
> [zhouhh@Hadoop46 ag1]$ ls
> dfo_error  dfo_import  dfo_logged  dfo_sending  dfo_writing  done  error
> import  logged  sending  sent  writing
>
> How do I check so as to avoid losing data while also preventing a resend?
> Clean the sending dir?
>
> Thanks!
> Andy
>
> 2013/1/29 Alexander Alten-Lorenz <[email protected]>
>
>> Hi,
>>
>> You could use tail -F, but this depends on the external source; Flume
>> has no control over it. You can write your own script and include that.
>>
>> What is the content of the /tmp/flume/agent/agent*.*/ directories?
>> Are sent and sending clean?
>>
>> - Alex
>>
>> On Jan 29, 2013, at 8:24 AM, 周梦想 <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> 1. I want to tail a log source and write it to HDFS. Below is the
>>> configuration:
>>>
>>> config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
>>> agentDFOSink("hadoop48",35853) ;]
>>> config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
>>> agentDFOSink("hadoop48",35853) ;]
>>> config [co1, collectorSource( 35853 ), [collectorSink(
>>> "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
>>> "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
>>>
>>> I found that if I restart the agent node, it resends the whole content
>>> of game.log to the collector. Is there a solution that sends only the
>>> log entries that haven't been sent before? Or do I have to keep a
>>> marker myself, or remove the logs manually, when restarting the agent
>>> node?
>>>
>>> 2. I tested the performance of Flume and found it a bit slow.
>>> With the configuration above, I get only about 50 MB/minute.
>>> I changed the configuration to the one below:
>>>
>>> ag1: tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip
>>> agentDFOSink("hadoop48",35853);
>>>
>>> config [co1, collectorSource( 35853 ), [collectorSink(
>>> "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
>>> "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
>>>
>>> I sent a 300 MB log and it took about 3 minutes, so that is about
>>> 100 MB/minute.
>>>
>>> When I send the log from ag1 to co1 via scp, it is about 30 MB/second.
>>>
>>> Can someone give me any ideas?
>>>
>>> Thanks!
>>>
>>> Andy
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
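As for the question above of checking whether sent and sending are clean before a restart, a rough sketch along these lines would do it (the base path is taken from the directory listing above and depends on your user and agent name):

#!/usr/bin/env python
# Rough check of the agent's DFO spool directories: report how many
# files each one still holds. Directory names match the ls output
# above; the base path depends on your user and agent name.
import os

BASE = "/tmp/flume-zhouhh/agent/ag1"

for name in sorted(os.listdir(BASE)):
    path = os.path.join(BASE, name)
    if os.path.isdir(path):
        pending = len(os.listdir(path))
        status = "clean" if pending == 0 else "%d file(s) pending" % pending
        print("%-12s %s" % (name, status))

If sending still holds files, those events presumably have not been delivered yet, so cleaning the directory prevents the resend only at the cost of losing them.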
