The patch has been tested and uploaded. This should fix Flume 1.4 and earlier. https://issues.apache.org/jira/browse/FLUME-1654
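
Until a release carries the fix, the exposure can also be reduced by hiding in-progress files from Hive: the 1.4 HDFS sink can prefix open files, and Hadoop input formats (hence Hive and Pig) skip files whose names begin with "_" or ".". A sketch, with hypothetical agent/sink names:

agent.sinks.hdfsSink.hdfs.inUsePrefix = _
# .tmp is already the default in-use suffix; shown here for clarity
agent.sinks.hdfsSink.hdfs.inUseSuffix = .tmp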
Cheers,
Suhas.

On Wed, Oct 16, 2013 at 5:15 PM, Suhas Satish <[email protected]> wrote:

> There already exists a JIRA. I have come up with a local fix which works.
> https://issues.apache.org/jira/browse/FLUME-1654
>
> Will be uploading a patch soon.
>
> Cheers,
> Suhas.
>
>
> On Tue, Oct 15, 2013 at 1:15 PM, Roshan Naik <[email protected]> wrote:
>
>> Paul,
>> HDFS sink issue apart... it sounds like this is a setup where Hive is
>> being allowed to read new files/directories flowing into the partition
>> while the HDFS sink is still writing to it. To my knowledge, a Hive
>> partition is considered immutable and should not be updated once it has
>> been created. So only once the HDFS sink has rolled over to the next
>> directory should the previous directory be exposed to Hive.
>> -roshan
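>>
>> One way to get that behavior from the sink side is time-bucketed paths
>> with rounding, so the sink switches to a fresh directory on a fixed
>> boundary and the completed directory can then be added as a Hive
>> partition. A sketch of the relevant HDFS sink settings (agent/sink
>> names and the 10-minute bucket are hypothetical):
>>
>> agent.sinks.hdfsSink.hdfs.path = /flume_import/%Y/%m/%d/%H%M
>> # round the path timestamp down to a 10-minute boundary
>> agent.sinks.hdfsSink.hdfs.round = true
>> agent.sinks.hdfsSink.hdfs.roundValue = 10
>> agent.sinks.hdfsSink.hdfs.roundUnit = minute
>>
>> Once the clock crosses a boundary the sink writes only to the new
>> directory, and the finished one can be registered with ALTER TABLE ...
>> ADD PARTITION.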
>>
>>
>> On Tue, Oct 15, 2013 at 11:23 AM, Paul Chavez <[email protected]> wrote:
>>
>>> I can't speak for Suhas, but I face a similar issue in production. For
>>> me it occurs when someone queries a .tmp file from Hive or Pig. This
>>> causes the HDFS sink to lose the ability to close and rename the file,
>>> and then the HDFS sink is completely out of commission until the agent
>>> is restarted. We've mitigated this in our environment by careful Hive
>>> partition coordination, but it still crops up when people run ad-hoc
>>> queries they probably shouldn't. We are waiting to get the latest CDH
>>> into production, which eliminates the .tmp file issue, but I would
>>> still like a more resilient HDFS sink, so I support development effort
>>> in this area.
>>>
>>> Thanks,
>>> Paul Chavez
>>>
>>> *From:* Roshan Naik [mailto:[email protected]]
>>> *Sent:* Tuesday, October 15, 2013 11:14 AM
>>> *To:* [email protected]
>>> *Cc:* [email protected]; [email protected]
>>> *Subject:* Re: flume agent with HDFS sink, syslog source and memory
>>> channel - stuck on hdfs IOException
>>>
>>> sounds like a valid bug. i am curious though... is there a real use
>>> scenario you are facing in production?
>>>
>>> On Mon, Oct 14, 2013 at 7:39 PM, Suhas Satish <[email protected]>
>>> wrote:
>>>
>>> In summary, although the flume-agent JVM doesn't exit, once an HDFS
>>> IOException occurs due to deleting a .tmp file, the agent doesn't
>>> recover and stops logging the HDFS sink output generated by the syslog
>>> source.
>>>
>>> There was only one JIRA I found in Apache remotely related to this
>>> HDFS sink issue, and we didn't have its patch. I tested by pulling the
>>> FLUME-2007 patch into flume-1.4.0:
>>>
>>> https://github.com/apache/flume/commit/5b5470bd5d3e94842032009c36788d4ae346674b
>>> https://issues.apache.org/jira/browse/FLUME-2007
>>>
>>> But it doesn't solve this issue.
>>>
>>> Should I open a new jira ticket?
>>>
>>> Thanks,
>>> Suhas.
>>>
>>> On Fri, Oct 11, 2013 at 4:13 PM, Suhas Satish <[email protected]> wrote:
>>>
>>> > Hi, I have the following flume configuration file flume-syslog.conf
>>> > (attached) -
>>> >
>>> > 1.) I launch it with -
>>> >
>>> > bin/flume-ng agent -n agent -c conf -f conf/flume-syslog.conf
>>> >
>>> > 2.) I generate log output using loggen (provided by syslog-ng):
>>> >
>>> > loggen -I 30 -s 300 -r 900 localhost 13073
>>> >
>>> > 3.) I verify flume output is generated under /flume_import/ on the
>>> > hadoop cluster. It generates output of the form -
>>> >
>>> > -rwxr-xr-x 3 root root 139235 2013-10-11 14:35 /flume_import/2013/10/14/logdata-2013-10-14-35-45.1381527345384.tmp
>>> > -rwxr-xr-x 3 root root 138095 2013-10-11 14:35 /flume_import/2013/10/14/logdata-2013-10-14-35-46.1381527346543.tmp
>>> > -rwxr-xr-x 3 root root 135795 2013-10-11 14:35 /flume_import/2013/10/14/logdata-2013-10-14-35-47.1381527347670.tmp
>>> >
>>> > 4.) I delete a flume output file while loggen is still running and
>>> > Flume is generating the sink output:
>>> >
>>> > hadoop fs -rmr /flume_import/2013/10/14/logdata-2013-10-14-35-47.1381527347670.tmp
>>> >
>>> > 5.) This gives me the following exception in the flume log. Although
>>> > the flume agent JVM continues to run, it does not generate any more
>>> > output files from syslog-ng until the agent is restarted. Is flume
>>> > expected to behave like this, or should it handle the IOException
>>> > gracefully and continue logging syslog output to other output
>>> > directories?
>>> >
>>> > 10 Oct 2013 16:55:42,092 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> > (org.apache.flume.sink.hdfs.BucketWriter.append:430) - Caught IOException
>>> > while closing file
>>> > (maprfs:///flume_import/2013/10/16//logdata-2013-10-16-50-03.1381449008596.tmp).
>>> > Exception follows.
>>> > java.io.IOException: 2049.112.5249612
>>> > /flume_import/2013/10/16/logdata-2013-10-16-50-03.1381449008596.tmp (Stale file handle)
>>> >     at com.mapr.fs.Inode.throwIfFailed(Inode.java:269)
>>> >     at com.mapr.fs.Inode.flushJniBuffers(Inode.java:402)
>>> >     at com.mapr.fs.Inode.syncInternal(Inode.java:478)
>>> >     at com.mapr.fs.Inode.syncUpto(Inode.java:484)
>>> >     at com.mapr.fs.MapRFsOutStream.sync(MapRFsOutStream.java:244)
>>> >     at com.mapr.fs.MapRFsDataOutputStream.sync(MapRFsDataOutputStream.java:68)
>>> >     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:946)
>>> >     at org.apache.flume.sink.hdfs.HDFSSequenceFile.sync(HDFSSequenceFile.java:107)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
>>> >     at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
>>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(T
>>> >
>>> > 6.) I found the following related post:
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/flume-user/201305.mbox/%[email protected]%3E
>>> >
>>> > Not sure if it's related to this issue. Can anyone comment?
>>> >
>>> > Thanks,
>>> > Suhas.
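>>> >
>>> > The attached flume-syslog.conf is not reproduced in the archive; a
>>> > minimal configuration consistent with the commands and listing above
>>> > would look roughly like this (the agent name, port, and path layout
>>> > are taken from the report; the source type, channel sizing, and roll
>>> > settings are assumptions):
>>> >
>>> > agent.sources = syslogSrc
>>> > agent.channels = memCh
>>> > agent.sinks = hdfsSink
>>> >
>>> > # syslog source on the port loggen targets (TCP assumed; could be syslogudp)
>>> > agent.sources.syslogSrc.type = syslogtcp
>>> > agent.sources.syslogSrc.host = 0.0.0.0
>>> > agent.sources.syslogSrc.port = 13073
>>> > agent.sources.syslogSrc.channels = memCh
>>> >
>>> > agent.channels.memCh.type = memory
>>> > agent.channels.memCh.capacity = 10000
>>> >
>>> > agent.sinks.hdfsSink.type = hdfs
>>> > agent.sinks.hdfsSink.channel = memCh
>>> > # the listing's last path component tracks the hour, suggesting %H here
>>> > agent.sinks.hdfsSink.hdfs.path = /flume_import/%Y/%m/%H
>>> > agent.sinks.hdfsSink.hdfs.filePrefix = logdata-%Y-%m-%H-%M-%S
>>> > # the stack trace goes through HDFSSequenceFile, hence:
>>> > agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
>>> > # files in the listing roll about once a second
>>> > agent.sinks.hdfsSink.hdfs.rollInterval = 1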
