Hi Needham,

Thanks for your response. If that is the case, then I am facing data loss. For example, I sent 5129 events to Flume, and I configured the agent to roll the file at 1 MB (which holds about 4890 events). I have one completely rolled-out log file of 1 MB; a second file was still being written when I stopped HDFS, which also stopped Flume. If I open that second log file, it does not contain the events I sent; it contains only the text below:

java.io.IOException: Got error for OP_READ_BLOCK, self=/127.0.0.1:9694, remote=127.0.0.1/127.0.0.1:50010, for file /127.0.0.1:50010:BP-1861801959-172.16.1.123-1413351856783:1073743344, for pool BP-1861801959-172.16.1.123-1413351856783 block 1073743344_2532
Now I start the agent again. It reads the checkpoint directory and moves the missed events to HDFS, but it moved only the last two events (5128 and 5129). The events between 4891 and 5127 are completely missing. Why is this happening, and how do I prevent data loss in this case?

Regards,
Mahendran

From: [email protected]
To: [email protected]; [email protected]
Subject: RE: Shutdowning HDFS server leads to flume agent shutdown
Date: Fri, 7 Nov 2014 09:36:33 +0000

Hi Mahendran,

Yes, that is expected behaviour. I suspect that if you look in the logs for this agent, it will have thrown an exception when you shut down HDFS, as it depends on a compatible HDFS being available.

Regards,
Guy Needham | Data Discovery
Virgin Media | Enterprise Data, Design & Management
Bartley Wood Business Park, Hook, Hampshire RG27 9UP
D 01256 75 3362
I welcome VSRE emails. Learn more at http://vsre.info/

From: mahendran m [mailto:[email protected]]
Sent: 07 November 2014 09:33
To: [email protected]
Subject: Shutdowning HDFS server leads to flume agent shutdown

Hi All,

I am new to Apache Flume.
I have configured a Thrift source to send logs to HDFS. My config is as below:

# list sources, sinks and channels in the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# thrift source properties
a1.sources.r1.type = thrift
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = DecoderInterceptor.CustomInterceptor$Builder
a1.sources.r1.interceptors.i1.referalUrl = referalUrl
a1.sources.r1.interceptors.i1.referalHost = referalHost

# HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.fileSuffix = .txt
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.callTimeout = 60000
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flumeChannel100/Thrift

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.byteCapacityBufferPercentage = 10
a1.channels.c1.byteCapacity = 5368709120

# define the flow
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

When I started HDFS and the Flume service and generated the logs from a C# application, my logs were moved to HDFS and everything was OK up to that point. But when I stopped the HDFS service, the Flume agent itself stopped. Is this the default behaviour, or did something go wrong?

Regards,
Mahendran
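Note that the channel above is a memory channel, so events buffered in it exist only in the agent's RAM and do not survive the agent stopping. Flume's file channel persists events and a checkpoint to disk instead, which is the usual way to avoid losing buffered events across an agent restart. A minimal sketch (the directory paths are illustrative, not taken from the setup above):

```
# durable file channel (sketch; adjust paths and capacities to your environment)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 1000
```

The trade-off is throughput: the file channel is slower than the memory channel because every event is written to disk before the transaction commits.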
