Ahmed, thanks for your detailed comments.
Final point: in which cases would these logging solutions be considered a perfect system without any tradeoffs?

On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <[email protected]> wrote:
> Exactly up to the point.
>
> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <[email protected]> wrote:
>> That was a good point.
>>
>> So if a solution claims guaranteed data delivery, that only applies once the application has successfully handed the event to the source/producer; from that point on, the system guarantees delivery of the event to the sink/consumer at the other end.
>>
>> It has no control over whether the event actually reaches the source/producer, so data can be lost there.
>>
>> So there will always be a chance of data loss when the system goes down, and certain tradeoffs have to be accepted.
>>
>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <[email protected]> wrote:
>>> Hi,
>>>
>>> Flume, Kafka, or any other system can only be responsible for its own actions. Take the perspective of the exec source in Flume: it asks bash for the command's stdout and cannot control what bash returns.
>>> So to the exec source it is not a file, just a stream of text.
>>>
>>> The spooling directory source, by contrast, will resume from the file it failed on.
>>> That reveals two approaches to event consumption: push and pull.
>>>
>>> With the push approach, the source cannot be aware of what comes next or of what came before it started listening.
>>>
>>> Even so, sources/producers that use the pull approach do not necessarily know how to return to the last read event. That is up to the implementation.
>>>
>>> Regards,
>>> Ahmed
>>>
>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <[email protected]> wrote:
>>>> Yes, I agree.
>>>>
>>>> I think no logging solution, such as a source in Flume or a producer in Kafka, has a marking feature that records the exact point up to which it has consumed the log file, so that after a failure it can resume reading from that same point.
>>>>
>>>> This is the main reason failures are difficult to ignore. Am I right?
>>>>
>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> You can use the spillable channel, which stores events in memory and spills to disk once memory fills up.
>>>>> You can also use the file channel, but it is only as fast as your disk, and a separate disk, preferably an SSD, is suggested for it because of its high I/O.
>>>>>
>>>>> But that will not solve the issue you might run into: if Flume fails for whatever reason, you will never be able to continue from the exact point where it failed.
>>>>> Yes, the file channel preserves its state, so it will continue with whatever it had already received, but what about the time while it was down?
>>>>>
>>>>> If you cannot change anything in the application that produces the logs, then this has to be accepted as a tradeoff.
>>>>>
>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <[email protected]> wrote:
>>>>>> Yes, I understand the concerns with this use case.
>>>>>>
>>>>>> If so, we need to configure failover in this scenario. Can we have it at the channel level or at the sink/channel level?
>>>>>>
>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>
>>>>>>> No solution will function reliably for your use case, simply because all of them have to do some sort of tail -f or streaming on a file, and if they can't keep up with it (they mostly can't at high-speed entry points), they will drop some entries.
>>>>>>> Please be kind to yourself and plan for failures: if you need to restart Flume or any other solution, you will face dropped entries that you cannot easily re-ingest, because in most cases you won't know which ones were dropped.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <[email protected]> wrote:
>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>
>>>>>>>> So from your comments, I take it that Flume doesn't have any reliable source option for my use case.
>>>>>>>>
>>>>>>>> If Flume can't provide it, can you suggest any other log collector that I could consider for moving real-time data to HDFS?
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Then you're out of luck, in my opinion, as there is no way other than tail -f.
>>>>>>>>> The problem with tail -f is that tail will not wait for the source/channel to keep up with it. If the channel is full, it pushes back on the source, and the source then just stops ingesting.
>>>>>>>>>
>>>>>>>>> It is possible to hack around this by sending the tail -f output into another file and then custom-rotating that duplicate file.
>>>>>>>>> But I wouldn't recommend that.
>>>>>>>>>
>>>>>>>>> Just a side note: if you're running a Java application (Tomcat or similar), you can create multiple output files via the log4j.properties configuration without the application itself knowing anything about it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <[email protected]> wrote:
>>>>>>>>>> Ahmed,
>>>>>>>>>>
>>>>>>>>>> In my case, the application renames the existing file to <logfile>.yesterdaydate and creates a new file named <logfile> at 00:00.
>>>>>>>>>>
>>>>>>>>>> I can't change the application's log rotation policy for now, so I guess I should rule out the spooling directory source in my case.
>>>>>>>>>>
>>>>>>>>>> Can you suggest any options other than the spooling directory source?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <[email protected]> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> It all depends on how log rotation is done and how the application producing the log file handles it.
>>>>>>>>>>> Most applications just reopen the log file when they receive a signal. For example, nginx reopens its log file when it receives the USR1 signal, without stopping the process. Some applications might restart as a result.
>>>>>>>>>>>
>>>>>>>>>>> If the application just reopens the log file, then you can change your log rotation policy to rotate every minute.
>>>>>>>>>>> logrotate won't cover that case, so you'll have to write a cron job to do it.
>>>>>>>>>>> In that setup, you would keep the finished logs in a separate location from the live log, so the spooling directory source doesn't trip over an active log file that is still being appended to.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it leaves the log files in place, just renamed.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ahmed
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <[email protected]> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of my setup:
>>>>>>>>>>>>
>>>>>>>>>>>> Source: exec, running tail -F on a log file
>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>
>>>>>>>>>>>> Use case: move real-time data from a log file to HDFS.
>>>>>>>>>>>>
>>>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data if the channel or source is down.
>>>>>>>>>>>>
>>>>>>>>>>>> So I tried the other option, the spooling directory source, which is described as a reliable source. But I have a single log file that data keeps getting appended to, so I don't see how I could move the file to the spool directory.
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone suggest another reliable source option for the case where the log file is appended to continuously and rotation happens only at the end of the day?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Saravana
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> This e-mail and any attachment is for authorised use by the intended recipient(s) only. This email contains confidential information.
>>>>>>>>>>> It should not be copied, disclosed to, retained or used by, any party other than the intended recipient. Any unauthorised distribution, dissemination or copying of this E-mail or its attachments, and/or any use of any information contained in them, is strictly prohibited and may be illegal. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender directly via email. Any emails that you send to us may be monitored by systems or persons other than the named communicant for the purposes of ascertaining whether the communication complies with the law and company policies.
>>>>> --
>>>>>
>>>>> Best regards,
>>>>> Ahmed Vila | Senior software developer
>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>
>>>>> Office : +387 33 942 123
>>>>> Mobile: +387 62 139 348
>>>>>
>>>>> Website: www.devlogic.eu
>>>>> E-mail : [email protected]
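To make the "marking feature" discussed in the thread concrete, here is a rough Python sketch of a pull-style reader that remembers how far it has read a log file and resumes from that point after a restart. This is not a Flume or Kafka feature and not what the exec source does; the function names `read_new_lines` and `commit`, the JSON state file, and the inode check are all assumptions made purely for illustration.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return (lines, checkpoint) for complete lines added since the last commit."""
    # Load the last committed position, if one exists (hypothetical state file).
    offset, inode = 0, None
    if os.path.exists(state_path):
        with open(state_path) as f:
            saved = json.load(f)
        offset, inode = saved["offset"], saved["inode"]

    with open(log_path, "rb") as log:
        stat = os.fstat(log.fileno())
        # A different inode or a shorter file suggests rotation or truncation,
        # so start again from the beginning of the current file.
        if inode != stat.st_ino or stat.st_size < offset:
            offset = 0
        log.seek(offset)
        data = log.read()

    # Consume only up to the last newline; a half-written line is read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").split("\n")[:-1]
    checkpoint = {"offset": offset + end, "inode": stat.st_ino}
    return lines, checkpoint


def commit(state_path, checkpoint):
    """Persist the checkpoint atomically once the lines are safely handed off."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp_path, state_path)
```

Calling `commit` only after the lines have been delivered downstream means a crash in between causes a re-read (duplicates) rather than a gap. The sketch still has the tradeoff raised above: anything appended to the old file between the last read and the midnight rename is skipped unless the renamed file is processed separately.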
