That is a good point. So when a solution claims guaranteed data delivery, the guarantee applies only once the application has successfully handed the event to the source/producer; from that point on, the system guarantees delivery of the event through to the sink/consumer at the other end.
It has no control over whether events actually reach the source/producer, so data can be lost before that point. There will therefore always be a chance of data loss when the system goes down, and certain trade-offs have to be accepted.

On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <[email protected]> wrote:

> Hi,
>
> Flume, Kafka, or any other system can only be responsible for its own
> actions. From the perspective of the exec source in Flume, it asks
> bash for the output on its stdout. It cannot control what bash will
> return.
> Thus, to the source it is not a file, just a stream of text.
>
> The spooling directory source, by contrast, will resume from the
> file it failed on.
> That reveals two approaches to event consumption: push and pull.
>
> With the push approach, the source cannot be aware of what comes next
> or of what came before it started to listen.
>
> Even so, some sources/producers that use the pull approach don't
> necessarily know how to return to the last read event. It's up to the
> implementation.
>
> Regards,
> Ahmed
>
>
> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <[email protected]>
> wrote:
>
>> Yes, I agree.
>>
>> I think no logging solution, such as a source in Flume or a producer
>> in Kafka, has any marking feature that records the exact point up to
>> which it has consumed the logfile, so that after a failure it can
>> start reading again from the same point (the point before the failure).
>>
>> This is the major point where failures are difficult to ignore. Am I
>> right?
>>
>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> You can use the spillable channel, which stores events in memory and,
>>> once memory fills up, spills to disk.
>>> You can also use the file channel, but it's only as fast as your disk,
>>> and a separate disk is suggested for it because of its high IO,
>>> preferably an SSD.
>>>
>>> But that will not solve the issue you might run into: if Flume
>>> fails for whatever reason, you'll never be able to continue from the
>>> exact point where it failed.
>>> Yes, the file channel preserves state, so it will continue with
>>> whatever it had already received, but what about the time while it
>>> was down?
>>>
>>> If you cannot change anything about the application that produces
>>> the logs, then that has to be accepted as a trade-off.
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <[email protected]>
>>> wrote:
>>>
>>>> Yes, I understand the concerns with this use case.
>>>>
>>>> If so, we need to configure failover in this scenario. Can we have it
>>>> at, e.g., the channel level or the sink/channel level?
>>>>
>>>> Does Flume support configuring failover in case the channel fills up?
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In fact, this is not a problem with Flume.
>>>>>
>>>>> No solution will function reliably for your use case, simply because
>>>>> all of them have to do some sort of tail -f or streaming on a file,
>>>>> and if they can't keep up with it (they mostly can't at high-speed
>>>>> entry points), they will drop some entries.
>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>> you won't be able to re-ingest easily, as in most cases you won't
>>>>> know which ones you've dropped.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the comments, Ahmed.
>>>>>>
>>>>>> So from your comments, I gather that Flume doesn't have any
>>>>>> reliable source option for the use case I described.
>>>>>>
>>>>>> If Flume can't provide it, can you point me to any other log
>>>>>> collector solutions that I could consider for moving real-time data
>>>>>> to HDFS?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Then you're out of luck, in my opinion, as there is no way other
>>>>>>> than tail -f.
>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>> source/channel to keep up with it. If the channel is full, it will
>>>>>>> back off to the source, and then the source will just stop
>>>>>>> ingesting.
>>>>>>>
>>>>>>> It is possible to hack around this by redirecting tail -f into
>>>>>>> another file and then custom-rotating that duplicate file.
>>>>>>> But I wouldn't recommend that.
>>>>>>>
>>>>>>> Just a side note: if you're running a Java application (Tomcat or
>>>>>>> similar), then you can create multiple output files via
>>>>>>> log4j.properties configuration without the application itself
>>>>>>> knowing anything about it.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ahmed,
>>>>>>>>
>>>>>>>> In my case, the application renames the existing file to
>>>>>>>> <logfile>.yesterdaydate and creates a new file named <logfile> at
>>>>>>>> 00:00.
>>>>>>>>
>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>
>>>>>>>> Can you suggest any options other than the spooling directory
>>>>>>>> source?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>> producing the log file handles log rotation.
>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>> signal sent with kill. For example, nginx reopens its log file
>>>>>>>>> when it receives the USR1 signal, without stopping the process.
>>>>>>>>> Some applications might restart as a result.
>>>>>>>>>
>>>>>>>>> If the application just reopens the log file, then you can change
>>>>>>>>> your log rotation policy to rotate every minute.
>>>>>>>>> The logrotate daemon won't handle that case, so you'll have to
>>>>>>>>> write a cron job to do it.
>>>>>>>>> In that setup, you would keep the finished logs and the live log
>>>>>>>>> in separate locations, so the spooling directory source doesn't
>>>>>>>>> freak out about an active log file being appended to.
>>>>>>>>>
>>>>>>>>> Anyway, the spooling directory source is the way to go, as it
>>>>>>>>> leaves log files in place, just renamed.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of the setup:
>>>>>>>>>>
>>>>>>>>>> Source: exec, running tail -F on a logfile.
>>>>>>>>>>
>>>>>>>>>> Channel: file channel
>>>>>>>>>>
>>>>>>>>>> Sink: HDFS
>>>>>>>>>>
>>>>>>>>>> Use case: move real-time data from the logfile to HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It appears that exec is not a reliable source, as we may lose
>>>>>>>>>> data if the channel/source is down.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So I tried the other option, the "spooling directory source",
>>>>>>>>>> which is described as a reliable source. But here I have a single
>>>>>>>>>> logfile that data keeps being appended to, so I don't see how I
>>>>>>>>>> could move the file into a spool directory.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can anyone suggest another reliable source option for the case
>>>>>>>>>> where data is appended to a logfile and logfile rotation happens
>>>>>>>>>> only at the end of the day?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Saravana
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>>> intended recipient(s) only. This email contains confidential
>>>>>>>>> information. It should not be copied, disclosed to, retained or
>>>>>>>>> used by, any party other than the intended recipient. Any
>>>>>>>>> unauthorised distribution, dissemination or copying of this E-mail
>>>>>>>>> or its attachments, and/or any use of any information contained in
>>>>>>>>> them, is strictly prohibited and may be illegal. If you are not an
>>>>>>>>> intended recipient then please promptly delete this e-mail and any
>>>>>>>>> attachment and all copies and inform the sender directly via
>>>>>>>>> email. Any emails that you send to us may be monitored by systems
>>>>>>>>> or persons other than the named communicant for the purposes of
>>>>>>>>> ascertaining whether the communication complies with the law and
>>>>>>>>> company policies.
>>>>>>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Ahmed Vila | Senior software developer
>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>
>>> Office : +387 33 942 123
>>> Mobile: +387 62 139 348
>>>
>>> Website: www.devlogic.eu
>>> E-mail : [email protected]
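
For anyone following the thread: the "marking feature" asked about above (remembering the exact point up to which a logfile has been consumed) can be sketched outside Flume in a few lines of Python. This is only an illustration under stated assumptions, not a Flume feature and not anyone's production code. The names read_new_lines and save_checkpoint and the JSON state file are hypothetical, and the sketch assumes a POSIX filesystem where a renamed-and-recreated logfile gets a new inode. The checkpoint is meant to be saved only after the caller has delivered the lines, so a crash in between causes duplicates rather than loss. Lines written to the old file between the last read and the midnight rename are still missed unless the rotated file is read as well.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return (lines, checkpoint) for data appended since the saved checkpoint.

    Hypothetical helper: the checkpoint is a dict holding the inode and the
    byte offset up to which complete lines have been read.
    """
    # Load the previous checkpoint, or start from scratch if there is none.
    try:
        with open(state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {"inode": None, "offset": 0}

    with open(log_path, "rb") as log:
        info = os.fstat(log.fileno())
        # A different inode (rename-style rotation) or a file shorter than
        # the saved offset (truncation) means the old offset no longer
        # applies, so reading restarts at the beginning of the file.
        same_file = state["inode"] == info.st_ino
        offset = state["offset"] if same_file and state["offset"] <= info.st_size else 0
        log.seek(offset)
        data = log.read()

    # Consume only up to the last newline; a partially written final line
    # is left in place for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, {"inode": info.st_ino, "offset": offset + end}


def save_checkpoint(state_path, checkpoint):
    """Persist the checkpoint atomically (write a temp file, then rename)."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp_path, state_path)
```

A caller would run read_new_lines on a timer, hand the returned lines to whatever does the delivery, and call save_checkpoint only once delivery has succeeded. If the process dies before the checkpoint is saved, the same lines are returned again on the next run, so the downstream side has to tolerate duplicates.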
