Ahmed,

Thanks for your detailed comments.

One final point: in which cases would these logging solutions be considered
a perfect system without any tradeoffs?

On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <[email protected]> wrote:

> Exactly, that's the point.
>
>
>
>
> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <[email protected]>
> wrote:
>
>> That was a good point.
>>
>> So if a solution claims guaranteed data delivery, that applies only once
>> the application has successfully handed the event to the source/producer.
>> From that point on, the system guarantees delivery of the event through
>> to the sink/consumer at the other end.
>>
>> It has no control over whether events actually reach the source/producer,
>> so data can still be lost there.
>>
>> So there is always a chance of data loss when the system goes down, and
>> certain tradeoffs have to be accepted.
>>
>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Flume, Kafka, or any other system can only be responsible for its own
>>> actions. Looking from the perspective of the exec source in Flume: it
>>> asks bash for whatever the command writes to its stdout. It cannot
>>> control what bash will return.
>>> Thus, to the exec source it's not a file, just a stream of text.
>>>
>>> The spooling directory source, on the other hand, will resume from the
>>> file it failed on.
>>> That reveals two approaches to event consumption: push and pull.
>>>
>>> When the push approach is used, the source cannot be aware of what comes
>>> next or of what came before it started to listen.
>>>
>>> Even so, some sources/producers that use the pull approach don't
>>> necessarily know how to return to the last read event. It's up to the
>>> implementation.
>>>
>>> Regards,
>>> Ahmed
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>> [email protected]> wrote:
>>>
>>>> Yes, I agree.
>>>>
>>>> I think no logging solution, such as a source in Flume or a producer in
>>>> Kafka, has a marking feature that records the exact point up to which it
>>>> has consumed the logfile, so that after a failure it can start reading
>>>> again from the same point it had reached before the failure.
>>>>
>>>> This is the main reason such failures are hard to recover from. Am I
>>>> right?
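
For illustration, the kind of marker discussed here can be sketched in
Python: a reader that records the byte offset it has consumed in a side
file and resumes from that offset after a restart. This is only a sketch
under assumptions, not something the thread says Flume 1.5.0 or Kafka
provides. The function names (load_offset, read_from, commit_offset) and
the offset-file layout are hypothetical, and rotation is detected only
by the file becoming shorter than the stored offset.

```python
import os


def load_offset(offset_path):
    """Return the last committed byte offset, or 0 if none was saved."""
    try:
        with open(offset_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def read_from(log_path, offset):
    """Read complete lines starting at offset; return (lines, new_offset)."""
    # A file shorter than the stored offset is assumed to have been
    # rotated or truncated, so reading restarts from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0
    lines = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; retry next time
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)
    return lines, offset


def commit_offset(offset_path, offset):
    """Persist the offset by writing a temp file and renaming it over the old one."""
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)
```

Calling commit_offset only after the lines have been handed off
downstream means a crash in between causes re-reading rather than loss
(at-least-once), which is the tradeoff the rest of this thread keeps
returning to.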
>>>>
>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> You can use the spillable memory channel, which stores events in memory
>>>>> and spills to disk once memory fills up.
>>>>> You can also use the file channel, but it's only as fast as your disk,
>>>>> and because of its high IO it's suggested to give it a separate disk,
>>>>> preferably an SSD.
>>>>>
>>>>> But that will not solve the issue you might run into: if Flume fails
>>>>> for whatever reason, you'll never be able to continue from the exact
>>>>> point where it failed.
>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>> whatever it had already received, but what about the time while it was
>>>>> down?
>>>>>
>>>>> If you cannot change anything in the application that produces the
>>>>> logs, then that circumstance has to be accepted as a tradeoff.
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Yes, I understand the concerns with this use case.
>>>>>>
>>>>>> If so, we need to configure failover in this scenario. Can we have it
>>>>>> at the channel level or the sink level?
>>>>>>
>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>
>>>>>>> No solution will function reliably for your use case, simply because
>>>>>>> all of them will have to do some sort of tail -f or streaming on a
>>>>>>> file, and if they can't keep up with it (they mostly can't at
>>>>>>> high-speed entry points), they will drop some entries.
>>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>>>> you won't be able to re-ingest easily, as in most cases you won't know
>>>>>>> which ones you've dropped.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks for your comments, Ahmed.
>>>>>>>>
>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>> reliable source option for the use case I described.
>>>>>>>>
>>>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>>>> solutions that I could consider here to move real-time data to
>>>>>>>> HDFS?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Then you're out of luck in my opinion, as there is no way other
>>>>>>>>> than tail -f.
>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>> source/channel to keep up with it. If the channel is full, it will
>>>>>>>>> back off to the source, and then the source will just stop ingesting.
>>>>>>>>>
>>>>>>>>> There is a possibility to hack up the tail -f into another file
>>>>>>>>> and then custom-rotate that duplicate file.
>>>>>>>>> But I wouldn't recommend that approach.
>>>>>>>>>
>>>>>>>>> Just a side note: if you're operating a Java application (Tomcat or
>>>>>>>>> similar), then you can create multiple output files via the
>>>>>>>>> log4j.properties configuration without the application itself
>>>>>>>>> knowing anything about it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Ahmed,
>>>>>>>>>>
>>>>>>>>>> In my case, the application renames the existing file to
>>>>>>>>>> <logfile>.yesterdaydate and creates a new file named <logfile> at
>>>>>>>>>> 00:00 AM.
>>>>>>>>>>
>>>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>>>
>>>>>>>>>> Can you suggest any options other than the spooling directory
>>>>>>>>>> source?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>>>> producing the log file handles log rotation.
>>>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>>>> kill signal. For example, nginx reopens the log file when it
>>>>>>>>>>> receives the USR1 signal, but it doesn't stop the process. Some
>>>>>>>>>>> applications might restart as a result.
>>>>>>>>>>>
>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>> change your log rotation policy to be per minute.
>>>>>>>>>>> The logrotate daemon won't cover that case, so you'll have to
>>>>>>>>>>> write a cron job to do it.
>>>>>>>>>>> In that setup, you would keep the finished-logs location separate
>>>>>>>>>>> from the live-log location, so the spooling directory source
>>>>>>>>>>> doesn't freak out about the active log file being appended to.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it will
>>>>>>>>>>> leave log files in place, just renamed.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ahmed
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of my setup:
>>>>>>>>>>>>
>>>>>>>>>>>> Source: exec, running tail -F on a logfile
>>>>>>>>>>>>
>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>
>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>
>>>>>>>>>>>> Use case: move real-time data from the logfile to HDFS.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>>>>>> if the channel/source is down.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So I tried the other option, the "spooling directory source",
>>>>>>>>>>>> which is described as a reliable source. But here I have a single
>>>>>>>>>>>> logfile that data gets appended to, so I don't see a way to move
>>>>>>>>>>>> the file to the spool directory.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone help by suggesting another reliable source option for
>>>>>>>>>>>> the case where data is appended to the logfile and logfile
>>>>>>>>>>>> rotation happens only at the end of the day?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Saravana
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>>>>> intended recipient(s) only. This email contains confidential
>>>>>>>>>>> information. It should not be copied, disclosed to, retained or
>>>>>>>>>>> used by, any party other than the intended recipient. Any
>>>>>>>>>>> unauthorised distribution, dissemination or copying of this E-mail
>>>>>>>>>>> or its attachments, and/or any use of any information contained in
>>>>>>>>>>> them, is strictly prohibited and may be illegal. If you are not an
>>>>>>>>>>> intended recipient then please promptly delete this e-mail and any
>>>>>>>>>>> attachment and all copies and inform the sender directly via
>>>>>>>>>>> email. Any emails that you send to us may be monitored by systems
>>>>>>>>>>> or persons other than the named communicant for the purposes of
>>>>>>>>>>> ascertaining whether the communication complies with the law and
>>>>>>>>>>> company policies.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Best regards,
>>>>> Ahmed Vila | Senior software developer
>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>
>>>>> Office : +387 33 942 123
>>>>> Mobile: +387 62 139 348
>>>>>
>>>>> Website: www.devlogic.eu
>>>>> E-mail   : [email protected]
>>>>>
>>>>
>>>
>>>
>>
>
>
