Thanks Matt, that seems like an approach we could also take. It's good to have
a clearer view of where the limitations of the current system are.


________________________________
From: Kenison, Matt [[email protected]]
Sent: 06 February 2014 21:11
To: [email protected]
Subject: Re: Channel management: messages that will never be delivered

It's possible, but not easy to do. In our application, the individual custom
sinks know which exceptions should trigger a rollback and which are
unrecoverable, but this doesn't work for the built-in sinks, and it doesn't
take into account unexpected failures. So, we control it manually with a JMX
flag and by subclassing BasicTransactionSemantics. When the flag is set and the
transaction tries to roll back, it performs a commit instead (and directs the
messages to a failure channel). When a transaction is successful, the flag is
reset.

It's not the prettiest solution, but it isn't a hack. It requires subclassing
the channel to provide a custom transaction, and overriding the default
transaction behavior. Flume really doesn't make it easy to extend the behavior
of any of the standard components.
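
The rollback-to-commit behaviour described above can be sketched roughly as
follows. This is a stand-alone model, not Flume code: DivertingTransaction, the
String events and the in-memory queues are all hypothetical, and an
AtomicBoolean stands in for the JMX flag. In Flume itself the equivalent logic
would presumably live in the doCommit() and doRollback() overrides of a
BasicTransactionSemantics subclass.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Stand-alone model of a channel transaction; NOT the Flume API. */
final class DivertingTransaction {
    private final Deque<String> channel;           // events waiting for the sink
    private final List<String> failureChannel;     // where diverted events end up
    private final AtomicBoolean divertOnRollback;  // stands in for the JMX flag
    private final List<String> taken = new ArrayList<>();

    DivertingTransaction(Deque<String> channel,
                         List<String> failureChannel,
                         AtomicBoolean divertOnRollback) {
        this.channel = channel;
        this.failureChannel = failureChannel;
        this.divertOnRollback = divertOnRollback;
    }

    /** Takes the next event for delivery, or returns null if the channel is empty. */
    String take() {
        String event = channel.pollFirst();
        if (event != null) {
            taken.add(event);
        }
        return event;
    }

    /** Successful delivery: the taken events are dropped and the flag is reset. */
    void commit() {
        taken.clear();
        divertOnRollback.set(false);
    }

    /** Failed delivery: put the events back, or divert them if the flag is set. */
    void rollback() {
        if (divertOnRollback.get()) {
            // Acts as a commit from the main channel's point of view.
            failureChannel.addAll(taken);
        } else {
            // Restore the events at the head of the queue in their original order.
            for (int i = taken.size() - 1; i >= 0; i--) {
                channel.addFirst(taken.get(i));
            }
        }
        taken.clear();
    }
}
```

In this sketch the flag stays set after a diverted rollback and is only cleared
by the next successful commit, which is how the description above reads.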


From: Paul Merry <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, February 6, 2014 12:28 AM
To: "[email protected]" <[email protected]>
Subject: RE: Channel management: messages that will never be delivered


Thanks for the suggestion Ed, it's definitely something we could look at.

I did find this ticket https://issues.apache.org/jira/browse/FLUME-2140 and the 
linked discussion thread 
http://flume.markmail.org/thread/y3cks6hdgof3kxu6#query:+page:1+mid:rx3zm53t4dhmqskk+state:results

There are some suggestions for workarounds there. The failover sink is probably
the most relevant, but I'd be concerned about what might happen to legitimate
messages when there is downtime or connection issues with the endpoint. It
seems we'd lose the correct channel retry logic in that scenario and end up
with messages that would need replaying.

If there isn't much to add on the handling of 'bad messages', can anyone
explain what happens to the other messages in a batch that contains one or more
of these undeliverable messages? Will they also fail to reach their
destination, or will they get rebatched?

I'm also keen to hear from anyone with an idea of how to clear these messages
from a channel once they are stuck, as deleting the directories can take good
messages down too.


- Paul


________________________________
From: ed [[email protected]]
Sent: 05 February 2014 23:12
To: [email protected]
Subject: Re: Channel management: messages that will never be delivered

Hi Paul,

Not sure if this would work for you but if you can error check prior to the 
events reaching Elasticsearch you can handle this by writing a custom 
Interceptor that validates your events.  You can do more robust error checking 
here than you can by just relying on the already existing event header fields 
as you'll have full access to the event header and body.  Within the 
interceptor, if the event is not compatible with Elasticsearch you can add a 
boolean flag to the header of the event like "hasError".  Then you can route 
any events that have an error to a different "error" channel using a 
multiplexing selector by checking for the hasError flag.  The error channel can 
either be connected to a NullSink or a FileRoll sink if you want to preserve 
the improperly formatted events.
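
As a rough illustration of the validation step, here is a stand-alone sketch of
the check such an interceptor might perform. It deliberately avoids the Flume
classes so that it runs on its own: ErrorFlagger, the "indexName" header and the
non-empty-body check are assumptions made for the example, and only the
"hasError" flag name comes from the suggestion above. In a real interceptor
this logic would be called from intercept(Event) using the event's headers and
body.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Stand-alone model of the validation step; NOT a Flume Interceptor. */
final class ErrorFlagger {
    static final String INDEX_HEADER = "indexName"; // hypothetical header name
    static final String ERROR_HEADER = "hasError";  // flag name from the suggestion above

    private ErrorFlagger() {
    }

    /** Returns a copy of the headers, with hasError=true added for bad events. */
    static Map<String, String> flag(Map<String, String> headers, byte[] body) {
        Map<String, String> out = new HashMap<>(headers);
        if (!isDeliverable(headers, body)) {
            out.put(ERROR_HEADER, "true");
        }
        return out;
    }

    /** Illustrative checks only; a real deployment would add its own rules. */
    static boolean isDeliverable(Map<String, String> headers, byte[] body) {
        String index = headers.get(INDEX_HEADER);
        if (index == null || index.isEmpty()) {
            return false;
        }
        // Elasticsearch rejects index names containing uppercase characters.
        if (!index.equals(index.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Assumed rule for this sketch: an event without a body is not worth sending.
        return body != null && body.length > 0;
    }
}
```

On the routing side, the source's multiplexing selector would then use
selector.header = hasError, map the value true to the error channel with
selector.mapping.true, and send everything else to the normal channel through
selector.default.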

We've only used the memory channel so far so I'm afraid I can't comment on the 
file channel specific questions you have.  Hopefully someone on the list with 
some more experience there can chime in.

Best,

Ed


On Wed, Feb 5, 2014 at 5:34 PM, Paul Merry <[email protected]> wrote:

Hi,

We are using an Elasticsearch sink and have seen a file channel filling with 
messages that will never be delivered as the format of the message is 
incompatible with Elasticsearch itself.

Example message from Flume logs:


24 Jan 2014 08:14:55,173 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event.
Exception follows.
org.elasticsearch.indices.InvalidIndexNameException: [UpperCase-2014-01-23]
Invalid index name [UpperCase-2014-01-23], must be lowercase

In this case the index name comes from a header so we have a workaround using a 
multiplexing channel selector to detect and re-route messages based on headers 
of this format.

To clean up the channel this time we removed the data and checkpoint 
directories, which is not ideal as we probably lost other messages in doing 
this.

We are wary of similar situations occurring in future for messages that we 
can't detect and divert in advance and so have a few questions:

- What would be the recommended handling of this situation?

- Is it possible to clear just these messages from the channel, or does the
whole channel have to be deleted?

- Is there a way that we can divert these messages to another channel (dead
letter / invalid message style)? Note that they are not known to be
problematic until after an attempt is made to deliver them from the sink.

- What happens to other messages in a batch with a bad message? Will they also
be stuck forever, or will they be taken in another batch?


Thanks,

Paul.



----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------
