Hello All,
I am stuck on a problem with Flume 1.4.0. I am using the spooling
directory source with a custom interceptor to process encoded GPS
files and save them in HDFS and Solr (using the MorphlineSolrSink). The
main information is carried in the name of each file that arrives in
the spool directory; the content is irrelevant. So I use the custom
interceptor to extract and transform the file-name header and store the
extracted data in JSON format as the event body.
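Below is a minimal sketch of the header-to-JSON step. It is written in plain Java without the Flume classes so that it runs on its own. It makes two assumptions. First, the source runs with `fileHeader = true`, so the file path arrives in the `file` header. Second, the file names follow a purely hypothetical layout (`<deviceId>_<yyyyMMddHHmmss>.gps`), because the real GPS naming scheme isn't given. The class and method names are also illustrative.

In a real interceptor this logic would be called from `intercept(Event)`. Flume's `Interceptor` contract lets that method return `null` to drop an event, so a file whose name cannot be parsed can be discarded instead of throwing. This does not help with exceptions raised earlier, while the source is still reading the file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of the header-to-JSON transform a custom interceptor could apply.
 * ASSUMPTION: the spooldir source sets fileHeader = true, so the absolute
 * path is in the "file" header. The file-name layout is HYPOTHETICAL.
 */
final class FileNameToJson {
    // Hypothetical layout: <deviceId>_<yyyyMMddHHmmss>.gps, e.g. dev42_20130715101500.gps
    private static final Pattern NAME =
            Pattern.compile("([A-Za-z0-9]+)_(\\d{14})\\.gps");

    /** Returns the JSON body for the event, or null if it should be dropped. */
    static String toJson(Map<String, String> headers) {
        String path = headers.get("file");
        if (path == null) {
            return null; // header missing: drop rather than throw
        }
        Path fileName = Paths.get(path).getFileName();
        if (fileName == null) {
            return null; // path has no file-name component
        }
        Matcher m = NAME.matcher(fileName.toString());
        if (!m.matches()) {
            return null; // unparseable name: drop rather than throw
        }
        // Both groups contain only letters/digits, so no JSON escaping is needed here.
        return "{\"device\":\"" + m.group(1) + "\",\"timestamp\":\"" + m.group(2) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(Map.of("file", "/var/spool/gps/dev42_20130715101500.gps")));
        System.out.println(toJson(Map.of("file", "/var/spool/gps/garbage.txt")));
    }
}
```

Returning `null` rather than throwing keeps a single bad name from stopping the whole agent.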
My problems are:
1. When a 0-byte file arrives (files normally contain a "!" symbol),
Flume stops and throws an exception. We don't need the file content in
any case, but the exception still occurs because Flume cannot handle
0-byte files.
2. When the content contains unusual characters such as !f!, Flume
stops with an exception.
3. Even when everything is running fine, I am losing some data/events.
On closer inspection I found that some events are available in HDFS but
not in Solr, and vice versa. I am not using any sink groups/sink
processors such as failover or load balancing. Could that be the cause?
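On problem 3, here is a guess, since the agent configuration isn't shown. Suppose the HDFS sink and the Solr sink both drain the same channel. Then each event is taken by whichever sink polls it first, so every event ends up in exactly one of the two stores. That matches the symptom, and it is unrelated to failover or load-balancing sink groups.

The usual fix is to give each sink its own channel and list both channels on the source. The replicating channel selector, which is Flume's default, then copies every event into both channels. The toy model below is not Flume code; it only contrasts the two layouts using plain queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Toy model (NOT Flume code) of two sinks draining channels.
 * Shared: one queue, two consumers, so each event reaches only one sink.
 * Replicated: every event is copied into one queue per sink.
 */
final class ChannelToyModel {

    /** Two sinks take turns polling one shared queue; returns [hdfsEvents, solrEvents]. */
    static List<List<Integer>> sharedChannel(int events) {
        Deque<Integer> channel = new ArrayDeque<>();
        for (int i = 0; i < events; i++) {
            channel.add(i);
        }
        List<Integer> hdfs = new ArrayList<>();
        List<Integer> solr = new ArrayList<>();
        boolean hdfsTurn = true;
        while (!channel.isEmpty()) {
            // Whichever sink polls removes the event, so the other never sees it.
            (hdfsTurn ? hdfs : solr).add(channel.poll());
            hdfsTurn = !hdfsTurn;
        }
        return List.of(hdfs, solr);
    }

    /** Each event is copied into a dedicated queue per sink, as a replicating selector does. */
    static List<List<Integer>> replicatedChannels(int events) {
        Deque<Integer> hdfsChannel = new ArrayDeque<>();
        Deque<Integer> solrChannel = new ArrayDeque<>();
        for (int i = 0; i < events; i++) {
            hdfsChannel.add(i);
            solrChannel.add(i);
        }
        return List.of(new ArrayList<>(hdfsChannel), new ArrayList<>(solrChannel));
    }

    public static void main(String[] args) {
        System.out.println("shared:     " + sharedChannel(6));
        System.out.println("replicated: " + replicatedChannels(6));
    }
}
```

With six events, the shared layout gives each sink half of them. The replicated layout gives both sinks all six.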
I want a solution where any exception is handled: the file/data that
causes it is discarded and Flume goes on to the next file in the spool
directory. The data arrives at high velocity, about 100 files every
second, so my regular practice to get everything back on track is to
delete the offending file manually and restart Flume. I am sure there
must be a better way to handle this case. Can you please suggest better
alternatives to my approach?
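One hypothetical workaround for problems 1 and 2 is to stop bad files before they reach the spool directory. The producer writes into a staging directory instead. A small sweeper then moves each file into the spool directory only if it is non-empty and decodes cleanly, and sends everything else to a reject directory.

This sketch has some limits:
- It assumes the agent reads the files as UTF-8 and that the exception in problem 2 is a decoding error. The post doesn't confirm either point.
- A plain-ASCII sequence such as !f! would pass this check.
- All directory names are placeholders.

Moving files with `ATOMIC_MOVE` requires the staging and spool directories to be on the same filesystem. In return, Flume only ever sees complete files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/**
 * HYPOTHETICAL pre-filter run before files reach Flume's spool directory.
 * Acceptable files go to spoolDir, the rest to rejectDir, so a bad file is
 * set aside instead of stopping the agent. Directory names are assumptions.
 */
final class SpoolPreFilter {

    /** True if the file is non-empty and its bytes decode cleanly as UTF-8. */
    static boolean isAcceptable(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        if (bytes.length == 0) {
            return false; // 0-byte file (problem 1)
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // bytes that are not valid UTF-8
        }
    }

    /** Moves every regular file out of stagingDir; returns how many were accepted. */
    static int sweep(Path stagingDir, Path spoolDir, Path rejectDir) throws IOException {
        // Collect first so the directory is not modified while it is being listed.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(stagingDir)) {
            for (Path f : stream) {
                if (Files.isRegularFile(f)) {
                    files.add(f);
                }
            }
        }
        int accepted = 0;
        for (Path f : files) {
            boolean ok = isAcceptable(f);
            Path target = (ok ? spoolDir : rejectDir).resolve(f.getFileName());
            // ATOMIC_MOVE requires both directories on the same filesystem.
            Files.move(f, target, StandardCopyOption.ATOMIC_MOVE);
            if (ok) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: SpoolPreFilter <stagingDir> <spoolDir> <rejectDir>");
            return;
        }
        int n = sweep(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println(n + " file(s) passed to the spool directory");
    }
}
```

The sweep could run on a short timer. The files reportedly contain little more than a "!" symbol, so reading each one fully should stay cheap even at around 100 files per second. The reject directory also keeps the bad files around for later inspection, instead of deleting them by hand.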
Thanks & Regards,
Souvik Bose