Hello All,
I am stuck on a problem with Flume 1.4.0. I am using a spooling directory source with a custom interceptor to process encoded GPS files and save them in HDFS and Solr (using the Morphline Solr sink). The main information is stored in the name of each file that arrives in the spool directory; the content is irrelevant. So I use the custom interceptor to extract and transform the header from the file name and write the extracted data out as JSON in the resulting event.
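
To make the transformation concrete, here is a simplified sketch of the file-name-to-JSON step in plain Java, with no Flume dependency. The file name layout "<deviceId>_<epochMillis>_<lat>_<lon>.gps", the class name and the JSON field names are all made up for illustration; the real header is encoded differently. The idea is that a name which does not match yields an empty result instead of an exception. As far as I understand the Interceptor contract, intercept(Event) may return null to drop an event, so an empty result could map to that.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only. The layout "<deviceId>_<epochMillis>_<lat>_<lon>.gps" is a
// hypothetical stand-in for the real encoded file name header.
final class FileNameHeader {
    private static final Pattern LAYOUT = Pattern.compile(
        "^([A-Za-z0-9]+)_(\\d{1,18})_(-?\\d+(?:\\.\\d+)?)_(-?\\d+(?:\\.\\d+)?)\\.gps$");

    private FileNameHeader() {}

    /** Returns the header as a JSON string, or empty if the name does not match the layout. */
    static Optional<String> toJson(String fileName) {
        if (fileName == null) {
            return Optional.empty();
        }
        Matcher m = LAYOUT.matcher(fileName);
        if (!m.matches()) {
            // Unparseable name: report "nothing to emit" rather than throwing.
            return Optional.empty();
        }
        String deviceId = m.group(1);            // alphanumeric only, so no JSON escaping needed
        long timestamp = Long.parseLong(m.group(2));
        double lat = Double.parseDouble(m.group(3));
        double lon = Double.parseDouble(m.group(4));
        return Optional.of("{\"deviceId\":\"" + deviceId + "\","
            + "\"timestamp\":" + timestamp + ","
            + "\"lat\":" + lat + ","
            + "\"lon\":" + lon + "}");
    }
}
```

The file content is never read here, which matches the fact that only the name matters in my case.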
I have three problems:

1. When a 0 byte file comes in (generally files come in with a "!" symbol in the content), Flume stops and throws an exception. We don't need the content of the file in any case, but we still get the exception because Flume cannot handle 0 byte files.
2. When the content contains some weird characters like !f!, Flume stops with an exception.
3. Even when everything is running fine, I am losing some data/events. On closer inspection I found that some events are available in HDFS but not in Solr, and vice versa. I am not using any sink group processor such as failover or load balancing. Is that the reason?

I want a solution where any exception is handled, the file/data that caused it is discarded, and Flume moves on to the next file in the spool directory. The data arrives at high velocity, about 100 files every second. Right now my regular practice is to manually delete the offending file and restart Flume to get everything back on track, but I am sure there must be a better way to handle this case. Can you please suggest some better alternatives to my approach?
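
To be clear about the behaviour I am after, here is a rough sketch in plain Java. This is not Flume code and does not use any Flume API; the class, the FileHandler interface and the reject directory are hypothetical names I made up for illustration. Each file is processed in its own try/catch, a failing file is moved aside, and the loop carries on with the next file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Sketch only: illustrates "discard the bad file and continue", not how Flume works.
final class SkipBadFiles {
    /** Hypothetical per-file processing step that is allowed to fail. */
    interface FileHandler {
        void handle(Path file) throws Exception;
    }

    private SkipBadFiles() {}

    /** Processes every regular file in spoolDir once and returns the files that were moved aside. */
    static List<Path> drain(Path spoolDir, Path rejectDir, FileHandler handler) throws IOException {
        Files.createDirectories(rejectDir);
        List<Path> rejected = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories
                }
                try {
                    handler.handle(file);
                    Files.delete(file); // processed successfully
                } catch (Exception e) {
                    // Move the offending file out of the way instead of stopping.
                    Path target = rejectDir.resolve(file.getFileName());
                    Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
                    rejected.add(target);
                }
            }
        }
        return rejected;
    }
}
```

Moving the bad files to a separate directory instead of deleting them would also let me inspect them later.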

Thanks & Regards,
Souvik Bose