Hi Hari,
Thank you for replying to my question. You are absolutely right: I am
using only one channel for both sinks, which is causing the problem.
Thanks for pointing that out; one problem is solved.
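Based on your suggestion, I am planning to wire it up roughly like this
(just a sketch; the second channel name MemChannel2 and the sink names
hdfsSink and solrSink stand in for my actual ones):

dnAgent.channels = MemChannel MemChannel2
dnAgent.sources.gpslog.channels = MemChannel MemChannel2
dnAgent.sources.gpslog.selector.type = replicating
dnAgent.sinks.hdfsSink.channel = MemChannel
dnAgent.sinks.solrSink.channel = MemChannel2

With the default replicating selector, each event should be copied to
both channels, so each sink gets its own copy of the data.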
For the spooling directory source, I am processing the files directly
using my own custom interceptor. Here is the config for the source:
dnAgent.sources.gpslog.type = spooldir
dnAgent.sources.gpslog.spoolDir = /home/ktspool
dnAgent.sources.gpslog.batchSize = 500
dnAgent.sources.gpslog.channels = MemChannel
dnAgent.sources.gpslog.fileHeader = true
dnAgent.sources.gpslog.deletePolicy = immediate
dnAgent.sources.gpslog.useStrictSpooledFilePolicies = false
dnAgent.sources.gpslog.interceptors = KTFlowProcessInterceptor
dnAgent.sources.gpslog.interceptors.KTFlowProcessInterceptor.type=com.souvikbose.flume.interceptors.KTFlowProcessInterceptor$Builder
Generally this works great when everything is okay. But the problem is
that the GPS provider doesn't have full control over what comes in, so
sometimes a blank file of 0 bytes comes in, which causes Flume to stop
processing with an exception, and I have to restart Flume manually.
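One idea I am considering (just a sketch, and it assumes the exception is
actually raised inside my interceptor rather than by the source's
deserializer, which the interceptor would never see) is to guard against
empty bodies in KTFlowProcessInterceptor and silently drop those events:

// Sketch only, not the real KTFlowProcessInterceptor code
@Override
public Event intercept(Event event) {
    byte[] body = event.getBody();
    // Drop events coming from blank / 0-byte files instead of throwing
    if (body == null || body.length == 0) {
        return null;
    }
    try {
        // ... existing decoding / transformation logic ...
        return event;
    } catch (Exception e) {
        // Discard events whose content cannot be parsed so processing continues
        return null;
    }
}

The intercept(List<Event>) overload would also have to skip the events
for which this returns null, since returning null is how an interceptor
drops an event.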
P.S.: I am using Flume 1.4.0 on CDH 4.4.0 on 4 data nodes in EC2.
Thanks & Regards,
Souvik
On 12/8/2014 11:36 PM, Hari Shreedharan wrote:
You are likely reading from the same channel for both sinks. That
means only one sink gets your data. You'd need to have 2 channels
connected to the same source, with each sink getting its own channel.
About the Spool Dir not processing data, what format/serializer etc
are you using?
Thanks,
Hari
On Mon, Dec 8, 2014 at 3:37 AM, Souvik Bose <[email protected]> wrote:
Hello All,
I am stuck with a problem with Flume version 1.4.0. I am using the
spooling directory source with a custom interceptor to process encoded
GPS files and save them in HDFS and Solr (using the Morphline Solr
sink). The main information is stored in the file name itself of the
files coming into the spool directory, and the content is irrelevant.
So I am using the custom interceptor to extract and transform the
file header and store the extracted data in JSON format as the
output of the event.
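To illustrate the idea, the interceptor does something roughly like this
(a simplified sketch, not the actual code; the file-name layout
deviceId_timestamp and the JSON field names are made up for the example):

// Simplified sketch of the intercept() method
@Override
public Event intercept(Event event) {
    // fileHeader = true makes the spooling directory source put the
    // absolute path of the spooled file into the "file" header
    String path = event.getHeaders().get("file");
    if (path == null) {
        return event;
    }
    String fileName = path.substring(path.lastIndexOf('/') + 1);
    // Hypothetical file-name layout: deviceId_timestamp
    String[] parts = fileName.split("_");
    String json = "{\"deviceId\":\"" + parts[0] + "\",\"receivedAt\":\""
            + (parts.length > 1 ? parts[1] : "") + "\"}";
    event.setBody(json.getBytes(java.nio.charset.Charset.forName("UTF-8")));
    return event;
}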
My problems are:
1. When a 0-byte file comes in (generally files come in
with a "!" symbol in the content), Flume stops and throws an
exception. We don't need the content of the file in any case, but
we still face the exception, as Flume cannot handle 0-byte files.
2. When the content has some weird characters like !ƒ!,
Flume stops with an exception.
3. Even when everything is running fine, I am losing some data/
events. On closer inspection I found that some events are available in
HDFS but not in Solr and vice versa. I am not using any sink group
processors like failover or load balancing. Is it because of that?
I want a solution where any exception is handled, the file/data
that causes the exception is discarded, and Flume moves on to the
next file in the spool directory. The data comes in at high velocity,
about 100 files every second, so manually deleting the offending file
and restarting Flume is my regular practice to get everything back on
track. But I am sure there must be better ways to handle this case.
Can you please suggest some better alternatives to my approach?
Thanks & Regards,
Souvik Bose
--
Met vriendelijke groeten / Mit freundlichen Grüßen / With kind regards,
Delgence | Delivering Intelligence
Delivering high quality IT solutions.
Souvik Bose
CIO
Development Office:
Rishi Tech Park Office No. E -3, Premises No. 02-360 Street No. 360 New
Town Rajarhat
Kolkata-700156. India
Europe Office:
Liessentstraat 9a, 5405 AH Uden
The Netherlands
T +91 9831607354 | T +31 616392268
E [email protected] | W www.delgence.com
This communication and any attachments hereto may contain confidential
information. Unauthorized use or disclosure to additional parties is
prohibited. If you are not an intended recipient, kindly notify the
sender and destroy all copies in your possession.