Hello, I'm trying to set up a Flume pipeline that tolerates long periods of HDFS downtime. Here's what my current configuration looks like (note that Agent 1 here receives data via the netcat interface; in my prod setup it receives data from an external Avro source):
################## FLUME.CONF BEGIN ##################

####### AGENT 1 : durable_channel_1 (will always be up) ########
durable_channel_1.sources = click-source
durable_channel_1.sinks = forward-sink
durable_channel_1.channels = file-channel

### Source Definitions
durable_channel_1.sources.click-source.type = netcat
durable_channel_1.sources.click-source.bind = agent1
durable_channel_1.sources.click-source.port = 44445
durable_channel_1.sources.click-source.channels = file-channel
durable_channel_1.sources.click-source.max-line-length = 2000

### Channel Definitions
durable_channel_1.channels.file-channel.type = file
durable_channel_1.channels.file-channel.capacity = 6000000
durable_channel_1.channels.file-channel.transactionCapacity = 10000

### Sink Definitions
durable_channel_1.sinks.forward-sink.channel = file-channel
durable_channel_1.sinks.forward-sink.type = avro
durable_channel_1.sinks.forward-sink.hostname = vm-cluster-node3
durable_channel_1.sinks.forward-sink.port = 57938

####### AGENT 2 : stream_persist (will be brought down for long periods and then back online) ########
stream_persist.sources = durable-collection-source
stream_persist.sinks = hdfs-sink
stream_persist.channels = mem-channel

### Source Definitions
stream_persist.sources.durable-collection-source.type = avro
stream_persist.sources.durable-collection-source.bind = vm-cluster-node3
stream_persist.sources.durable-collection-source.port = 57938
stream_persist.sources.durable-collection-source.channels = mem-channel

### Channel Definitions
stream_persist.channels.mem-channel.type = memory
stream_persist.channels.mem-channel.capacity = 6000000
stream_persist.channels.mem-channel.transactionCapacity = 10000

### Sink Definitions
stream_persist.sinks.hdfs-sink.channel = mem-channel
stream_persist.sinks.hdfs-sink.type = com.mycompany.flume.MyCustomHDFSSink
stream_persist.sinks.hdfs-sink.format = hdfs

################## FLUME.CONF END ##################

hdfs-sink is a custom HDFS sink that I built; it persists to HDFS after doing some real-time processing. With the configuration above, shouldn't Agent 1 resend all the backlog accumulated while Agent 2 was down? In my case, Agent 1 only persists the backlog to its disk but does not resend the data when it reconnects with Agent 2. Also, a separate but related question: in my custom hdfs-sink, is throwing an exception sufficient to indicate that an event was not processed successfully? I would like to propagate back to Agent 1 that Agent 2 has failed to persist to HDFS, so that Agent 1 resends the data (for example, if an HDFS write fails, Agent 1 should resend that event). Currently I only throw an exception, but this does not seem to trigger a retry. Thank you
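For context on the exception question: in Flume's Sink API, throwing an `EventDeliveryException` from `process()` only causes a rollback on the sink's *local* channel transaction; the failure is never propagated upstream to the previous agent. Agent 1 only retries when the Avro sink-to-source hop itself fails. Below is a minimal sketch of the standard transaction pattern for a custom sink, using the Flume SDK (`AbstractSink`); `writeToHdfs` is a hypothetical placeholder for the real-time processing and HDFS write:

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MyCustomHDFSSink extends AbstractSink implements Configurable {

  @Override
  public void configure(Context context) {
    // read sink properties from the agent configuration here
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = channel.take();
      if (event == null) {
        // nothing to do; commit the empty transaction and back off
        txn.commit();
        return Status.BACKOFF;
      }
      writeToHdfs(event);  // hypothetical: may throw on HDFS failure
      txn.commit();
      return Status.READY;
    } catch (Throwable t) {
      // rollback returns the event to THIS agent's channel, so it will be
      // retried locally; the failure does not travel back to Agent 1
      txn.rollback();
      throw new EventDeliveryException("Failed to persist event to HDFS", t);
    } finally {
      txn.close();
    }
  }

  private void writeToHdfs(Event event) {
    // placeholder for real-time processing + HDFS write
  }
}
```

The consequence with your current setup: once an event is committed into Agent 2's memory channel, Agent 1 considers it delivered; a later HDFS failure can only cause local retries within Agent 2, and a restart of Agent 2 loses whatever the memory channel holds.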
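Since retries after a rollback happen against the sink's own channel, one way to survive long HDFS outages (and Agent 2 restarts) is to make Agent 2's channel durable as well. A sketch, assuming you replace the memory channel with a file channel (the `checkpointDir`/`dataDirs` paths here are illustrative):

```
stream_persist.channels = persist-file-channel
stream_persist.channels.persist-file-channel.type = file
stream_persist.channels.persist-file-channel.checkpointDir = /var/flume/stream_persist/checkpoint
stream_persist.channels.persist-file-channel.dataDirs = /var/flume/stream_persist/data
stream_persist.channels.persist-file-channel.capacity = 6000000
stream_persist.channels.persist-file-channel.transactionCapacity = 10000
stream_persist.sources.durable-collection-source.channels = persist-file-channel
stream_persist.sinks.hdfs-sink.channel = persist-file-channel
```

With this, events rolled back by the sink remain on Agent 2's disk across restarts, rather than relying on Agent 1 to resend them.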
