Hi there, we are currently trying to use Flume to stream log files from a host to the HDFS of a Cloudera cluster.
Since we want a reliable system, we've set up Flume with a failover sink group, so that if one of the sinks fails the other one takes over. Each sink is configured to connect to a separate NameNode. This is what our config looks like:

    # Defining a sink group for failover
    agent.sinkgroups = groupOne
    agent.sinkgroups.groupOne.sinks = hdfsSink1 hdfsSink2
    agent.sinkgroups.groupOne.processor.type = failover
    agent.sinkgroups.groupOne.processor.priority.hdfsSink1 = 10
    agent.sinkgroups.groupOne.processor.priority.hdfsSink2 = 5

    agent.sources = tailSrc
    agent.channels = memoryChannel
    agent.sinks = hdfsSink1 hdfsSink2

    # For each one of the sources, the type is defined
    agent.sources.tailSrc.type = exec
    agent.sources.tailSrc.command = tail -F /var/log/events.log
    agent.sources.tailSrc.channels = memoryChannel

    # Definition of the first sink
    agent.sinks.hdfsSink1.type = hdfs
    agent.sinks.hdfsSink1.hdfs.useLocalTimeStamp = true
    agent.sinks.hdfsSink1.hdfs.path = hdfs://host1.com:8020/events/%y-%m-%d/%H
    agent.sinks.hdfsSink1.hdfs.filePrefix = %M-events
    # Specify the channel the sink should use
    agent.sinks.hdfsSink1.channel = memoryChannel

    # Definition of the second sink
    agent.sinks.hdfsSink2.type = hdfs
    agent.sinks.hdfsSink2.hdfs.useLocalTimeStamp = true
    agent.sinks.hdfsSink2.hdfs.path = hdfs://host2.com:8020/events/%y-%m-%d/%H
    agent.sinks.hdfsSink2.hdfs.filePrefix = %M-events
    # Specify the channel the sink should use
    agent.sinks.hdfsSink2.channel = memoryChannel

    # Each channel's type is defined.
    agent.channels.memoryChannel.type = memory
    # Other config values specific to each type of channel (sink or source)
    # can be defined as well.
    # In this case, it specifies the capacity of the memory channel.
    agent.channels.memoryChannel.capacity = 1000

In the first run we used Flume 1.4.0 and tested the switch to the backup sink by doing a manual failover of the NameNode. This didn't work at all; Flume got stuck in exceptions. A bit of research pointed us to a known bug (https://issues.apache.org/jira/browse/FLUME-1779), so we patched Flume 1.5.0 ourselves. At least the failover is working now. Nevertheless, Flume keeps throwing exceptions for the sink that has been disconnected:

    Failed to renew lease for [DFSClient_NONMAPREDUCE_-1223354028_45] for 3598 seconds. Will retry shortly ...
    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby

Does anyone have an idea how to tackle this issue?

Best Regards, Malte
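
P.S. In case it helps with the diagnosis, this is roughly how we verify the NameNode states after the manual failover (a sketch; nn1/nn2 are the HA service IDs in our setup, yours may be named differently):

    $ hdfs haadmin -getServiceState nn1
    standby
    $ hdfs haadmin -getServiceState nn2
    active

The states are as expected after the failover, which matches the StandbyException above: the deprioritized sink is still trying to write to the NameNode that is now in standby.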
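
A related question: would pointing the sinks at the HA nameservice URI, instead of at the individual NameNode hosts, avoid these exceptions altogether, since the DFS client would then handle the NameNode failover itself? Roughly like this (a sketch; we assume here that a nameservice named nameservice1 is defined in the cluster's hdfs-site.xml, with the usual failover proxy provider settings):

    agent.sinks.hdfsSink1.hdfs.path = hdfs://nameservice1/events/%y-%m-%d/%H

If that is the recommended setup, is the Flume-level failover group still needed at all?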
