Flume HDFS Sink: Dynamic Path format for IP Address

Traiano Welcome Sat, 01 Nov 2014 12:23:31 -0700

Hi List

I've configured Flume to accept remote syslog messages from rsyslog on a
number of hosts, and am currently using an HDFS sink, specified as follows in
the agent config:


#sink for syslog udp collection
tier1.sinks.sink2.type         = hdfs
tier1.sinks.sink2.channel      = channel2
tier1.sinks.sink2.hdfs.path         = hdfs:///tmp/remote-syslogs/%y-%m-%d/%H%M/%S
tier1.sinks.sink2.hdfs.fileType     = DataStream
tier1.sinks.sink2.hdfs.writeFormat  = Text
tier1.sinks.sink2.hdfs.rollSize     = 0
tier1.sinks.sink2.hdfs.rollCount    = 10000
tier1.sinks.sink2.hdfs.rollInterval = 600

Syslog entries are being collected and written to HDFS with the directory
structure generated as specified above, i.e.
tmp/remote-syslogs/%y-%m-%d/%H%M/%S. However, I'd like the generated dynamic
path to include the IP address of the remote host sending the syslog to
the source, something like:

tier1.sinks.sink2.hdfs.path         = hdfs:///tmp/remote-syslogs/%HOSTNAME/%y-%m-%d/%H%M/%S

That is, if a parameter like %HOSTNAME is possible at all.

My question: is there a selector or other parameter supported by Flume that
I could use for this?

I've looked at the HDFS sink section of the user guide, but it does not seem
to cover other options for the dynamic path format. The only similar feature
I can see is the "interceptor", of which the "host interceptor" seems closest
to what I have in mind:

http://flume.apache.org/FlumeUserGuide.html#host-interceptor

... however, that seems to apply only to the agent running the source, not to
the remote sender.

Could there be an interceptor configuration that would extract the sending
address or hostname from an incoming syslog packet to the agent source?
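
To illustrate the kind of configuration I mean: the HDFS sink path accepts
%{header} escapes, which substitute the value of a named event header. So if
the source or an interceptor set a header holding the sender's hostname or
address, the path could reference it. A rough sketch, assuming a header named
"host" carries that value (whether the syslog source actually populates it is
an assumption I have not verified):

```
# Sketch only: assumes every event carries a "host" header containing the
# sender's hostname or IP address (hypothetical for this setup, unverified).
tier1.sinks.sink2.hdfs.path = hdfs:///tmp/remote-syslogs/%{host}/%y-%m-%d/%H%M/%S
```

This keeps the rest of the sink definition unchanged. Only the path gains one
extra directory level per sending host.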

Many thanks in advance,
Traiano
