Hi List,

I've configured Flume to accept remote syslogs from rsyslog on a number of hosts, and am currently using an HDFS sink, specified as follows in the agent config:

#sink for syslog udp collection
tier1.sinks.sink2.type = hdfs
tier1.sinks.sink2.channel = channel2
tier1.sinks.sink2.hdfs.path = hdfs:///tmp/remote-syslogs/%y-%m-%d/%H%M/%S
tier1.sinks.sink2.hdfs.fileType = DataStream
tier1.sinks.sink2.hdfs.writeFormat = Text
tier1.sinks.sink2.hdfs.rollSize = 0
tier1.sinks.sink2.hdfs.rollCount = 10000
tier1.sinks.sink2.hdfs.rollInterval = 600

Syslog entries are being collected and written to HDFS with the directory structure generated as specified above, i.e.

... tmp/remote-syslogs/%y-%m-%d/%H%M/%S

However, I'd like the dynamically generated path to include the IP address of the remote host sending the syslog to the source, as in something like:

tier1.sinks.sink2.hdfs.path = hdfs:///tmp/remote-syslogs/%HOSTNAME/%y-%m-%d/%H%M/%S

if a parameter like %HOSTNAME is at all possible.

My question: Is there a selector or other parameter supported by Flume that I could use for this?

I've looked in the user guide's section on the HDFS sink specification, but it does not seem to address other possibilities for dynamic path format. The only similar feature I can see is the "interceptor", of which the "host interceptor" seems similar to what I have in mind:

http://flume.apache.org/FlumeUserGuide.html#host-interceptor

However, that seems to apply to the source's agent only. Could there be an interceptor configuration that would extract the sending address or hostname from an incoming syslog packet to the agent source?

Many thanks in advance,
Traiano
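
P.S. To make the goal concrete, here is a rough sketch of the kind of sink path I am hoping for. It assumes two things I have not verified: that the HDFS sink can substitute an event header into the path with a %{header} style escape, and that each event arrives with a header (called "host" here purely for illustration) holding the sender's address or hostname. Whether the syslog source or an interceptor can populate such a header is exactly what I am asking.

```properties
# Hypothetical sketch only, not a tested configuration.
# Assumes every event carries a "host" header identifying the remote sender;
# the header name "host" is an illustrative assumption.
tier1.sinks.sink2.type = hdfs
tier1.sinks.sink2.channel = channel2
# %{host} stands in for the %HOSTNAME placeholder in my question above.
tier1.sinks.sink2.hdfs.path = hdfs:///tmp/remote-syslogs/%{host}/%y-%m-%d/%H%M/%S
```

If something along these lines works, each remote host would get its own subtree under /tmp/remote-syslogs/, with the existing date and time directories beneath it.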
