Hi Iain,

Yes, events are getting written to the path, but the regex_extractor variable is not getting substituted into the HDFS path.
I’ve tried both the hostname header and the regex you advised, but no luck. Is regex_extractor even supported for the HDFS sink path?

18 Feb 2016 00:58:40,855 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:265) - Creating /prod/hadoop/smallsite/flume_ingest_ale2//2016/02/18/00/Sutanu_regex_ALE_2_Station_topic.1455757120803.tmp

From: iain wright [mailto:[email protected]]
Sent: Wednesday, February 17, 2016 7:39 PM
To: [email protected]
Subject: Re: regex_extractor NOT replacing the HDFS path variable

Config looks sane. Are events being written to /prod/hadoop/smallsite/flume_ingest_ale2//%Y/%m/%d/%H?

A couple of things that may be worth trying if you haven't yet:

- Try host=(ale-\d+-\w+.attwifi.com) instead of .*host=(ale-\d+-\w+.attwifi.com).*
- Try hostname or another header name instead of host, since host is a header used by the host interceptor

--
Iain Wright

On Wed, Feb 17, 2016 at 5:06 PM, Sutanu Das <[email protected]> wrote:

Hi Hari/Community,

We are trying to populate the HDFS path from the regex_extractor interceptor, but the variable is not getting replaced in the HDFS path of the HDFS sink.

We are setting the HDFS sink path to /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H…..

where %{host} should come from an interceptor of type = regex_extractor with regex = .*host=(ale-\d+-\w+.attwifi.com).*

We know the regex works because we checked in Python that the source data matches:

>>> pattern = re.compile("host=(\w+-\d+-\w+.attwifi.com)\s.*")
>>> pattern.match(s)
<_sre.SRE_Match object at 0x7f8ca5cb4f30>
>>> s
'host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1 topic_seq=540549 lic_info=10 topic=station sta_eth_mac=60:f8:1d:95:74:79 username=Javiers-phone role=centerwifi bssid=40:e3:d6:b0:02:52 device_type=iPhone sta_ip_address=192.168.21.14 hashed_sta_eth_mac=928ebc57036a2df7909c70ea5fce35774687835f hashed_sta_ip_address=8c76d83c5afb6aa1ca814d8902943a42a58d0a23 vlan=0 ht=0 ap_name=BoA-AP564'
>>>

Is my config incorrect, or do we need to write a custom interceptor for this?
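For reference, the exact pattern from the config below can be exercised the same way. This is a minimal sketch: Flume's regex_extractor uses java.util.regex rather than Python's re, but the syntax of this particular pattern is the same in both.

import re

# Pattern copied verbatim from the interceptor config below; the dots before
# "attwifi" and "com" are unescaped, so they match any character here.
pattern = re.compile(r".*host=(ale-\d+-\w+.attwifi.com).*")

# Sample event body from the thread, shortened.
body = "host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1 topic=station ap_name=BoA-AP564"

m = pattern.match(body)
print(m.group(1) if m else "no match")   # expected: ale-1-sa.attwifi.com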
Here is my Flume config:

multi-ale2-station.sources = source1
multi-ale2-station.channels = channel1
multi-ale2-station.sinks = sink1

# Define the sources
multi-ale2-station.sources.source1.type = exec
multi-ale2-station.sources.source1.command = /usr/local/bin/multi_ale2.py -f /etc/flume/ale_station_conf/m_s.cfg
multi-ale2-station.sources.source1.channels = channel1

# Define the channels
multi-ale2-station.channels.channel1.type = memory
multi-ale2-station.channels.channel1.capacity = 10000000
multi-ale2-station.channels.channel1.transactionCapacity = 10000000

# Define the interceptors
multi-ale2-station.sources.source1.interceptors = i1
multi-ale2-station.sources.source1.interceptors.i1.type = regex_extractor
multi-ale2-station.sources.source1.interceptors.i1.regex = .*host=(ale-\d+-\w+.attwifi.com).*
multi-ale2-station.sources.source1.interceptors.i1.serializers = s1
multi-ale2-station.sources.source1.interceptors.i1.serializers.type = default
multi-ale2-station.sources.source1.interceptors.i1.serializers.s1.name = host

# Define a logging sink
multi-ale2-station.sinks.sink1.type = hdfs
multi-ale2-station.sinks.sink1.channel = channel1
multi-ale2-station.sinks.sink1.hdfs.path = /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H
multi-ale2-station.sinks.sink1.hdfs.fileType = DataStream
multi-ale2-station.sinks.sink1.hdfs.writeFormat = Text
multi-ale2-station.sinks.sink1.hdfs.filePrefix = Sutanu_regex_ALE_2_Station_topic
multi-ale2-station.sinks.sink1.hdfs.useLocalTimeStamp = true
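To make the intended flow concrete, here is a rough model of what the interceptor and sink are expected to do together. It is only a sketch: Flume performs this substitution internally in Java, and the sketch assumes an unset or empty host header becomes an empty path segment, which is consistent with the double slash in the BucketWriter log line above.

import re

INTERCEPTOR_REGEX = re.compile(r".*host=(ale-\d+-\w+.attwifi.com).*")
HDFS_PATH = "/prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H"

def extract_headers(body):
    # Mimics regex_extractor: capture group 1 goes into the "host" header.
    m = INTERCEPTOR_REGEX.match(body)
    return {"host": m.group(1)} if m else {}

def bucket_path(path, headers):
    # Mimics only the %{header} escapes; %Y/%m/%d/%H are left for the sink.
    # Assumption of this sketch: a missing header becomes an empty string.
    return re.sub(r"%\{(\w+)\}", lambda esc: headers.get(esc.group(1), ""), path)

ok = extract_headers("host=ale-1-sa.attwifi.com seq=478237182 topic=station")
print(bucket_path(HDFS_PATH, ok))
# -> /prod/hadoop/smallsite/flume_ingest_ale2/ale-1-sa.attwifi.com/%Y/%m/%d/%H

missing = extract_headers("seq=478237182 topic=station")   # no host= field
print(bucket_path(HDFS_PATH, missing))
# -> /prod/hadoop/smallsite/flume_ingest_ale2//%Y/%m/%d/%H
#    (the same // pattern seen in the BucketWriter log line above)

If the header were being set, the hostname would appear as its own directory level; the empty segment in the logged path is what suggests the host header never made it onto the event.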
