Paul,

I finally got a chance to test your solution below and it worked.  Thank you 
very much for the insight and information.

Regards,
dwight

From: ext Paul Chavez [mailto:pcha...@verticalsearchworks.com]
Sent: Wednesday, October 23, 2013 5:29 PM
To: user@flume.apache.org
Subject: RE: Why does a Flume source need to recognize the format of the 
message?

You want to use a 'selector' not an 'interceptor'. What your config is 
currently doing is stamping every event with a header 'category' containing the 
value 'dataset4'.

Remove the interceptor stuff and try adding these:

#add a new channel and sink for 'dropped' data
agent1.sinks = sink1 sink2
agent1.channels = file-channel-1 mem-channel-1

#configure the new channel/sink and bind them
agent1.channels.mem-channel-1.type = memory
agent1.sinks.sink2.type = null
agent1.sinks.sink2.channel = mem-channel-1

#setup the multiplexing channel selector
agent1.sources.scribe-source-ds1.channels = file-channel-1 mem-channel-1
agent1.sources.scribe-source-ds1.selector.type = multiplexing
#key on the 'category' header
agent1.sources.scribe-source-ds1.selector.header = category
#map the values of the header to channels
#dataset4 goes to the file channel
agent1.sources.scribe-source-ds1.selector.mapping.dataset4 = file-channel-1
#everything else goes to the memory channel (and eventually null sink)
agent1.sources.scribe-source-ds1.selector.default = mem-channel-1
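
For reference, here is a sketch of what the whole agent file might look like once the lines above are merged into the original config and the interceptor block is removed. It only rearranges settings already shown in this thread, leaves the memory channel at its defaults, and is untested as a single file.

```properties
# Components on this agent
agent1.sources = scribe-source-ds1
agent1.sinks = sink1 sink2
agent1.channels = file-channel-1 mem-channel-1

# Scribe source, bound to both channels
agent1.sources.scribe-source-ds1.type = org.apache.flume.source.scribe.ScribeSource
agent1.sources.scribe-source-ds1.port = 1463
agent1.sources.scribe-source-ds1.workerThreads = 5
agent1.sources.scribe-source-ds1.channels = file-channel-1 mem-channel-1

# Route on the 'category' header: dataset4 is kept, everything else is dropped
agent1.sources.scribe-source-ds1.selector.type = multiplexing
agent1.sources.scribe-source-ds1.selector.header = category
agent1.sources.scribe-source-ds1.selector.mapping.dataset4 = file-channel-1
agent1.sources.scribe-source-ds1.selector.default = mem-channel-1

# sink1 writes dataset4 events to disk
agent1.sinks.sink1.type = file_roll
agent1.sinks.sink1.channel = file-channel-1
agent1.sinks.sink1.sink.directory = /var/log/flume

# sink2 discards everything that reaches the memory channel
agent1.sinks.sink2.type = null
agent1.sinks.sink2.channel = mem-channel-1

# Channels
agent1.channels.file-channel-1.type = file
agent1.channels.file-channel-1.checkpointDir = /mnt/flume/checkpoint1
agent1.channels.file-channel-1.dataDirs = /mnt/flume/data1
agent1.channels.mem-channel-1.type = memory
```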

Hope that helps,
Paul Chavez

From: dwight.marz...@here.com [mailto:dwight.marz...@here.com]
Sent: Wednesday, October 23, 2013 1:43 PM
To: user@flume.apache.org
Subject: RE: Why does a Flume source need to recognize the format of the 
message?

Here is my config file below.  This simply ends up dumping all the input from 
the scribe source into the files in the sink directory.  I have 5 different 
scribe categories coming into the scribe source.  In this config file I'm 
attempting to grab only the incoming data that has the scribe category called 
"dataset4".

Regards,
dwight

# Name the components on this agent
agent1.sources = scribe-source-ds1
agent1.sinks = sink1
agent1.channels = file-channel-1

##########################
# SOURCE
##########################

# Configure source 1
agent1.sources.scribe-source-ds1.type = org.apache.flume.source.scribe.ScribeSource
agent1.sources.scribe-source-ds1.port = 1463
agent1.sources.scribe-source-ds1.workerThreads = 5
agent1.sources.scribe-source-ds1.channels = file-channel-1

# Configure an interceptor
agent1.sources.scribe-source-ds1.interceptors = intercept4
agent1.sources.scribe-source-ds1.interceptors.intercept4.type = static
agent1.sources.scribe-source-ds1.interceptors.intercept4.key = category
agent1.sources.scribe-source-ds1.interceptors.intercept4.value = dataset4

##########################
# SINK
##########################

# Config for the file roll sink
agent1.sinks.sink1.type = file_roll
agent1.sinks.sink1.channel = file-channel-1
agent1.sinks.sink1.sink.directory = /var/log/flume

##########################
# CHANNEL
##########################

# Channel file buffer 1
agent1.channels.file-channel-1.type = file
agent1.channels.file-channel-1.checkpointDir = /mnt/flume/checkpoint1
agent1.channels.file-channel-1.dataDirs = /mnt/flume/data1

##########################
# BINDING
##########################

# Bind the source and sink to the channel
agent1.sources.scribe-source-ds1.channels = file-channel-1
agent1.sinks.sink1.channel = file-channel-1

From: ext Roshan Naik [mailto:ros...@hortonworks.com]
Sent: Wednesday, October 23, 2013 2:51 PM
To: user@flume.apache.org
Subject: Re: Why does a Flume source need to recognize the format of the 
message?

Why don't you share the config you have so far? Perhaps somebody here can 
comment on it.

On Wed, Oct 23, 2013 at 7:48 AM, <dwight.marz...@here.com> wrote:
Ok, the place where I am stuck is trying to understand what the flume config 
file looks like to do this.  What does the config for the scribe source look 
like?  I have used the config lines for a scribe source that I found in the 
flume docs, but I'm not seeing the scribe source split up any data.  If the 
sink is the one that splits up the data, then what do the config entries look 
like for an Avro sink that would split up the scribe categories?  This is back 
to my original question of how this is done in the config file.
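
As the reply at the top of this thread shows, the split by category is configured on the source's channel selector rather than on a sink. A minimal sketch of routing two categories to separate channels follows; the channel names and the 'dataset1' category are assumptions for illustration, and the channel types and sinks are left out.

```properties
# Hypothetical channel names: one channel (and later one sink) per category of interest
agent1.channels = channel-ds1 channel-ds4 channel-other
agent1.sources.scribe-source-ds1.channels = channel-ds1 channel-ds4 channel-other

# Route each event by the value of its 'category' header
agent1.sources.scribe-source-ds1.selector.type = multiplexing
agent1.sources.scribe-source-ds1.selector.header = category
# 'dataset1' is an assumed category name
agent1.sources.scribe-source-ds1.selector.mapping.dataset1 = channel-ds1
agent1.sources.scribe-source-ds1.selector.mapping.dataset4 = channel-ds4
# Any other category falls through to the default channel
agent1.sources.scribe-source-ds1.selector.default = channel-other
```

Each channel can then be drained by whatever sink suits that category, so the sinks themselves never need to look at the category.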

From: ext Roshan Naik [mailto:ros...@hortonworks.com]
Sent: Tuesday, October 22, 2013 6:31 PM
To: user@flume.apache.org
Subject: Re: Why does a Flume source need to recognize the format of the 
message?

The source splits the data into individual events and inserts them into the 
channel. In a few cases the sources do additional parsing of data.

On Tue, Oct 22, 2013 at 2:33 PM, <dwight.marz...@here.com> wrote:
So, what I am gathering from the discussion below is that the Scribe 
source doesn't do the parsing or splitting of data.  It just takes in the data 
flow as is and passes it on to the sink.  The right sink splits the Scribe data 
up based on the category.  That is a good clarification for me, as I saw it 
the other way around.

Having never worked with Thrift or Avro, could you give me a sample entry for a 
flume config file for one of these that would parse data with a scribe category 
that is coming in via the Scribe source?

Regards,
dwight

From: ext Roshan Naik [mailto:ros...@hortonworks.com]
Sent: Tuesday, October 22, 2013 5:21 PM
To: user@flume.apache.org
Subject: Re: Why does a Flume source need to recognize the format of the 
message?

I forgot to note that the syslog source also does some parsing.

On Tue, Oct 22, 2013 at 1:51 PM, Roshan Naik <ros...@hortonworks.com> wrote:
At a minimum it needs to know how to split incoming data into individual 
events. Typically a newline is used as the separator.

Avro & Thrift are special-purpose sources/sinks which handle headers and body. 
Avro, Thrift & HTTP sources will parse the incoming data into header + body. 
AFAICT most other sources treat the whole thing as a body. They should not need 
any info other than the line/event delimiter.

You can write a custom deserializer, which some sources support, to parse a 
custom incoming data format.
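
As an illustration, the spooling directory source is one source that accepts a deserializer setting. This is a minimal sketch: the source name, the directory, and the custom class name are made up, the source would still need to be listed in agent1.sources, and a custom class would have to implement Flume's EventDeserializer.Builder.

```properties
# Hypothetical spooling directory source
agent1.sources.spool-source-1.type = spooldir
agent1.sources.spool-source-1.spoolDir = /var/spool/flume-in
agent1.sources.spool-source-1.channels = file-channel-1

# Built-in deserializer that turns each line into one event (the default)
agent1.sources.spool-source-1.deserializer = LINE

# To plug in a custom one, point at its Builder class instead (hypothetical class):
# agent1.sources.spool-source-1.deserializer = com.example.MyDeserializer$Builder
```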

-roshan


On Tue, Oct 22, 2013 at 11:07 AM, Jarek Jarcec Cecho <jar...@apache.org> wrote:
Hi Praveen,
I think that there is a confusion between message and payload. Whereas Flume does 
not need to understand the payload structure, it does need to understand the 
message to know what events (what payloads) are there with what headers. 
To put it differently, Flume does not need to understand the structure of the data 
that you are sending (the payload is just a byte array for Flume), but that unknown 
structure needs to be transferred via a known protocol (such as Avro RPC).
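
To make the "known protocol" point concrete, here is a minimal sketch of one agent handing events to another over Avro RPC. The agent, sink, and source names, the hostname, and the port are all assumptions, and the two halves would live in separate agent config files.

```properties
# Sending agent: the Avro sink forwards events (headers + body) over Avro RPC
agent1.sinks.avro-sink-1.type = avro
agent1.sinks.avro-sink-1.channel = file-channel-1
agent1.sinks.avro-sink-1.hostname = collector.example.com
agent1.sinks.avro-sink-1.port = 4545

# Receiving agent: the Avro source listens on the same port and rebuilds the events
agent2.sources.avro-source-1.type = avro
agent2.sources.avro-source-1.bind = 0.0.0.0
agent2.sources.avro-source-1.port = 4545
agent2.sources.avro-source-1.channels = mem-channel-1
```

Neither side is told anything about what the event body contains; only the transport format is agreed on.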

Jarcec

On Tue, Oct 22, 2013 at 06:59:17PM +0100, Praveen Sripati wrote:
> According to the Flume documentation
>
> >>    A Flume source consumes events delivered to it by an external source
> like a web server. The external source sends events to Flume in a format
> that is recognized by the target Flume source. For example, an Avro Flume
> source can be used to receive Avro events from Avro clients or other Flume
> agents in the flow that send events from an Avro sink.
>
> Why does a Flume source need to recognize or understand the format of the
> message, when all it does is forward the message to one of the
> channels?
>
> Thanks,
> Praveen



CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader of 
this message is not the intended recipient, you are hereby notified that any 
printing, copying, dissemination, distribution, disclosure or forwarding of 
this communication is strictly prohibited. If you have received this 
communication in error, please contact the sender immediately and delete it 
from your system. Thank You.

