Hi all,
I'm looking for some guidance. I have been trying to get a flow working that
involves the following:
Avro source --> mem channel --> file_roll sink
file_roll sink config:
agent.sinks.persistence-sink.type = file_roll
agent.sinks.persistence-sink.sink.directory = /home/flume/persistence
agent.sinks.persistence-sink.sink.serializer = avro_event
agent.sinks.persistence-sink.batchSize = 1000
agent.sinks.persistence-sink.sink.rollInterval = 300
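For reference, the rest of this first-hop agent is wired up roughly like this
(the source and channel names here are simplified placeholders, not my exact
config):
agent.sources = avro-in
agent.channels = mem-ch
agent.sinks = persistence-sink
agent.sources.avro-in.type = avro
agent.sources.avro-in.bind = 0.0.0.0
agent.sources.avro-in.port = 41414
agent.sources.avro-in.channels = mem-ch
agent.channels.mem-ch.type = memory
agent.sinks.persistence-sink.channel = mem-ch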
Once the data is on local disk, I want to flume it on to another Flume server:
spooldir source --> mem channel --> Avro sink (to another Flume server)
agent.sources.persistence-dev-source.type = spooldir
agent.sources.persistence-dev-source.spoolDir = /home/flume/ready
agent.sources.persistence-dev-source.deserializer = avro
agent.sources.persistence-dev-source.deserializer.schemaType = LITERAL
agent.sources.persistence-dev-source.batchSize = 1000
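The Avro sink on this hop is just the stock one pointed at the second server,
roughly like this (hostname and port are placeholders):
agent.sinks.forward-sink.type = avro
agent.sinks.forward-sink.hostname = flume2.example.com
agent.sinks.forward-sink.port = 41414
agent.sinks.forward-sink.batch-size = 1000
agent.sinks.forward-sink.channel = mem-ch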
The problem is that file_roll wraps the incoming Avro data in an Avro
container before storing it on the local file system. When the data is then
picked up by the spooldir source and sent on to the Flume server, the
interceptor there sees the file_roll container headers instead of the
original ones.
Is there a recommended way to save the incoming Avro data so that it keeps
its integrity when sent on to another Flume server, which is waiting on Avro
data to multiplex into its channels?
I have tried many variations. With the configuration above, the Avro data
that was received does make it to the other server, but the applications see
the container headers added by file_roll, not the headers on the records in
the original Avro data.
Thanks,
Jim
Schema that file_roll sets on its writes to disk:
{
  "type" : "record",
  "name" : "Event",
  "fields" : [ {
    "name" : "headers",
    "type" : {
      "type" : "map",
      "values" : "string"
    }
  }, {
    "name" : "body",
    "type" : "bytes"
  } ]
}
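So each record in a rolled file ends up looking something like this when
dumped to JSON (values made up for illustration; the "body" is the bytes of
my original incoming Avro event, not a decoded record):
{
  "headers" : { "timestamp" : "1425316800000" },
  "body" : "<bytes of the original Avro event>"
}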