From: Gonzalo Herreros <[email protected]>
To: [email protected]
Date: 08.09.2015 09:29
Subject: Re: How to customize the key in a HDFS SequenceFile sink
Thanks for your prompt reply. May I ask you for some more details?
I'm a little confused, as I've read that the "hdfs.serializer" parameter
is ignored when using sequence files.
Does that mean my custom serializer is responsible for writing
"correct" SequenceFiles (e.g. using the "createWriter" method of
org.apache.hadoop.io.SequenceFile)?
I assume I have to do the following (see pseudocode below):
1) agent configuration:
   hdfs.fileType = DataStream
   hdfs.serializer = MyBuilder
2) public class MySerializer implements EventSerializer {
     // customize the key and write to the output stream using the
     // createWriter method
   }
3) public static class MyBuilder implements EventSerializer.Builder {
     public EventSerializer build(Context context, OutputStream os) {
       return new MySerializer(context, os);
     }
   }
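Fleshing out the pseudocode above, a serializer along these lines might work. This is only a sketch under a few assumptions: the class names (MySerializer, MyBuilder) and the "key" header are placeholders, and wrapping the sink's raw OutputStream in an FSDataOutputStream so that SequenceFile.createWriter accepts it is untested here.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MySerializer implements EventSerializer {

  private final SequenceFile.Writer writer;

  private MySerializer(Context context, OutputStream out) throws IOException {
    // SequenceFile.createWriter wants an FSDataOutputStream, so the raw
    // stream handed in by the sink is wrapped here (assumption).
    writer = SequenceFile.createWriter(new Configuration(),
        new FSDataOutputStream(out, null),
        Text.class, BytesWritable.class,
        SequenceFile.CompressionType.NONE, null);
  }

  @Override
  public void write(Event event) throws IOException {
    // Use a header value as the key instead of the default timestamp;
    // "key" is a hypothetical header name set by an upstream interceptor.
    Text key = new Text(event.getHeaders().get("key"));
    writer.append(key, new BytesWritable(event.getBody()));
  }

  @Override public void afterCreate() { }
  @Override public void afterReopen() { }
  @Override public void flush() throws IOException { writer.hflush(); }
  @Override public void beforeClose() throws IOException { writer.close(); }
  @Override public boolean supportsReopen() { return false; }

  public static class MyBuilder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      try {
        return new MySerializer(context, out);
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  }
}
```

With hdfs.fileType = DataStream the sink treats the output as an opaque byte stream, so the serializer alone is responsible for producing a valid SequenceFile layout, including the header and sync markers that createWriter emits.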
Thanks a lot for your support.
I would implement a custom serializer and configure it in the standard
HDFS sink.
That way you control how you build the key for each event.
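For reference, the wiring this suggests would look roughly like the agent configuration below. The agent and sink names are placeholders, and com.example.MySerializer$MyBuilder stands in for whatever fully qualified builder class you implement; note the Flume user guide lists the property as "serializer" (without the hdfs. prefix) on the HDFS sink.

```
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
# DataStream hands the raw output stream to the serializer;
# with hdfs.fileType = SequenceFile the serializer setting is ignored.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.serializer = com.example.MySerializer$MyBuilder
```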
Regards,
Gonzalo
On 8 September 2015 at 06:42, <[email protected]>
wrote:
Hello,
I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm
looking for a way to create "custom keys". By default, Flume uses the
timestamp as the key within a SequenceFile. However, in my use case I
would like to use a custom string as the key instead of the timestamp.
What are best practices for implementing/configuring such a "custom key"
in Flume?
Best, Thomas