Hello,

We are using Flume v1.4 to load JSON-formatted log data into HDFS as Avro. Our Flume setup looks like this:
NXLog ==> (FlumeHTTPSource -> HDFSSink w/ custom EventSerializer)

Right now our custom EventSerializer (which extends AbstractAvroEventSerializer) takes the JSON input from the HTTPSource and converts it into an Avro record of the appropriate type for the incoming log file. This is working great, and we also use the serializer to add some additional "synthetic" fields to the Avro record that don't exist in the original JSON log data.

My question concerns how to handle malformed JSON data (or really any error inside the custom EventSerializer). It's very likely that as we parse the JSON there will be records where something is malformed (either the JSON itself, or a field of the wrong type, etc.). For example, a "port" field which should always be an Integer might for some reason contain ASCII text. I'd like to catch these errors in the EventSerializer and write the bad JSON out to a log file somewhere that we can monitor. What is the best way to do this?

Right now, all the logic for catching bad JSON would live inside the "convert" method of the EventSerializer. Should convert throw an exception that will be handled gracefully upstream, or should it just return null when there is an error? Would it be appropriate to log errors directly to a database from inside convert, or would that be too slow? What are the best practices for this kind of error handling?

Thank you for any assistance!

Best Regards,
Ed
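P.S. To make the question concrete, here is a simplified sketch of roughly what our serializer looks like. The class, schema, and field names are illustrative only, not our actual code, and I'm using Jackson for the JSON parsing purely as an example. The catch block in convert is the part I'm unsure about.

import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.AbstractAvroEventSerializer;
import org.apache.flume.serialization.EventSerializer;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonLogEventSerializer extends AbstractAvroEventSerializer<GenericRecord> {

  // Toy schema: two fields taken from the JSON log plus one "synthetic" field we add.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
      + "{\"name\":\"host\",\"type\":\"string\"},"
      + "{\"name\":\"port\",\"type\":\"int\"},"
      + "{\"name\":\"ingestTime\",\"type\":\"long\"}]}");

  private final ObjectMapper mapper = new ObjectMapper();
  private final OutputStream out;

  private JsonLogEventSerializer(OutputStream out) {
    this.out = out;
  }

  @Override
  protected Schema getSchema() {
    return SCHEMA;
  }

  @Override
  protected OutputStream getOutputStream() {
    return out;
  }

  @Override
  protected GenericRecord convert(Event event) {
    String json = new String(event.getBody());
    try {
      JsonNode node = mapper.readTree(json);
      GenericRecord record = new GenericData.Record(SCHEMA);
      record.put("host", node.get("host").asText());
      // Blows up when "port" contains ASCII text instead of a number.
      record.put("port", Integer.parseInt(node.get("port").asText()));
      // One of the "synthetic" fields that isn't in the original JSON.
      record.put("ingestTime", System.currentTimeMillis());
      return record;
    } catch (Exception e) {
      // This is the question: throw so it's handled upstream, return null,
      // or write the bad JSON to a side log / database right here?
      return null;
    }
  }

  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      JsonLogEventSerializer serializer = new JsonLogEventSerializer(out);
      serializer.configure(context);
      return serializer;
    }
  }
}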
