Hi, I'm relatively new to Camel, and while learning how to use it I created a DataFormat extension based on Apache Tika. With it you can unmarshal any file format supported by Apache Tika into Camel messages and then filter on the document's content type and other properties. The extracted text is placed in the message body, and the document's properties are set as message headers.
For example:

    from("something")
        .unmarshal(tika)
        .choice()
            .when(header("tikacontenttype").isEqualTo("application/zip")).to("somewhere")
            .when(header("tikacontenttype").startsWith("text/plain")).to("somewhereelse")
            .otherwise().to("nowhere");

The code is not finished yet, but it may already be useful to some of you. It is here: https://github.com/wheijke/camel-tika

Another project I hope to finish soon is Twitter4j support.

Enjoy,
Wouter
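For anyone who wants the routing rule spelled out, here is a minimal plain-Java sketch of the decision that the choice() block in the example makes. It deliberately avoids Camel and Tika. The Message class is a hypothetical stand-in for a Camel message, and the endpoint names are only the placeholders from the example, so this illustrates the when/otherwise ordering, not the extension's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class TikaRouteSketch {

    // Hypothetical stand-in for a Camel message: the extracted text goes in
    // the body, and document properties go in a header map.
    static final class Message {
        final String body;
        final Map<String, String> headers;

        Message(String body, Map<String, String> headers) {
            this.body = body;
            this.headers = headers;
        }
    }

    // Mirrors the choice() block: conditions are checked in order, the first
    // match wins, and anything else falls through to the default endpoint.
    static String route(Message msg) {
        String contentType = msg.headers.get("tikacontenttype");
        if ("application/zip".equals(contentType)) {
            return "somewhere";
        } else if (contentType != null && contentType.startsWith("text/plain")) {
            return "somewhereelse";
        }
        return "nowhere";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        // Content types often carry parameters, which is why the example
        // uses startsWith() rather than isEqualTo() for text/plain.
        headers.put("tikacontenttype", "text/plain; charset=UTF-8");
        Message msg = new Message("Hello, Camel", headers);
        System.out.println(route(msg)); // prints "somewhereelse"
    }
}
```

In the real route, Camel evaluates the when() predicates in the same order, so a message with no tikacontenttype header should match neither condition and end up at the otherwise() endpoint.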