Vadim,

The advantages of Camus over the contrib consumer are the following (though I may be forgetting some):
- The ability to fetch all/many topics in one job (MapReduce can otherwise
  introduce a lot of overhead for small topics).
- Smarter load balancing of topic partitions across tasks.
- Built-in error detection and logging.
- Support for speculative execution.
- Automatic and complete handling of incremental imports (the contribs need
  a bit of hand holding).
- Various configuration parameters for bucket sizes, etc.
- Automatic discovery of new topics (if you use the external Avro schema
  repo).
- Automatic reporting of metrics (if you use Kafka Audit).

However, Camus is currently pretty coupled with Avro, and to a lesser extent
with certain conventions within Avro schemas, whereas the contrib is pretty
much raw.

Hopefully, that answers your question (?)

--
Felix

On Wed, Jul 3, 2013 at 4:20 AM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Jay,
> What is the difference between this project and Camus? Which advantages to
> use for loading log entries from kafka into Hadoop ?
>
> Vadim
>
> Sent from my iPhone
>
> On Jul 2, 2013, at 5:01 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > We currently have a contrib package for consuming and producing messages
> > from mapreduce (
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD
> > ).
> >
> > We keep running into problems (e.g. KAFKA-946) that are basically due to
> > the fact that the Kafka committers don't seem to mostly be Hadoop
> > developers and aren't doing a good job of maintaining this code (keeping
> > it tested, improving it, documenting it, writing tutorials, getting it
> > moved over to the more modern apis, getting it working with newer Hadoop
> > versions, etc).
> >
> > A couple of options:
> > 1. We could try to get someone in the Kafka community (either a current
> > committer or not) who would adopt this as their baby (it's not much
> > code).
> > 2. We could just let Camus take over this functionality. They already
> > have a more sophisticated consumer and the producer is pretty minimal.
> >
> > So are there any people who would like to adopt the current Hadoop
> > contrib code?
> >
> > Conversely would it be possible to provide the same or similar
> > functionality in Camus and just delete these?
> >
> > -Jay