Thanks everyone for your input on this thread; it looks like a hot topic ;) I thought I'd reply to everyone's feedback in one go rather than send lots of separate replies, so here goes...
Henry - thanks for pointing out Secor, I had never seen it before. I can see why not having a Hadoop dependency is appealing, but in our case we actually like the dependency: with Camus it means we can scale the job out on the cluster without having to do anything extra ourselves. The documentation also makes it look like Secor is very S3-centric, while we're interested in HDFS.

Guozhang - Copycat certainly looks very promising, and again I'd never come across it. An HDFS export connector that runs on YARN would probably be what we'd be looking for; it could potentially do what Camus does, and being more tightly integrated with Kafka should mean it's less likely to be orphaned. We'll certainly keep an eye on this, although it looks like it's probably not production ready yet? It also wasn't immediately clear how one would run it on YARN - our jobs are typically started on lightweight machines with limited resources, so we want to delegate as much as possible to the cluster nodes for parallelising the work, with as little setup on our part as we can get away with.

Todd - we looked at Kaboom, but we don't use Avro and we need to control the formats of the files we create on HDFS (typically ORC and SequenceFile), along with also wanting full control over the HDFS paths where the files are created. Camus has extension points that allowed us to write our own RecordWriterProvider, Partitioner and MessageDecoder, all of which we use and none of which we saw as possible in Kaboom as it currently stands. Apologies if we've overlooked something here.

Vivek - we also considered Flume/Flafka, but we're actually trying to reduce the number of technologies we're using; part of the reason for us using Kafka is to have *one* standard mechanism for getting data in and out of Hadoop, and the intention is for this to replace our existing Flume infrastructure.
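For anyone curious what those Camus extension points look like in practice, below is a rough sketch of a custom RecordWriterProvider that writes SequenceFiles. The interface and class names (RecordWriterProvider, IEtlKey, CamusWrapper) come from the camus-api module, but the exact method signatures here are from memory and may not match your Camus fork, so treat this as illustrative pseudocode rather than a drop-in implementation:

```java
// Illustrative sketch only: signatures approximate the camus-api module
// and may differ in your fork of Camus. Check camus-api before using.
public class SequenceFileRecordWriterProvider implements RecordWriterProvider {

    @Override
    public String getFilenameExtension() {
        return ".seq";
    }

    @Override
    public RecordWriter<IEtlKey, CamusWrapper> getDataRecordWriter(
            TaskAttemptContext context, String fileName, CamusWrapper data,
            FileOutputCommitter committer) throws IOException, InterruptedException {

        // Create a SequenceFile.Writer under the task's work path and wrap it
        // in a RecordWriter that serialises each Kafka message payload.
        Path file = new Path(committer.getWorkPath(), fileName + getFilenameExtension());
        final SequenceFile.Writer writer =
                SequenceFile.createWriter(/* fs, conf, file, key/value classes */);

        return new RecordWriter<IEtlKey, CamusWrapper>() {
            @Override
            public void write(IEtlKey key, CamusWrapper value) throws IOException {
                writer.append(new LongWritable(key.getTime()),
                              new Text(String.valueOf(value.getRecord())));
            }

            @Override
            public void close(TaskAttemptContext ctx) throws IOException {
                writer.close();
            }
        };
    }
}
```

A custom Partitioner and MessageDecoder plug in the same way: you name your implementation class in the Camus job properties, and the framework instantiates it on the cluster, which is exactly the "delegate the work to the cluster nodes" behaviour described above.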
I appreciate that Flume can do the job, but in terms of operational complexity we'd prefer to have fewer moving parts, and we felt Camus was less complex than adding Flume to the end of the data pipeline.

So it sounds like Camus still has features that can't easily be replicated in any of the other solutions as they currently stand. It also appears that nobody here is keen on working on an official fork of Camus, possibly since they're using or working on the alternatives above? I made a similar post on the "Camus_etl" group (https://groups.google.com/forum/#!topic/camus_etl/jUkX4zC4oF0) and some parties there indicated that they would be interested in an official Camus fork, or some way of keeping the current Camus codebase in existence with new features being added to it going forward, so we'll see where that goes.

If anyone has any other opinions or thoughts please let me know.

Thanks,

Adrian

________________________________________
From: vivek thakre <vivek.tha...@gmail.com>
Sent: 22 October 2015 23:44
To: users@kafka.apache.org
Subject: Re: future of Camus?

We are using Apache Flume as a router to consume data from Kafka and push to HDFS. With Flume 1.6, Kafka Channel, Source and Sink are available out of the box.

Here is the blog post from Cloudera:
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/

Thanks,
Vivek Thakre

On Thu, Oct 22, 2015 at 2:29 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Very useful information for us.
> Thanks Guozhang.
>
> On Oct 22, 2015 2:02 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:
>
> > Hi Adrian,
> >
> > Another alternative approach is to use Kafka's own Copycat framework for
> > data ingressing / egressing. It will be released in our 0.9.0 version,
> > expected in Nov.
> >
> > Under Copycat users can write different "connectors" instantiated for
> > different source / sink systems, and for your case there is an in-built
> > HDFS connector coming along with the framework itself.
> > You can find more details in these Kafka wikis / java docs:
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> > https://s3-us-west-2.amazonaws.com/confluent-files/copycat-docs-wip/intro.html
> >
> > Guozhang
> >
> > On Thu, Oct 22, 2015 at 12:52 PM, Henry Cai <h...@pinterest.com.invalid> wrote:
> >
> > > Take a look at secor:
> > >
> > > https://github.com/pinterest/secor
> > >
> > > Secor is a no-frills Kafka->HDFS ingestion tool. It doesn't depend on any
> > > underlying systems such as Hadoop; it only uses the Kafka high level
> > > consumer to balance the work loads. Very easy to understand and manage.
> > > It's probably the 2nd most popular Kafka/HDFS ingestion tool (behind
> > > Camus). Lots of web companies use this to do the Kafka data ingestion
> > > (Pinterest/Uber/AirBnb).
> > >
> > > On Thu, Oct 22, 2015 at 3:56 AM, Adrian Woodhead <awoodh...@hotels.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > We're looking at options for getting data from Kafka onto HDFS and
> > > > Camus looks like the natural choice for this. It's also evident that
> > > > LinkedIn, who originally created Camus, are taking things in a
> > > > different direction and are advising people to use their Gobblin ETL
> > > > framework instead. We feel that Gobblin is overkill for many simple
> > > > use cases and Camus seems a much simpler and better fit. The problem
> > > > now is that, with LinkedIn apparently withdrawing official support for
> > > > it, it appears that any changes to Camus are being managed by various
> > > > forks of it and it looks like everyone is building and using their own
> > > > versions. Wouldn't it be better for a community to form around one
> > > > official fork so development efforts can be focused on this? Any
> > > > thoughts on this?
> > > >
> > > > Thanks,
> > > >
> > > > Adrian
> >
> > --
> > -- Guozhang