Re: Kafka/Hadoop consumers and producers

Andrew Otto Fri, 09 Aug 2013 11:50:13 -0700

For the last 6 months, we've been using this:

https://github.com/wikimedia-incubator/kafka-hadoop-consumer


In combination with this wrapper script:
https://github.com/wikimedia/kraken/blob/master/bin/kafka-hadoop-consume

It's not great, but it works!



On Aug 9, 2013, at 2:06 PM, Felix GV <[email protected]> wrote:

> I think the answer is that there is currently no strong community-backed
> solution to consume non-Avro data from Kafka to HDFS.
> 
> A lot of people do it, but I think most people adapted and expanded the
> contrib code to fit their needs.
> 
> --
> Felix
> 
> 
> On Fri, Aug 9, 2013 at 1:27 PM, Oleg Ruchovets <[email protected]> wrote:
> 
>> Yes, I am definitely interested in such capabilities. We are also using
>> Kafka 0.7.
>> I already asked, but nobody answered: what is the community using to
>> consume from Kafka to HDFS?
>> My assumption was that if Camus supports only Avro it will not be
>> suitable for everyone, but people transfer from Kafka to Hadoop somehow.
>> So the question is: what are the alternatives to Camus for transferring
>> messages from Kafka to HDFS?
>> Thanks
>> Oleg.
>> 
>> 
>> On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <[email protected]> wrote:
>> 
>>> Felix,
>>> The Camus route is the direction I have headed for a lot of the reasons
>>> that you described. The only wrinkle is we are still on Kafka 0.7.3, so I
>>> am in the process of backporting this patch:
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
>>> that is described here:
>>> https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
>>> we can handle reading and writing non-avro'ized (if that is a word) data.
>>> 
>>> I hope to have that done sometime in the morning and would be happy to
>>> share it if others can benefit from it.
>>> 
>>> Thanks,
>>> Andrew
>>> 
>>> 
>>> On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
>>> 
>>>> The contrib code is simple and probably wouldn't require too much work to
>>>> fix, but it's a lot less robust than Camus, so you would ideally need to do
>>>> some work to make it solid against all edge cases, failure scenarios and
>>>> performance bottlenecks...
>>>>
>>>> I would definitely recommend investing in Camus instead, since it already
>>>> covers a lot of the challenges I'm mentioning above, and also has more
>>>> community support behind it at the moment (as far as I can tell, anyway),
>>>> so it is more likely to keep getting improvements than the contrib code.
>>>> 
>>>> --
>>>> Felix
>>>> 
>>>> 
>>>> On Thu, Aug 8, 2013 at 9:28 AM, <[email protected]> wrote:
>>>> 
>>>>> We also have a need today to ETL from Kafka into Hadoop, and we do not
>>>>> currently use Avro, nor do we have any plans to.
>>>>>
>>>>> So is the official direction based on this discussion to ditch the Kafka
>>>>> contrib code and direct people to use Camus without Avro as Ken described,
>>>>> or are both solutions going to survive?
>>>>> 
>>>>> I can put time into the contrib code and/or work on documenting the
>>>>> tutorial on how to make Camus work without Avro.
>>>>> 
>>>>> Which is the preferred route, for the long term?
>>>>> 
>>>>> Thanks,
>>>>> Andrew
>>>>> 
>>>>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>>>>>> Hi Andrew,
>>>>>>
>>>>>> Camus can be made to work without Avro. You will need to implement a
>>>>>> message decoder and a data writer. We need to add a better tutorial
>>>>>> on how to do this, but it isn't that difficult. If you decide to go down
>>>>>> this path, you can always ask questions on this list. I try to make sure
>>>>>> each email gets answered, but it can take me a day or two.
>>>>>>
>>>>>> -Ken
>>>>>>
>>>>>> On Aug 7, 2013, at 9:33 AM, [email protected] wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Over at the Wikimedia Foundation, we're trying to figure out the
>>>>>>> best way to do our ETL from Kafka into Hadoop.  We don't currently use
>>>>>>> Avro and I'm not sure if we are going to.  I came across this post.
>>>>>>>
>>>>>>> If the plan is to remove the hadoop-consumer from Kafka contrib, do
>>>>>>> you think we should not consider it as one of our viable options?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Andrew
