For fairly simple transformations, Flume is great, and works fine subscribing
to some pretty high volumes of messages from Kafka (I think we hit 50M/second
at one point). If you need to do complex transformations, e.g. database
lookups for the Kafka-to-Hadoop ETL, then you will start hitting complexity
issues that exceed what Flume can handle.
There are git repos that have everything you need, including the Kafka
adapter, HDFS writer, etc. A lot of this is built into Flume.
This might be a bit off topic, so googling "flume kafka" should help you
further.
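
For the simple subscribe-and-land case, a Flume agent config looks roughly
like this (a minimal sketch; the agent/channel names, broker address, topic,
and HDFS path are all placeholders you'd replace with your own):

```properties
# Name the source, channel, and sink for this agent ("tier1" is arbitrary)
tier1.sources  = kafka-source
tier1.channels = mem-channel
tier1.sinks    = hdfs-sink

# Kafka source: subscribe to an existing topic (broker/topic names assumed)
tier1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.kafka-source.kafka.bootstrap.servers = broker1:9092
tier1.sources.kafka-source.kafka.topics = events
tier1.sources.kafka-source.channels = mem-channel

# In-memory channel buffers events between source and sink
tier1.channels.mem-channel.type = memory
tier1.channels.mem-channel.capacity = 10000

# HDFS sink: write plain streams, roll files every 5 minutes
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.hdfs.path = /data/events/%Y-%m-%d
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300
tier1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
tier1.sinks.hdfs-sink.channel = mem-channel
```

A memory channel trades durability for speed; if you truly can't lose
events, look at the file channel (or the Kafka channel) instead.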

On Thu, Jun 29, 2017 at 10:14 PM, Mallanagouda Patil <
mallanagouda.c.pa...@gmail.com> wrote:

> Kafka is capable of processing billions of events per second. You can
> scale it horizontally with Kafka broker servers.
>
> You can try out these steps
>
> 1. Create a topic in Kafka to receive all your data. You have to use a
> Kafka producer to ingest data into Kafka.
> 2. If you are going to write your own HDFS client to put data into HDFS,
> you can read data from the topic in step 1, validate it, and store it into
> HDFS.
> 3. If you want to use an open-source tool (Gobblin or the Confluent Kafka
> HDFS connector) to put data into HDFS, then write a tool that reads data
> from the topic in step 1, validates it, and stores it in another topic.
>
> We are using a combination of these steps to process over 10 million
> events/second.
>
> I hope it helps..
>
> Thanks
> Mallan
>
> On Jun 30, 2017 10:31 AM, "Sidharth Kumar" <sidharthkumar2...@gmail.com>
> wrote:
>
>> Thanks! What about Kafka with Flume? I should also mention that the daily
>> data intake is in the millions, and we can't afford to lose even a single
>> piece of data, which makes high availability a necessity.
>>
>> Warm Regards
>>
>> Sidharth Kumar | Mob: +91 8197 555 599 / 7892 192 367 |
>> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>
>>
>>
>>
>>
>>
>> On 30-Jun-2017 10:04 AM, "JP gupta" <jp.gu...@altruistindia.com> wrote:
>>
>>> The ideal sequence should be:
>>>
>>> 1.      Ingest using Kafka -> Validate and process using Spark ->
>>> Write into any NoSQL DB or Hive.
>>>
>>> From my recent experience, writing directly to HDFS can be slow
>>> depending on the data format.
>>>
>>>
>>>
>>> Thanks
>>>
>>> JP
>>>
>>>
>>>
>>> *From:* Sudeep Singh Thakur [mailto:sudeepthaku...@gmail.com]
>>> *Sent:* 30 June 2017 09:26
>>> *To:* Sidharth Kumar
>>> *Cc:* Maggy; common-u...@hadoop.apache.org
>>> *Subject:* Re: Kafka or Flume
>>>
>>>
>>>
>>> In your use case, Kafka would be better, because you want some
>>> transformations and validations.
>>>
>>> Kind regards,
>>> Sudeep Singh Thakur
>>>
>>>
>>>
>>> On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <sidharthkumar2...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have a requirement where all transactional data is ingested into
>>> Hadoop in real time, and before being stored into Hadoop, it is
>>> processed for validation. If the data fails the validation process, it
>>> will not be stored into Hadoop. The validation process also makes use of
>>> historical data which is stored in Hadoop. So, my question is: which
>>> ingestion tool will be best for this, Kafka or Flume?
>>>
>>>
>>>
>>> Any suggestions will be a great help for me.
>>>
>>>
>>> Warm Regards
>>>
>>> Sidharth Kumar | Mob: +91 8197 555 599 <+91%2081975%2055599>/7892 192
>>> 367 |  LinkedIn:www.linkedin.com/in/sidharthkumar2792
>>>
>>>
>>>
>>>
>>>
>>>
>>