The ideal sequence should be:

1. Ingest using Kafka -> validate and process using Spark -> write into any
NoSQL DB or Hive.
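Here is a minimal Python sketch of the middle "validate, then store" stage. It rests on several assumptions. Each Kafka message is taken to be already decoded into a record. The `Transaction` fields and the three validation rules are hypothetical. The in-memory `known_accounts` and `seen_txn_ids` sets stand in for the historical data loaded from Hadoop, and the `sink` list stands in for the NoSQL DB or Hive table. In a real Spark job, similar logic could be applied per micro-batch. This is an illustration of the idea, not a Spark or Kafka API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; real messages would carry more fields.
    txn_id: str
    account_id: str
    amount: float


def validate(txn: Transaction, known_accounts: Set[str], seen_txn_ids: Set[str]) -> bool:
    """Hypothetical rules: positive amount, account present in the
    historical data, and transaction id not already stored."""
    return (
        txn.amount > 0
        and txn.account_id in known_accounts
        and txn.txn_id not in seen_txn_ids
    )


def process_batch(
    batch: Iterable[Transaction],
    known_accounts: Set[str],
    seen_txn_ids: Set[str],
    sink: List[Transaction],
) -> List[Transaction]:
    """Write only valid records to the sink; return the rejected ones.
    Rejected records are never written to storage."""
    rejected: List[Transaction] = []
    for txn in batch:
        if validate(txn, known_accounts, seen_txn_ids):
            sink.append(txn)
            seen_txn_ids.add(txn.txn_id)  # also catches duplicates within a batch
        else:
            rejected.append(txn)
    return rejected


# Usage with made-up data: history knows accounts A1/A2 and transaction T0.
history_accounts = {"A1", "A2"}
seen = {"T0"}
stored: List[Transaction] = []
incoming = [
    Transaction("T1", "A1", 100.0),  # valid -> stored
    Transaction("T2", "A9", 50.0),   # unknown account -> rejected
    Transaction("T0", "A2", 10.0),   # duplicate id -> rejected
    Transaction("T3", "A2", -5.0),   # non-positive amount -> rejected
]
rejected = process_batch(incoming, history_accounts, seen, stored)
print([t.txn_id for t in stored])    # ['T1']
print([t.txn_id for t in rejected])  # ['T2', 'T0', 'T3']
```

Keeping validation as a pure function of (record, historical state) makes it easy to run inside whatever processing engine reads from Kafka. The historical lookups would then be replaced by reads from Hadoop or a cache of it.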

From my recent experience, writing directly to HDFS can be slow, depending on
the data format.

Thanks

JP 

 

From: Sudeep Singh Thakur [mailto:sudeepthaku...@gmail.com] 
Sent: 30 June 2017 09:26
To: Sidharth Kumar
Cc: Maggy; common-u...@hadoop.apache.org
Subject: Re: Kafka or Flume

 

In your use case, Kafka would be better, because you want to apply some
transformations and validations.

Kind regards,
Sudeep Singh Thakur

On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <sidharthkumar2...@gmail.com> wrote:

Hi,

 

I have a requirement to ingest all transactional data into Hadoop in real time
and, before storing it, process it to validate the data. If a record fails
validation, it will not be stored in Hadoop. The validation process also uses
historical data that is already stored in Hadoop. So my question is: which
ingestion tool would be best for this, Kafka or Flume?

Any suggestions would be a great help.


Warm Regards

Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 |  
LinkedIn: www.linkedin.com/in/sidharthkumar2792