The ideal sequence would be: ingest using Kafka -> validate and process using Spark -> write to a NoSQL DB or Hive.
From my recent experience, writing directly to HDFS can be slow depending on the data format.

Thanks
JP

From: Sudeep Singh Thakur [mailto:sudeepthaku...@gmail.com]
Sent: 30 June 2017 09:26
To: Sidharth Kumar
Cc: Maggy; common-u...@hadoop.apache.org
Subject: Re: Kafka or Flume

In your use case, Kafka would be better because you want some transformations and validations.

Kind regards,
Sudeep Singh Thakur

On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <sidharthkumar2...@gmail.com> wrote:

Hi,

I have a requirement where all transactional data is ingested into Hadoop in real time, and before the data is stored in Hadoop it must be processed to validate it. If the data fails validation, it will not be stored in Hadoop. The validation process also makes use of historical data that is stored in Hadoop. So my question is: which ingestion tool would be best for this, Kafka or Flume?

Any suggestions would be a great help.

Warm Regards
Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn: www.linkedin.com/in/sidharthkumar2792
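
For anyone following the thread, here is a rough sketch of the "validate against historical data, store only what passes" step from the original question. It is plain Python so it runs without a cluster; it is not a Kafka or Spark program, and in a real pipeline the same logic would sit inside whatever job consumes the stream. The `Transaction` fields, the three validation rules, and the names `is_valid` and `split_batch` are all assumptions made for illustration, since the thread does not describe the actual schema or rules.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; the real schema is not given in the thread.
    txn_id: str
    account_id: str
    amount: float


def is_valid(txn: Transaction, known_accounts: Set[str], seen_txn_ids: Set[str]) -> bool:
    """Hypothetical rules: positive amount, account exists in the
    historical data, and the transaction id has not been seen before."""
    return (
        txn.amount > 0
        and txn.account_id in known_accounts
        and txn.txn_id not in seen_txn_ids
    )


def split_batch(
    batch: Iterable[Transaction],
    known_accounts: Set[str],
    seen_txn_ids: Set[str],
) -> Tuple[List[Transaction], List[Transaction]]:
    """Split one batch into (accepted, rejected).

    Only the accepted list would be written to the store; the rejected
    list could go to a separate location for inspection.
    """
    accepted: List[Transaction] = []
    rejected: List[Transaction] = []
    seen = set(seen_txn_ids)  # copy so the caller's set is not mutated
    for txn in batch:
        if is_valid(txn, known_accounts, seen):
            accepted.append(txn)
            seen.add(txn.txn_id)  # catch duplicates within the same batch
        else:
            rejected.append(txn)
    return accepted, rejected


if __name__ == "__main__":
    history = {"A1", "A2"}  # stand-in for account ids loaded from historical data
    incoming = [
        Transaction("t1", "A1", 10.0),
        Transaction("t2", "A9", 5.0),   # unknown account
        Transaction("t3", "A2", -1.0),  # non-positive amount
    ]
    ok, bad = split_batch(incoming, history, set())
    print("accepted:", [t.txn_id for t in ok])
    print("rejected:", [t.txn_id for t in bad])
```

Keeping the rejected records rather than silently dropping them is a design choice in this sketch, not something the thread asks for; it tends to make validation failures easier to debug.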