What is the message inflow? If it's really high, Spark will definitely be of great use.

Thanks
Deepak

On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:

> I have a somewhat tricky use case, and I'm looking for ideas.
>
> I have 5-6 Kafka producers, reading various APIs, and writing their raw
> data into Kafka.
>
> I need to:
>
> - Do ETL on the data, and standardize it.
>
> - Store the standardized data somewhere (HBase / Cassandra / Raw HDFS /
> ElasticSearch / Postgres)
>
> - Query this data to generate reports / analytics (There will be a web UI
> which will be the front-end to the data, and will show the reports)
>
> Java is being used as the backend language for everything (backend of the
> web UI, as well as the ETL layer)
>
> I'm considering:
>
> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer (receive
> raw data from Kafka, standardize & store it)
>
> - Using Cassandra, HBase, or raw HDFS, for storing the standardized data,
> and to allow queries
>
> - In the backend of the web UI, I could either use Spark to run queries
> across the data (mostly filters), or directly run queries against Cassandra
> / HBase
>
> I'd appreciate some thoughts / suggestions on which of these alternatives
> I should go with (e.g, using raw Kafka consumers vs Spark for ETL, which
> persistent data store to use, and how to query that data store in the
> backend of the web UI, for displaying the reports).
>
>
> Thanks.
>
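
To make the "standardize" step from the quoted message a bit more concrete, here is a minimal Java sketch of that step on its own. It is only an illustration under assumptions: the class name `Standardizer`, the helper `pick`, and the field names (`ts`, `timestamp`, `data`, `payload`, `source`) are all hypothetical and do not come from the thread. The same function could in principle be called from either a raw Kafka consumer loop or a Spark Streaming job, which is why it is kept free of both dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Standardizer {

    // Returns the trimmed value of the first key that is present and non-null,
    // or an empty string if none of the keys match.
    static String pick(Map<String, String> raw, String... keys) {
        for (String key : keys) {
            String value = raw.get(key);
            if (value != null) {
                return value.trim();
            }
        }
        return "";
    }

    // Maps one raw record into a hypothetical common schema with three fields:
    // "source", "timestamp" and "payload". The alternative input field names
    // ("ts", "data") are assumptions standing in for per-API differences.
    static Map<String, String> standardize(String source, Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("source", source);
        out.put("timestamp", pick(raw, "timestamp", "ts"));
        out.put("payload", pick(raw, "payload", "data"));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("ts", " 2016-09-29T19:24:00Z ");
        raw.put("data", "hello");
        // Prints the standardized record as a map.
        System.out.println(standardize("api-a", raw));
    }
}
```

Keeping the transformation as a plain function like this leaves the raw-consumer vs. Spark decision open: it can be driven by message inflow, as asked above, without rewriting the ETL logic itself.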