The web UI is actually the speed layer: it needs to be able to query the data online and show the results in real time.
It also needs a custom front-end, so a system like Tableau can't be used; it has to be a custom backend + front-end.

Thanks for the recommendation of Flume. Do you think this will work (rough Spark Streaming sketch at the bottom of this mail):

- Spark Streaming to read data from Kafka
- Storing the data on HDFS using Flume
- Using Spark to query the data in the backend of the web UI?

On Thu, Sep 29, 2016 at 7:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> You need a batch layer and a speed layer. Data from Kafka can be stored on
> HDFS using Flume.
>
> - Query this data to generate reports / analytics (There will be a web UI
> which will be the front-end to the data, and will show the reports)
>
> This is basically the batch layer, and you need something like Tableau or
> Zeppelin to query the data.
>
> You will also need Spark Streaming to query data online for the speed layer.
> That data could be stored in some transient fabric like Ignite or even Druid.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 29 September 2016 at 15:01, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> It needs to be able to scale to a very large amount of data, yes.
>>
>> On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>>> What is the message inflow?
>>> If it's really high, Spark will definitely be of great use.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>
>>>> I have a somewhat tricky use case, and I'm looking for ideas.
>>>>
>>>> I have 5-6 Kafka producers, reading various APIs, and writing their raw
>>>> data into Kafka.
>>>>
>>>> I need to:
>>>>
>>>> - Do ETL on the data and standardize it.
>>>>
>>>> - Store the standardized data somewhere (HBase / Cassandra / raw HDFS /
>>>> ElasticSearch / Postgres)
>>>>
>>>> - Query this data to generate reports / analytics (there will be a web
>>>> UI which will be the front-end to the data, and will show the reports)
>>>>
>>>> Java is being used as the backend language for everything (backend of
>>>> the web UI, as well as the ETL layer).
>>>>
>>>> I'm considering:
>>>>
>>>> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer
>>>> (receive raw data from Kafka, standardize & store it)
>>>>
>>>> - Using Cassandra, HBase, or raw HDFS for storing the standardized
>>>> data, and to allow queries
>>>>
>>>> - In the backend of the web UI, either using Spark to run queries
>>>> across the data (mostly filters), or running queries directly against
>>>> Cassandra / HBase
>>>>
>>>> I'd appreciate some thoughts / suggestions on which of these
>>>> alternatives I should go with (e.g. using raw Kafka consumers vs. Spark
>>>> for ETL, which persistent data store to use, and how to query that data
>>>> store in the backend of the web UI, for displaying the reports).
>>>>
>>>> Thanks.
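P.S. To make the Spark Streaming leg of the pipeline concrete, here is roughly the kind of ETL job I have in mind. This is only a sketch assuming the spark-streaming-kafka-0-10 integration; the broker address "kafka1:9092", topic "raw-events", the HDFS output path, and the standardize() stub are placeholders, not real config. In the layout above, Flume would handle the Kafka -> HDFS hop; the saveAsTextFile call is just to show where the standardized output would land if the Spark job wrote it directly.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class EtlJob {

  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("kafka-etl");
    // 10-second micro-batches; tune to the actual message inflow.
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "kafka1:9092");        // placeholder broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "etl-consumer");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);

    Collection<String> topics = Arrays.asList("raw-events");    // placeholder topic

    // Direct stream over the raw data written by the Kafka producers.
    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

    // Standardize each raw record, then persist each micro-batch.
    stream.map(record -> standardize(record.value()))
          .foreachRDD(rdd ->
              rdd.saveAsTextFile("hdfs:///data/standardized/" + System.currentTimeMillis()));

    jssc.start();
    jssc.awaitTermination();
  }

  // Stand-in for the real standardization / ETL logic.
  private static String standardize(String raw) {
    return raw.trim();
  }
}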