Yes, Amin.
Spark is primarily used here for ETL.
Once you transform the data, you can store it in any NoSQL DB that suits
your use case.
The BI dashboard app can then connect to the NoSQL DB for reports and
visualization.
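
For example, a rough PySpark sketch of that flow might look like the below.
The file paths, column names, and JDBC target are placeholders for
illustration, not your actual setup:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("meter-etl").getOrCreate()

    # Read the two daily flat-file feeds (paths and schemas are assumed).
    meters = spark.read.option("header", True).csv("/data/interval_meter/")
    customers = spark.read.option("header", True).csv("/data/customers/")

    # Aggregate interval readings per meter per day.
    daily = (meters
        .groupBy("meter_id", "reading_date")
        .agg(F.sum("kwh").alias("total_kwh")))

    # Join in the customer attributes for the BI layer.
    report = daily.join(customers, on="meter_id", how="left")

    # Land the result in any JDBC-accessible DB that the BI tool can query.
    (report.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/analytics")  # assumed
        .option("dbtable", "daily_meter_metrics")
        .option("user", "etl_user")
        .option("password", "...")
        .mode("append")
        .save())

Power BI, Tableau, or Excel can then point at the daily_meter_metrics table
directly for filtering, graphing, and reporting.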

HTH

Deepak.

On Mon, May 28, 2018, 05:47 amin mohebbi <aminn_...@yahoo.com.invalid>
wrote:

>  I am working on an analytic application that uses Apache Spark to store
> and analyze data. Spark might be used as an ETL application to aggregate
> different metrics and then join the aggregated metrics together. The data
> sources are flat files arriving from two different feeds (interval meter
> data and customer information) on a daily basis (65 GB per day of time
> series data). The end users are BI users, so we cannot give them notebook
> visualizations. They can only use Power BI, Tableau, or Excel to do
> self-service filtering for run-time analytics, graphing the data, and
> reporting.
>
> So, my question is: what are the best tools to implement this pipeline?
> I do not think storing Parquet or ORC files on a file system is a good
> choice in production; I think we have to land the data somewhere (a time
> series or standard DB). Please correct me if I am wrong.
>
> 1- Where should we store the data? File system / time-series DB / Azure
> Cosmos DB / standard DB?
> 2- Is it the right approach to use Spark as the ETL and aggregation
> application, store the output somewhere, and use Power BI for reporting
> and dashboard purposes?
> Best Regards
> Amin Mohebbi
> PhD candidate in Software Engineering at University of Malaysia
> Tel : +60 18 2040 017
> E-Mail : tp025...@ex.apiit.edu.my
> amin_...@me.com
>
