Could you please help me understand the performance we can expect from using Spark with a NoSQL or time-series database (TSDB)?

We receive 1 million meters x 288 readings = 288 million rows per day (approx. 360 GB per day), so we will end up with tens or hundreds of TBs of data, and I feel that NoSQL will be much quicker than Hadoop/Spark. This is time-series data coming from many devices in the form of flat files; it is currently extracted/transformed/loaded into another database that is connected to BI tools.

We might use Azure Data Factory to collect the flat files, then use Spark to do the ETL (not sure if that is the correct way), then use Spark to join tables or run the aggregations and save the results into a database (preferably NoSQL, but not sure). Finally, we would connect Power BI to visualize the data from the NoSQL database.
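For scale, the daily volume above compounds quickly; here is a quick back-of-the-envelope calculation (assuming the stated 360 GB/day holds steady, with no compression or replication factored in):

```python
# Back-of-the-envelope storage growth for the meter data described above.
# Assumes a constant ingest rate of 360 GB/day (raw, uncompressed, unreplicated).
METERS = 1_000_000
READINGS_PER_DAY = 288          # one reading every 5 minutes
GB_PER_DAY = 360

rows_per_day = METERS * READINGS_PER_DAY
bytes_per_row = GB_PER_DAY * 1024**3 / rows_per_day

print(f"rows/day:         {rows_per_day:,}")            # 288,000,000
print(f"bytes/row:        {bytes_per_row:.0f}")         # ~1342 bytes raw
print(f"TB after 90 days: {GB_PER_DAY * 90 / 1024:.1f}")
print(f"TB after 1 year:  {GB_PER_DAY * 365 / 1024:.1f}")
```

At roughly 128 TB of raw data per year, this confirms the "tens or hundreds of TBs" estimate; columnar compression (e.g. Parquet) would typically shrink the stored footprint considerably.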
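To make the aggregation step concrete, here is a minimal sketch of the kind of per-meter rollup you would express in Spark; the schema (meter_id, timestamp, kWh) and the daily-average rollup are my own assumptions for illustration, and it is written in plain Python so it runs without a cluster:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy readings: (meter_id, timestamp, kWh) -- one reading every 5 minutes,
# mirroring the 288-readings-per-day cadence described above.
start = datetime(2016, 1, 1)
readings = [("meter-1", start + timedelta(minutes=5 * i), 1.0 + (i % 2))
            for i in range(288)]

# Daily rollup per meter. The equivalent Spark SQL would be roughly:
#   SELECT meter_id, to_date(ts) AS day, AVG(kwh)
#   FROM readings GROUP BY meter_id, to_date(ts)
daily = defaultdict(list)
for meter_id, ts, kwh in readings:
    daily[(meter_id, ts.date())].append(kwh)

rollup = {key: sum(vals) / len(vals) for key, vals in daily.items()}
# The alternating 1.0/2.0 readings average to 1.5 for the single meter/day.
```

In the architecture you describe, Spark would run this kind of group-by over the daily flat files and write the much smaller rollup table to the serving database that Power BI queries, so the BI tool never scans the raw 288 million rows.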
My questions are:

1. Is the above the correct architecture? I think the combination of Spark with NoSQL could give us random access and let many different users run queries concurrently.
2. Do we really need to use a time-series database?

Best regards,
.......................................................
Amin Mohebbi
PhD candidate in Software Engineering at University of Malaysia
Tel: +60 18 2040 017
E-Mail: tp025...@ex.apiit.edu.my
amin_...@me.com