Could you please help me understand the performance we would get from using
Spark with a NoSQL or time-series database (TSDB)? We receive 1 million meters
x 288 readings = 288 million rows (approx. 360 GB) per day, so we will end up
with tens or hundreds of TBs of data, and I feel that NoSQL will be much
quicker than Hadoop/Spark. This is time-series data coming from many devices
in the form of flat files; it is currently extracted, transformed, and loaded
into another database that is connected to BI tools. We might use Azure Data
Factory to collect the flat files, then use Spark to do the ETL (not sure if
this is the correct way), then use Spark to join tables and compute
aggregations and save the results into a database (preferably NoSQL, but not
sure), and finally deploy Power BI to visualize the data from the NoSQL
database. My questions are:

1. Is the architecture described above correct? I think the combination of
Spark and NoSQL could provide random access and support many queries from
different users.
2. Do we really need to use a time-series database?

Best Regards,

Amin Mohebbi
PhD candidate in Software Engineering at University of Malaysia
Tel: +60 18 2040 017
E-Mail: tp025...@ex.apiit.edu.my
amin_...@me.com
