Hi all,

I have just started getting hands-on with Spark, Spark Streaming, and MLlib. We are in the design phase and need suggestions on how to store the training data.
Batch layer: Our core systems currently run on SQL Server, and we will use the data they generate as our batch data. We plan to pull data from the core databases, transform it with a Spark job, and store it in Cassandra. We would then train the model on data read from Cassandra and store the prediction results back in Cassandra.

Real-time layer: We also plan to have a real-time layer that stores live data from devices in Cassandra for further analysis with MLlib.

I have heard that Cassandra is not needed in this design because Spark itself provides storage. Please advise whether Cassandra is required, and suggest the best way to handle the following design:

[Architecture diagram (image not available)]

Aruna Veluru | Senior Lead Analyst | Bally Technologies <http://www.ballytech.com> | (O) +1 702 532 2832 | (M) +91 99 7222 6213
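A minimal PySpark sketch of the batch flow described above (SQL Server, then a Spark transform, then Cassandra, then MLlib training, then predictions back into Cassandra). It is an illustration under stated assumptions, not a recommended design: it assumes a `SparkSession` (Spark 2.x or later), the DataStax spark-cassandra-connector, and the Microsoft SQL Server JDBC driver on the classpath. The keyspace, the tables `training_data` and `predictions`, the id/feature/label column names, the choice of `LogisticRegression`, and the helper function names are all hypothetical. One relevant design point: Spark's `cache()`/`persist()` keep data only for the lifetime of a Spark application, so the sketch writes to an external store explicitly.

```python
# Hypothetical PySpark sketch of the batch layer. pyspark is imported inside
# run_batch_layer so the option helpers can be used without Spark installed.

SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"  # spark-cassandra-connector source


def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Options for spark.read.format("jdbc") against a SQL Server database."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": SQLSERVER_DRIVER,
    }


def cassandra_options(keyspace, table):
    """Options for reading/writing a Cassandra table through the connector."""
    return {"keyspace": keyspace, "table": table}


def run_batch_layer(spark, jdbc_opts, keyspace, id_col, feature_cols, label_col):
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # 1. Pull data from the core SQL Server database over JDBC.
    raw = spark.read.format("jdbc").options(**jdbc_opts).load()

    # 2. Placeholder transform: keep the needed columns, drop incomplete rows.
    cols = [id_col, *feature_cols, label_col]
    training = raw.select(*cols).dropna(subset=cols)

    # 3. Store the transformed rows in Cassandra. The hypothetical table
    #    "training_data" must already exist with matching columns.
    (training.write.format(CASSANDRA_FORMAT)
        .options(**cassandra_options(keyspace, "training_data"))
        .mode("append")
        .save())

    # 4. Read the training data back from Cassandra and fit an MLlib pipeline.
    stored = (spark.read.format(CASSANDRA_FORMAT)
              .options(**cassandra_options(keyspace, "training_data"))
              .load())
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    classifier = LogisticRegression(featuresCol="features", labelCol=label_col)
    model = Pipeline(stages=[assembler, classifier]).fit(stored)

    # 5. Score rows and write id + prediction to the hypothetical "predictions"
    #    table. Vector columns (e.g. "probability") are dropped because they do
    #    not map to a plain Cassandra column. Scoring the training rows is only
    #    for brevity; in practice new data would be scored.
    (model.transform(stored)
        .select(id_col, "prediction")
        .write.format(CASSANDRA_FORMAT)
        .options(**cassandra_options(keyspace, "predictions"))
        .mode("append")
        .save())
    return model
```

To run something like this, the Cassandra contact point would typically be supplied through the connector's `spark.cassandra.connection.host` setting, and the connector and JDBC driver jars passed to `spark-submit` (the exact package coordinates depend on your Spark and Scala versions).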