Hey,

I am trying to work out the best way to leverage Spark for crunching data
that is sitting in SQL Server databases. The ideal scenario is being able
to efficiently work with big data (10 billion+ rows of activity data). We
need to shape this data for machine learning problems, and we want to run
ad-hoc and complex queries and get results in a timely manner.

All our data crunching is currently done via SQL/MDX queries, but these
obviously take a very long time to run over a data set this large. We also
don't currently have Hadoop or any other distributed storage.
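(For context, the pattern I've been looking at is Spark's built-in JDBC data source, which can read straight from SQL Server without Hadoop by splitting the table into parallel range queries on a numeric column. Below is a minimal sketch; the server name, database, table, and column names are all placeholders, and the bounds/partition counts are assumptions I'd need to tune.)

```python
# Sketch: partitioned JDBC read from SQL Server into Spark.
# All identifiers (server, database, table, column) are placeholders.

jdbc_url = "jdbc:sqlserver://myserver:1433;databaseName=activity_db"

# Spark issues numPartitions parallel queries, each covering an equal
# slice of [lowerBound, upperBound] on partitionColumn.
read_options = {
    "url": jdbc_url,
    "dbtable": "dbo.ActivityData",       # placeholder table
    "partitionColumn": "activity_id",    # must be numeric or date type
    "lowerBound": "1",
    "upperBound": "10000000000",         # assumed ~10B row id range
    "numPartitions": "200",              # tune to cluster & DB capacity
    "fetchsize": "10000",                # rows per round trip
}

# With a live SparkSession and the SQL Server JDBC driver on the
# classpath, the read itself would look like:
# df = spark.read.format("jdbc").options(**read_options).load()
# df.groupBy("user_id").count().show()
```

One caveat with this approach: every read still hammers the source SQL Server, so a common refinement is to do the JDBC pull once and persist to Parquet for the ad-hoc/ML workloads.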

Keen to hear feedback/thoughts/war stories from the Spark community on the
best way to approach this situation.

Thanks
Suhel
