Hi Spark devs, I recently attended a tech session on data processing with Spark vs. Redshift. It concluded, with metrics and data points, that for 2 billion records, SELECT queries that filter on attributes were faster and cheaper on AWS Redshift than on an AWS Spark cluster.
I have researched this a bit, and both Redshift and Spark seem to be processing tools for running OLAP queries on large datasets. In which use cases does Spark have an edge over Redshift? Are there certain kinds of complex queries where Spark can outperform Redshift? Or does Redshift only work well with schema-defined data? Please share your experience with either technology. Thanks. Cheers, Eris.