On Sun, Jun 22, 2014 at 5:53 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> 600s for Spark vs 5s for Redshift...The numbers look much different from
> the amplab benchmark...
>
> https://amplab.cs.berkeley.edu/benchmark/
>
> Is it like SSDs or something that's helping redshift or the whole data is
> in memory when you run the query ? Could you publish the query ?
>

I think we'll blog it when it's done. Still working on it. This was done with HD nodes, not SSD.

The query is very simple;

select id, count(*) from data_table group by id;

This is on 52.13 GB of gzipped data, with about 150 distinct IDs.