Re: Shark vs Impala

Toby Douglass Mon, 23 Jun 2014 05:30:56 -0700

On Sun, Jun 22, 2014 at 5:53 PM, Debasish Das <debasish.da...@gmail.com>
wrote:


> 600s for Spark vs 5s for Redshift...The numbers look much different from
> the amplab benchmark...
>
> https://amplab.cs.berkeley.edu/benchmark/
>
> Is it like SSDs or something that's helping redshift or the whole data is
> in memory when you run the query ? Could you publish the query ?
>

I think we'll blog it when it's done.  Still working on it.  This was done
with HD nodes, not SSD.

The query is very simple:

select id, count(*) from data_table group by id;

This is on 52.13 GB of gzipped data, with about 150 distinct IDs.
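
For anyone who wants to see what the query returns before the full write-up, here is a minimal sketch that runs the same statement on a toy table. It uses Python's built-in sqlite3 module with an in-memory database, not Spark or Redshift, so it says nothing about timings. The table layout (an integer id plus a "payload" column), the sample rows, and the added "order by id" (only there to make the output order deterministic) are assumptions for illustration, not the real schema or data.

```python
import sqlite3


def count_by_id(rows):
    """Run the thread's group-by query against an in-memory SQLite table.

    `rows` is an iterable of (id, payload) tuples. The two-column layout is
    a hypothetical stand-in for the real data_table schema.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table data_table (id integer, payload text)")
        conn.executemany(
            "insert into data_table (id, payload) values (?, ?)", rows
        )
        # Same query as in the email, plus "order by id" (an addition) so the
        # result order is deterministic.
        cursor = conn.execute(
            "select id, count(*) from data_table group by id order by id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Toy data: three distinct IDs instead of the roughly 150 in the real set.
sample_rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (1, "e"), (2, "f")]
counts = count_by_id(sample_rows)
print(counts)
```

Each result row is an (id, count) pair, so with about 150 distinct IDs the real query returns about 150 rows, however large the input is.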
