Well, yes, MLlib-like routines or pretty much anything else could be run on the 
derived results, but you would have to unload the results from Redshift and then 
load them into some other tool. It's nicer to leave them in memory and operate 
on them there, which is a major architectural advantage of Spark.

Ron


From: Gary Malouf [mailto:malouf.g...@gmail.com]
Sent: Wednesday, August 06, 2014 1:17 PM
To: Nicholas Chammas
Cc: Daniel, Ronald (ELS-SDG); user@spark.apache.org
Subject: Re: Regarding tooling/performance vs RedShift


Also, regarding something like Redshift not having MLlib built in, much of that 
could be done on the derived results.
On Aug 6, 2014 4:07 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote:
On Wed, Aug 6, 2014 at 3:41 PM, Daniel, Ronald (ELS-SDG) 
<r.dan...@elsevier.com> wrote:
Mostly I was just objecting to "Redshift does very well, but Shark is on par 
or better than it in most of the tests" when that was not how I read the 
results, and Redshift was on HDDs.

My bad. You are correct; the only test Shark (mem) does better on is test #1 
"Scan Query".

And indeed, it would be good to see an updated benchmark with Redshift running 
on SSDs.

Nick
