On Fri, Apr 25, 2014 at 6:30 AM, Mark Baker <dist...@acm.org> wrote:
> I've only had a quick look at Pig, but it seems that a declarative
> layer on top of Spark couldn't be anything other than a big win, as it
> allows developers to declare *what* they want, permitting the compiler
> to determine how best to poke at the RDD API to implement it.
Having Pig too would certainly be a win, but Spark SQL <http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html> is also a declarative layer on top of Spark. Since the optimization is lazy, you can chain multiple SQL statements in a row and still optimize them holistically (similar to a Pig job). An alpha version is coming soon to a Spark 1.0 release near you! Spark SQL also lets you drop back into functional Scala when that is more natural for a particular task.
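For the curious, here's a rough sketch of what that mixing might look like with the 1.0-alpha SQLContext API; the file name, table name, and schema are invented for illustration:

```scala
// Sketch only: assumes an existing SparkContext `sc` and the Spark 1.0-alpha Spark SQL API.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._ // brings in implicit conversions for schema inference

// A case class gives the RDD a schema that Spark SQL can work with.
case class Person(name: String, age: Int)
val people = sc.textFile("people.txt") // hypothetical input: "name,age" per line
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

people.registerAsTable("people")

// Declarative step: an ordinary SQL query over the registered table...
val adults = sql("SELECT name, age FROM people WHERE age >= 18")

// ...then drop back into functional Scala on the result when that's more natural.
adults.map(row => "Name: " + row(0)).collect().foreach(println)
```

The point being that `adults` is just another RDD, so the SQL and the closures get planned together rather than as separate jobs.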