On Thu, Jun 12, 2014 at 4:48 PM, Andre Schumacher < schum...@icsi.berkeley.edu> wrote:
> On 06/12/2014 05:47 PM, Toby Douglass wrote:
> >
> > In these future jobs, when I come to load the aggregated RDD, will Spark
> > load and only load the columns being accessed by the query? Or will Spark
> > load everything, convert it into an internal representation, and then
> > execute the query?
>
> The aforementioned native Parquet support in Spark 1.0 supports column
> projections, which means only the columns that appear in the query will
> be loaded.
> [snip]

Thank you!
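For anyone following along, a minimal sketch of what the column projection described above looks like in practice, using the Spark 1.0-era SQL API (paths and column names here are hypothetical; this assumes a running SparkContext):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Assumes `sc` is an existing SparkContext.
val sqlContext = new SQLContext(sc)

// Load the previously saved aggregated data from Parquet.
// This returns a SchemaRDD; no data is read yet.
val aggregated = sqlContext.parquetFile("hdfs:///data/aggregated.parquet")
aggregated.registerAsTable("aggregated")

// Because the query references only `userId` and `total`, the Parquet
// reader should scan just those two columns on disk, skipping the rest.
val results = sqlContext.sql("SELECT userId, total FROM aggregated WHERE total > 100")
results.collect().foreach(println)
```

Parquet's columnar layout is what makes this possible: each column is stored contiguously, so the reader can seek to and decode only the requested columns rather than deserializing whole rows.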