Hi,

On 06/12/2014 05:47 PM, Toby Douglass wrote:

> In these future jobs, when I come to load the aggregated RDD, will Spark
> load only the columns being accessed by the query? Or will Spark
> load everything, convert it into an internal representation, and then
> execute the query?

The aforementioned native Parquet support in Spark 1.0 supports column
projections, which means only the columns that appear in the query will
be loaded. The next release will also support record filters for simple
pruning predicates ("int-column < value" and such). This is different
from going through a Hadoop input/output format and requires no
additional setup (no extra jars on the classpath and such).
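
A minimal sketch of what that looks like with the Spark 1.0 SQL API
(the file path and column names below are made up for illustration):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  object ParquetProjectionExample {
    def main(args: Array[String]) {
      val sc = new SparkContext(new SparkConf().setAppName("parquet-projection"))
      val sqlContext = new SQLContext(sc)

      // Load a previously saved Parquet file as a SchemaRDD.
      val aggregated = sqlContext.parquetFile("aggregated.parquet")
      aggregated.registerAsTable("aggregated")

      // Only the columns named in the query (key, total) are read from
      // disk; the file's other columns are never materialized.
      val result = sqlContext.sql(
        "SELECT key, total FROM aggregated WHERE total > 0")
      result.collect().foreach(println)

      sc.stop()
    }
  }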

For more details see:

http://spark.apache.org/docs/latest/sql-programming-guide.html#using-parquet

Andre
