Hi all,

I'd like to build/use column-oriented RDDs in some of my Spark code. A normal Spark RDD is stored as row-oriented objects, if I understand correctly.
I'd like to leverage some of the advantages of a columnar in-memory format. Shark used a columnar storage format, and Spark SQL still does, with a primitive array for each column. I'd be interested to know more about this approach and how I could build my own custom column-oriented RDD that I can use outside of Spark SQL. Could anyone give me some pointers on where to look to do something like this, either from scratch or using what's there in the Spark SQL libs or elsewhere? I know Evan Chan mentioned building a custom RDD of column-oriented blocks of data in one of his presentations.

To make the question concrete, here's a rough sketch of the kind of thing I'm imagining (ColumnarBlock and toColumnar are just names I've made up):
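    import org.apache.spark.rdd.RDD

    // One block covers a batch of rows, with one primitive array per column.
    // (Hypothetical two-column schema: an Int id and a Double value.)
    case class ColumnarBlock(ids: Array[Int], values: Array[Double])

    // Pivot a row-oriented RDD into an RDD of column-oriented blocks,
    // one block per partition.
    def toColumnar(rows: RDD[(Int, Double)]): RDD[ColumnarBlock] =
      rows.mapPartitions { iter =>
        val rowArr = iter.toArray
        Iterator.single(ColumnarBlock(rowArr.map(_._1), rowArr.map(_._2)))
      }

    // e.g. in the shell:
    // val blocks = toColumnar(sc.parallelize(Seq((1, 2.0), (2, 3.5))))

Is something along these lines a reasonable starting point, or is there machinery in the Spark SQL columnar code I should be reusing instead?

Cheers,
~N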