Hi,

I have the following use case. Assume my data is in, e.g., HDFS, as a single 
sequence file containing rows of CSV entries that I can split to build an RDD 
of arrays of (smaller) strings.
What I want to do is build two RDDs, where the first RDD contains one subset of 
the columns and the second RDD contains another subset.
Is there a map-like API that could do this?

BTW, I know that one can build multiple flows, each of which calls map and 
selects the proper columns. Is there a faster way?
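
For reference, a minimal sketch of the multiple-flow approach mentioned above, 
written PySpark-style. The helper names select_columns and split_columns are 
made up for illustration, and the input is assumed to already be an RDD of 
lists of strings (one list per CSV row). The only Spark calls used are 
RDD.cache() and RDD.map().

```python
def select_columns(row, indices):
    """Return the fields of `row` at the given column positions."""
    return [row[i] for i in indices]


def split_columns(rows, first_cols, second_cols):
    """Build two RDDs from one by calling map once per column subset.

    `rows` is assumed to be an RDD of lists of strings.
    cache() asks Spark to keep the split rows in memory, so the second
    map should not need to re-read and re-split the input file.
    """
    rows = rows.cache()
    first = rows.map(lambda row: select_columns(row, first_cols))
    second = rows.map(lambda row: select_columns(row, second_cols))
    return first, second
```

Each resulting RDD is still computed in its own pass over the cached rows, so 
this avoids re-parsing the file but not the second traversal.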

Sagi

