This may seem contrived, but suppose I want to create a collection of "single-column" RDDs that contain calculated values, and I want to cache them to avoid recalculating.
e.g. rdd1 = {Names}, rdd2 = {Star Sign}, rdd3 = {Age}. Then I want to create a new virtual RDD that combines these RDDs into a "multi-column" RDD, e.g.
rddA = {Names, Age}, rddB = {Names, Star Sign}. I saw that rdd.union() merges rows, but is there anything that can combine columns?

Cheers - Ian
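
P.S. To make the request concrete, here is a minimal sketch of the behaviour I am after. It uses plain Python lists as stand-ins for the cached single-column RDDs, so it is not Spark code. The helper name combine_columns and all sample values are made up for illustration.

```python
# Plain Python lists stand in for the cached single-column RDDs.
# All sample values are invented for illustration only.
names = ["Alice", "Bob", "Carol"]
star_signs = ["Aries", "Leo", "Virgo"]
ages = [34, 27, 45]


def combine_columns(*columns):
    """Pair the i-th element of every column into one row (a tuple).

    Hypothetical helper: it assumes every column has the same number
    of rows and that the rows are in the same order.
    """
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return list(zip(*columns))


# The "multi-column" collections from the question.
rdd_a = combine_columns(names, ages)        # {Names, Age}
rdd_b = combine_columns(names, star_signs)  # {Names, Star Sign}

print(rdd_a)
print(rdd_b)
```

The point is that rows are matched purely by position, which is the column-wise counterpart of what rdd.union() does for rows. Whatever the Spark equivalent is, I would want it to reuse the cached columns rather than recompute them.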