Hi All, I have an RDD of case class objects.
scala> case class Entity(
     |   value: String,
     |   identifier: String
     | )
defined class Entity

scala> Entity("hello", "id1")
res25: Entity = Entity(hello,id1)

During a map operation, I'd like to return a new RDD that contains all of the data in the original RDD, plus a type looked up in a Cassandra table using each entity's identifier. The lookup table in Cassandra looks something like this:

 id  | type
-----+--------
 id1 | action
 id2 | view

The end result would be an RDD of EntityExtended:

case class EntityExtended(
  value: String,
  identifier: String,
  `type`: String
)

I believe it would make sense to use a broadcast variable, but I'm not sure how best to use it inside a map operation:

rdd.map(MyObject.extendEntity)

object MyObject {
  def extendEntity(entity: Entity): EntityExtended = {
    val id = entity.identifier
    // look up identifier in broadcast variable?
    ???
  }
}

Thanks,
Mike
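One possible shape for the broadcast approach, sketched under some assumptions: the id-to-type lookup table is small enough to collect to the driver as a plain Map[String, String], and ids missing from the table get a placeholder type "unknown" (a hypothetical choice, not from the post). The key change is that extendEntity takes the lookup map as an explicit parameter instead of reaching for a global, so it can be unit-tested without Spark.

```scala
// Sketch only: assumes the Cassandra lookup table (id -> type) has already
// been collected to the driver as a plain Map[String, String].
case class Entity(value: String, identifier: String)

// `type` is a reserved word in Scala, so the field name needs backticks.
case class EntityExtended(value: String, identifier: String, `type`: String)

object MyObject {
  // Hypothetical placeholder for identifiers that are not in the lookup table.
  val UnknownType = "unknown"

  // Pure function: the lookup map is passed in explicitly, so inside
  // rdd.map it can be called with the broadcast variable's `.value`.
  def extendEntity(entity: Entity, types: Map[String, String]): EntityExtended =
    EntityExtended(
      entity.value,
      entity.identifier,
      types.getOrElse(entity.identifier, UnknownType)
    )
}
```

To wire this into Spark, you could build the map on the driver (for example by reading the table with the spark-cassandra-connector and collecting it), broadcast it with val typesBc = sc.broadcast(types), and then call rdd.map(e => MyObject.extendEntity(e, typesBc.value)). Reading typesBc.value inside the closure lets each executor use its locally cached copy of the map instead of having the map shipped with every task. If the lookup table is too large to collect, a join between the two datasets would be the usual alternative to a broadcast.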