It's Spark 1.5.1.
The DataFrame has just 2 columns, both strings.
A SQL query would probably be more efficient, but it doesn't fit our purpose
(we are doing a lot more stuff where we need RDDs).
Also, I am just trying to understand in general what in an RDD coming from
a DataFrame could slow things down.
Can you please provide the high-level schema of the entities that you are
attempting to join? I think that you may be able to use a more efficient
technique to join these together, perhaps by registering the DataFrames as
temp tables and constructing a Spark SQL query.
Also, which version of