Re: rdd join very slow when rdd created from data frame

2016-01-12 Thread Koert Kuipers
It is Spark 1.5.1. The DataFrame simply has 2 columns, both strings. A SQL query would probably be more efficient, but it doesn't fit our purpose (we are doing a lot more stuff where we need RDDs). Also, I am just trying to understand in general what in that RDD coming from a DataFrame could slow things

Re: rdd join very slow when rdd created from data frame

2016-01-12 Thread Kevin Mellott
Can you please provide the high-level schema of the entities that you are attempting to join? I think that you may be able to use a more efficient technique to join these together, perhaps by registering the DataFrames as temp tables and constructing a Spark SQL query. Also, which version of
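
For reference, the temp-table approach mentioned above might look roughly like the sketch below. It assumes the Spark 1.5.x API (`SQLContext`, `registerTempTable`) and two DataFrames with two string columns each. The table names, column names, and sample rows are hypothetical, since the actual schema was not posted in the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TempTableJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run standalone.
    val conf = new SparkConf().setAppName("TempTableJoinSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical stand-ins for the two-column, all-string DataFrames.
    val left = sc.parallelize(Seq(("a", "1"), ("b", "2"))).toDF("key", "left_value")
    val right = sc.parallelize(Seq(("a", "x"), ("c", "y"))).toDF("key", "right_value")

    // Register both DataFrames under (hypothetical) table names.
    left.registerTempTable("left_table")
    right.registerTempTable("right_table")

    // Express the join in Spark SQL so the query planner handles it.
    val joined = sqlContext.sql(
      """SELECT l.key, l.left_value, r.right_value
        |FROM left_table l
        |JOIN right_table r ON l.key = r.key""".stripMargin)

    joined.show()

    // If later steps need an RDD, the joined result can still be converted.
    val joinedRdd = joined.rdd
    println(joinedRdd.count())

    sc.stop()
  }
}
```

The join itself stays inside Spark SQL here, and only the result is turned into an RDD of `Row` objects, which may be enough when the rest of the pipeline works on RDDs.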