Re: Joining HDFS and JDBC data sources - benchmarks

2015-11-13 Thread Sabarish Sasidharan
If it is metadata why would we not cache it before we perform the join?

Regards
Sab

On 13-Nov-2015 10:27 pm, "Eran Medan" wrote:
> Hi
> I'm looking for some benchmarks on joining data frames where most of the
> data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is
> still in RD

Joining HDFS and JDBC data sources - benchmarks

2015-11-13 Thread Eran Medan
Hi,

I'm looking for some benchmarks on joining data frames where most of the data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is still in RDBMS. I am only looking at the very first join before any caching happens, and I assume there will be loss of parallelization because JDBCRDD
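
On the parallelism concern, a rough sketch that is not part of the thread: a JDBC read can be spread over several tasks if the table is split on a numeric key, which is the idea behind the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` options of Spark's JDBC data source. The plain-Python function below only illustrates how such a range might be turned into one WHERE clause per partition. The function name and the `id` column are hypothetical, and this is not Spark's actual implementation.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Return one SQL predicate per partition for a numeric column.

    Illustrative only: the first and last predicates are open-ended so
    rows outside [lower, upper) and NULL keys are still read exactly once.
    """
    if num_partitions <= 1 or upper <= lower:
        # Nothing to split: a single query scans the whole table.
        return ["1=1"]

    # Width of each slice; at least 1 so the cut points keep increasing.
    stride = max((upper - lower) // num_partitions, 1)
    cuts = [lower + stride * i for i in range(1, num_partitions)]

    predicates = [f"{column} < {cuts[0]} OR {column} IS NULL"]
    for lo, hi in zip(cuts, cuts[1:]):
        predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    predicates.append(f"{column} >= {cuts[-1]}")
    return predicates


if __name__ == "__main__":
    # Hypothetical reference table keyed by an integer "id" in 0..100.
    for p in partition_predicates("id", 0, 100, 4):
        print(p)
```

Each predicate would correspond to one query against the RDBMS, so the number of predicates bounds how many tasks can read in parallel. Whether that helps the first join depends on the database and the size of the reference table; for a small table, a single read followed by caching may be enough.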