If it is metadata, why would we not cache it before we perform the join?

Regards
Sab

On 13-Nov-2015 10:27 pm, "Eran Medan" <ehrann.meh...@gmail.com> wrote:
> Hi
>
> I'm looking for some benchmarks on joining data frames where most of the
> data is in HDFS (e.g. in Parquet) and some "reference" or "metadata" is
> still in an RDBMS. I am only looking at the very first join, before any
> caching happens, and I assume there will be a loss of parallelization
> because JDBCRDD is probably bottlenecked on the maximum number of parallel
> connections the database server can hold.
>
> Are there any measurements / benchmarks that anyone did?
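
To make the caching suggestion above concrete, here is a minimal sketch in plain Python rather than Spark. Everything in it is an assumption for illustration: the in-memory `sqlite3` database stands in for the RDBMS, the `ref` table and its columns are hypothetical, and the `facts` list stands in for the large data set read from HDFS. It only shows the shape of the idea: read the small reference table once over a single connection, keep it in memory, and join against that copy so the join itself never goes back to the database.

```python
import sqlite3


def load_reference(conn):
    """Read the small reference table once and keep it in memory."""
    rows = conn.execute("SELECT id, name FROM ref").fetchall()
    return {ref_id: name for ref_id, name in rows}


def join_with_cached_reference(facts, reference):
    """Inner hash-join of fact rows against the in-memory reference dict."""
    return [
        (fact_id, value, reference[ref_id])
        for fact_id, ref_id, value in facts
        if ref_id in reference
    ]


# Hypothetical stand-in for the RDBMS that holds the "metadata".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO ref VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

# Hypothetical stand-in for the large data set (e.g. rows read from Parquet).
facts = [(100, 1, 3.5), (101, 2, 7.0), (102, 3, 1.25)]

reference = load_reference(conn)  # one query over one connection
conn.close()                      # the join below no longer needs the database
joined = join_with_cached_reference(facts, reference)
print(joined)
```

In Spark the corresponding step would presumably be to read the JDBC table into a DataFrame, call `cache()` on it (or wrap it in `broadcast(...)` if it is small enough), and then join it with the Parquet-backed DataFrame. Whether that removes the connection bottleneck for the very first join is exactly what a benchmark would have to show.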