If it is metadata, why would we not cache it before we perform the join?
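
A rough plain-Python sketch of that idea, not Spark code: the small reference table is read from the database once, kept in memory, and every partition of the large data set is then joined against the in-memory copy, so the database sees one read no matter how many partitions there are. The in-memory SQLite table, the `ref` schema, and the helper names `load_reference` and `join_partition` are all hypothetical stand-ins chosen for illustration. In Spark the analogous step would presumably be caching (or broadcasting) the JDBC-backed DataFrame before the join.

```python
import sqlite3


def load_reference(conn):
    """Read the whole reference table once and keep it in memory as a dict."""
    rows = conn.execute("SELECT id, label FROM ref").fetchall()
    return dict(rows)


def join_partition(partition, ref):
    """Inner-join one partition of (id, value) rows against the cached dict."""
    return [(key, value, ref[key]) for key, value in partition if key in ref]


# Stand-in for the RDBMS: an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO ref VALUES (?, ?)", [(1, "a"), (2, "b")])

ref_cache = load_reference(conn)  # the only database read
conn.close()                      # no connection is needed during the join

# Stand-in for the large HDFS data, already split into partitions.
partitions = [[(1, 10.0), (3, 30.0)], [(2, 20.0), (1, 11.0)]]
joined = [row for part in partitions for row in join_partition(part, ref_cache)]
```

This only helps if the reference table is small enough to hold in memory, which the thread does not state.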

Regards
Sab
On 13-Nov-2015 10:27 pm, "Eran Medan" <ehrann.meh...@gmail.com> wrote:

> Hi
> I'm looking for some benchmarks on joining data frames where most of the
> data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is
> still in RDBMS. I am only looking at the very first join before any caching
> happens, and I assume there will be a loss of parallelism because JDBCRDD
> is probably bottlenecked on the maximum number of parallel connections the
> database server can hold.
>
> Are there any measurements / benchmarks that anyone did?
