If it is metadata, why would we not cache it before we perform the join?
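As a rough sketch of what I mean (plain Python rather than Spark, and every name in it is made up for illustration): read the small reference table once, keep it in memory, and let each partition of the large data set join against that copy. It assumes the reference data fits in memory.

```python
# Illustration only: plain Python, not Spark. All names are hypothetical.

def load_reference_table():
    """Stand-in for a one-time read of the small RDBMS table."""
    return [(1, "US"), (2, "DE"), (3, "IL")]

def join_partition(partition, lookup):
    """Inner-join one partition of the large data set against the cached lookup."""
    return [(key, value, lookup[key]) for key, value in partition if key in lookup]

# Cache the reference data once, before the join.
lookup = dict(load_reference_table())

# Stand-in for the large data set, already split into partitions.
partitions = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]

# Each partition is joined independently of the others, so only the
# initial read of the reference table is serial.
joined = [row for part in partitions for row in join_partition(part, lookup)]
```

The row with key 4 has no match in the reference table, so an inner join drops it.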
Regards
Sab
On 13-Nov-2015 10:27 pm, "Eran Medan" wrote:
> Hi
> I'm looking for some benchmarks on joining data frames where most of the
> data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is
> still in RD
Hi
I'm looking for some benchmarks on joining data frames where most of the
data is in HDFS (e.g. in parquet) and some "reference" or "metadata" is
still in RDBMS. I am only looking at the very first join before any caching
happens, and I assume there will be a loss of parallelism because JdbcRDD