In the code below you are impeding Spark from doing what it is meant to do. As
mentioned below, the best (and easiest to implement) approach would be to load
each file into a dataframe and join them. Even doing a key join with
RDDs would be better, but in your case you are forcing a one-by-one
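A minimal sketch of the dataframe-join approach described above (the file paths, CSV format, and the join key "id" are assumptions for illustration, not the poster's actual data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LogTreeJoin")
  .getOrCreate()

// Load each file into its own DataFrame so Spark can plan the work
// itself, instead of being forced through record-by-record RDD logic.
val logs = spark.read.option("header", "true").csv("/data/logs.csv")
val tree = spark.read.option("header", "true").csv("/data/tree.csv")

// A key-based join lets Spark distribute (shuffle or broadcast)
// the relevant records across the cluster.
val joined = logs.join(tree, Seq("id"), "inner")
joined.show()
```

With dataframes, Catalyst can also pick a broadcast join automatically if one side is small enough, which is exactly the kind of optimisation the RDD loop prevents.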
Hi,
Why are you using RDDs? And how are the files stored in terms of
compression?
Regards
Gourav
On Sat, 25 Apr 2020, 08:54 Roland Johann wrote:
> You can read both, the logs and the tree file, into dataframes and join
> them. Doing this, Spark can distribute the relevant records or even the
>
Hi,
This is something that I am banging my head against trying to explain to
someone, hence this question.
There is an underlying Hive table which is populated by reading an XML file
through Spark.
The table in Hive is defined as follows with two partition columns
var sqltext =
s"""
CREATE
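The DDL above is cut off. For illustration only, a Hive table with two partition columns created through Spark typically looks like the sketch below (the database, table, and column names are assumptions, not the poster's actual schema):

```scala
// Hypothetical example -- the real DDL in the original message is truncated.
var sqltext =
  s"""
     CREATE TABLE IF NOT EXISTS mydb.events (
       id      STRING,
       payload STRING
     )
     PARTITIONED BY (year INT, month INT)
     STORED AS PARQUET
  """
spark.sql(sqltext)
```

Note that the partition columns (`year`, `month` here) live in the `PARTITIONED BY` clause, not in the main column list, and become directory levels in the table's storage layout.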