unsubscribe

2020-04-26 Thread siqi chen

Re: [pyspark] Load a master data file to spark ecosystem

2020-04-26 Thread Edgardo Szrajber
In the below code you are impeding Spark from doing what it is meant to do. As mentioned below, the best (and easiest to implement) approach would be to load each file into a dataframe and join between them. Even doing a key join with RDDs would be better, but in your case you are forcing a one by

Re: [pyspark] Load a master data file to spark ecosystem

2020-04-26 Thread Gourav Sengupta
Hi, Why are you using RDDs? And how are the files stored in terms of compression? Regards Gourav On Sat, 25 Apr 2020, 08:54 Roland Johann, wrote: > You can read both, the logs and the tree file into dataframes and join > them. Doing this spark can distribute the relevant records or even the >

unsubscribe

2020-04-26 Thread David Aspegren
unsubscribe

Static and dynamic partition loads in Hive table through Spark

2020-04-26 Thread Mich Talebzadeh
Hi, This is something that I am banging my head trying to explain to someone, hence this question. There is an underlying Hive table which is populated by reading an XML file through Spark. The table in Hive is defined as follows with two partition columns var sqltext = s""" CREATE