In the code below you are impeding Spark from doing what it is meant to do. As
mentioned below, the best (and easiest to implement) approach would be to load
each file into a dataframe and join them. Even doing a key join with
RDDs would be better, but in your case you are forcing a one-by-one
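A minimal sketch of the dataframe-join approach described above (the file paths, CSV format, and the join key "id" are assumptions for illustration, not the poster's actual data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LogTreeJoin")
  .getOrCreate()

// Load each file into its own DataFrame so Spark can plan the work
// itself, instead of being forced through record-by-record RDD logic.
val logs = spark.read.option("header", "true").csv("/data/logs.csv")
val tree = spark.read.option("header", "true").csv("/data/tree.csv")

// A key-based join lets Spark distribute (shuffle or broadcast)
// the relevant records across the cluster.
val joined = logs.join(tree, Seq("id"), "inner")
joined.show()
```

With dataframes, Catalyst can also pick a broadcast join automatically if one side is small enough, which is exactly the kind of optimisation the RDD loop prevents.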
Hi,
Why are you using RDDs? And how are the files stored in terms of
compression?
Regards
Gourav
On Sat, 25 Apr 2020, 08:54 Roland Johann wrote:
> You can read both, the logs and the tree file, into dataframes and join
> them. Doing this, Spark can distribute the relevant records or even the
>
Hi,
This is something that I am banging my head against trying to explain to
someone, hence this question.
There is an underlying Hive table which is populated by reading an XML file
through Spark.
The table in Hive is defined as follows with two partition columns
var sqltext =
s"""
CREATE
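The DDL above is cut off. For illustration only, a Hive table with two partition columns created through Spark typically looks like the sketch below (the database, table, and column names are assumptions, not the poster's actual schema):

```scala
// Hypothetical example -- the real DDL in the original message is truncated.
var sqltext =
  s"""
     CREATE TABLE IF NOT EXISTS mydb.events (
       id      STRING,
       payload STRING
     )
     PARTITIONED BY (year INT, month INT)
     STORED AS PARQUET
  """
spark.sql(sqltext)
```

Note that the partition columns (`year`, `month` here) live in the `PARTITIONED BY` clause, not in the main column list, and become directory levels in the table's storage layout.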