I was getting a NullPointerException when trying to call Spark SQL from foreach. After debugging, I learned that the SparkSession is not available on the executors, and I could not successfully pass it to them.
What I am doing now is tablesRDD.collect().foreach(...), and it works, but it runs sequentially.

On Mon, Jul 17, 2017 at 5:58 PM, Simon Kitching <simon.kitch...@unbelievable-machine.com> wrote:
> Have you tried simply making a list with your tables in it, then using
> SparkContext.makeRDD(Seq)? i.e.
>
>   val tablenames = List("table1", "table2", "table3", ...)
>   val tablesRDD = sc.makeRDD(tablenames, nParallelTasks)
>   tablesRDD.foreach(....)
>
> > On 17.07.2017 at 14:12, FN <nuson.fr...@gmail.com> wrote:
> >
> > Hi,
> > I am currently trying to parallelize reading multiple tables from Hive.
> > As part of an archival framework, I need to convert a few hundred tables
> > from text format to Parquet. For now I am calling Spark SQL inside a for
> > loop to do the conversion, but this executes sequentially and the entire
> > process takes a long time to finish.
> >
> > I tried submitting 4 different Spark jobs (each with its own set of
> > tables to be converted), which did give me some parallelism, but I would
> > like to do this in a single Spark job due to a few limitations of our
> > cluster and process.
> >
> > Any help will be greatly appreciated.
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Hive-tables-Parallel-in-Spark-tp28869.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
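Since spark.sql can only be called from the driver, another way to get parallelism inside a single job is to submit the conversions concurrently from the driver itself, e.g. with Scala Futures; each Future triggers its own Spark job, and those jobs can share the executors. A minimal sketch of that idea, assuming an existing SparkSession named `spark` (the table names and output path below are placeholders, not from the thread):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val tablenames = List("table1", "table2", "table3") // hypothetical table names

// Each Future runs on the driver and submits an independent Spark job;
// the jobs then execute concurrently across the cluster.
val conversions = tablenames.map { t =>
  Future {
    spark.sql(s"SELECT * FROM $t")
      .write
      .mode("overwrite")
      .parquet(s"/archive/parquet/$t") // hypothetical output location
  }
}

// Block until every conversion has finished.
Await.result(Future.sequence(conversions), Duration.Inf)
```

With spark.scheduler.mode set to FAIR, the concurrent jobs share executor slots instead of queueing FIFO; how many run usefully in parallel depends on your cluster capacity.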