Have you tried simply making a list with your tables in it and then using
SparkContext.makeRDD(Seq)? I.e.:

val tablenames = List("table1", "table2", "table3", ...)
val tablesRDD = sc.makeRDD(tablenames, nParallelTasks)
tablesRDD.foreach(....)
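One caveat with that approach: the SparkSession itself cannot be used inside
executor-side code such as RDD.foreach, so if each conversion is a Spark SQL
statement, a variant is to keep the table list on the driver and run the
per-table writes through a Scala parallel collection (each write then becomes
its own concurrent Spark job). A minimal sketch, assuming a SparkSession named
`spark`; the database name and output path below are placeholders:

val tablenames = List("table1", "table2", "table3")
// .par fans the loop body out across driver threads; each iteration submits
// an independent Spark SQL job, which the scheduler can run concurrently.
// (On Scala 2.13 this needs the scala-parallel-collections module.)
tablenames.par.foreach { t =>
  spark.table(s"mydb.$t")             // read the Hive text-format table
    .write
    .mode("overwrite")
    .parquet(s"/archive/parquet/$t")  // placeholder output location
}

Futures with a bounded thread pool work just as well; enabling the FAIR
scheduler (spark.scheduler.mode=FAIR) lets the concurrent jobs share executors
more evenly.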

> Am 17.07.2017 um 14:12 schrieb FN <nuson.fr...@gmail.com>:
> 
> Hi
> I am currently trying to parallelize reading multiple tables from Hive. As
> part of an archival framework, I need to convert a few hundred tables that
> are in text format to Parquet. For now I am calling Spark SQL inside a for
> loop for the conversion, but this executes sequentially and the entire
> process takes a long time to finish.
> 
> I tried submitting 4 different Spark jobs (each with its own set of tables
> to be converted), which did give me some parallelism, but I would like to do
> this in a single Spark job due to a few limitations of our cluster and
> process.
> 
> Any help will be greatly appreciated.


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
