I was getting a NullPointerException when trying to call Spark SQL from
foreach. After debugging, I found that the SparkSession is only available
on the driver, not on the executors, so it cannot be passed into foreach.

What I am doing now is tablesRDD.collect().foreach(...), which works but
runs sequentially on the driver.
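
One common workaround (a sketch, not code from this thread) is to keep all Spark SQL calls on the driver, where the SparkSession is valid, but issue them concurrently, for example with a Scala parallel collection. The table names and output path below are placeholders:

```scala
// Run one Spark SQL conversion per table, concurrently, from the driver.
// SparkSession is only usable on the driver, so we parallelize the
// driver-side loop instead of shipping the session to executors.
val tableNames = List("table1", "table2", "table3")  // hypothetical names

tableNames.par.foreach { t =>
  // Each call launches its own Spark job; the scheduler can run them
  // concurrently, subject to spark.scheduler.mode and available resources.
  spark.sql(s"SELECT * FROM $t")
    .write
    .parquet(s"/archive/parquet/$t")  // hypothetical output path
}
```

Note that `.par` is built into the standard library on the Scala 2.11/2.12 versions Spark used at the time; on Scala 2.13 it requires the separate scala-parallel-collections module.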

On Mon, Jul 17, 2017 at 5:58 PM, Simon Kitching <
simon.kitch...@unbelievable-machine.com> wrote:

> Have you tried simply making a list with your tables in it, then using
> SparkContext.makeRDD(Seq)? i.e.
>
> val tablenames = List("table1", "table2", "table3", ...)
> val tablesRDD = sc.makeRDD(tablenames, nParallelTasks)
> tablesRDD.foreach(....)
>
> > Am 17.07.2017 um 14:12 schrieb FN <nuson.fr...@gmail.com>:
> >
> > Hi
> > I am currently trying to parallelize reading multiple tables from Hive.
> > As part of an archival framework, I need to convert a few hundred tables
> > from text format to Parquet. For now I am calling Spark SQL inside a for
> > loop for the conversion, but this executes sequentially and the entire
> > process takes a long time to finish.
> >
> > I tried submitting 4 different Spark jobs (each with its own set of
> > tables to be converted), which did give me some parallelism, but I would
> > like to do this in a single Spark job due to a few limitations of our
> > cluster and process.
> >
> > Any help will be greatly appreciated
> >
> >
> >
> >
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Hive-tables-Parallel-in-Spark-tp28869.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
>
>
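
Another driver-side option, sketched here under the same assumptions (placeholder table names and an assumed output path), is to launch one Future per table: each Future triggers its own Spark job, and Spark's scheduler runs them concurrently. This also uses the DataFrame API for the text-to-Parquet conversion instead of a SQL string:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Launch all conversions as driver-side futures; each one triggers its
// own Spark job, so they can run concurrently under Spark's scheduler.
val tableNames = List("table1", "table2", "table3")  // placeholders

val jobs = tableNames.map { t =>
  Future {
    spark.table(t)                      // read the Hive table on the driver
      .write
      .mode("overwrite")
      .parquet(s"/archive/parquet/$t")  // hypothetical target path
  }
}

// Block until every conversion has finished.
Await.result(Future.sequence(jobs), Duration.Inf)
```

Compared with the parallel-collection approach, Futures make it easier to cap concurrency by supplying a fixed-size ExecutionContext instead of the global one.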
