The point of running them in parallel would be faster creation of the
tables. Has anybody been able to efficiently parallelize something like
this in Spark?
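
For context, here is a rough sketch of the kind of thing I have in mind: wrapping the
loadTable* helpers from the quoted mail below in plain Scala Futures so both loads are
submitted from separate threads. The global ExecutionContext is just an assumption, and
whether the two jsonFile scans actually overlap depends on the cluster having spare
capacity (the FAIR scheduler mode may also help when jobs come from different threads):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Submit both loads from separate threads so their Spark jobs can overlap;
// each Future blocks its own thread until the temp table is registered.
val table1Loaded = Future { loadTable1() }
val table2Loaded = Future { loadTable2() }

// Wait for both registrations before querying the temp tables.
Await.result(Future.sequence(Seq(table1Loaded, table2Loaded)), Duration.Inf)
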
On Jul 8, 2015 12:29 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> What's the point of creating them in parallel? You can multi-thread it to
> run it in parallel, though.
>
> Thanks
> Best Regards
>
> On Wed, Jul 8, 2015 at 5:34 AM, Brandon White <bwwintheho...@gmail.com>
> wrote:
>
>> Say I have a Spark job that looks like the following:
>>
>> def loadTable1() {
>>   val table1 = sqlContext.jsonFile(s"s3://textfiledirectory/")
>>   table1.cache().registerTempTable("table1")
>> }
>>
>> def loadTable2() {
>>   val table2 = sqlContext.jsonFile(s"s3://testfiledirectory2/")
>>   table2.cache().registerTempTable("table2")
>> }
>>
>> def loadAllTables() {
>>   loadTable1()
>>   loadTable2()
>> }
>>
>> loadAllTables()
>>
>> How do I parallelize this Spark job so that both tables are created at
>> the same time or in parallel?
>>
>
>
