Hi, I have 40+ structured datasets stored in an S3 bucket as Parquet files.
I am going to use 20 of those tables in this use case. There is a main table that drives the whole flow; it contains about 1k records. For every record in the main table, I need to process the remaining tables (joins and group-bys that depend on fields of that main-table record). How can I parallelize this? What I did was read the main table, call `toLocalIterator()` on the DataFrame, and then do the rest of the processing per record — but that runs one record at a time. Please share your ideas. Thank you.
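For context, the sequential loop I described, and the kind of parallel variant I am asking about, can be sketched as below. This is a minimal, self-contained sketch: `process_record` is a hypothetical stand-in for the per-record joins/group-bys, and the main-table rows are mocked as a plain list instead of a real Spark `toLocalIterator()` so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-record work (joins/group-bys
# against the other tables, driven by fields of one main-table row).
def process_record(row):
    return row["id"] * 2  # placeholder computation

# Stand-in for rows pulled from the main table (in Spark this would
# come from df.toLocalIterator()); here a plain list of dicts.
main_rows = [{"id": i} for i in range(10)]

# Current approach: sequential, one record at a time.
sequential = [process_record(r) for r in main_rows]

# Parallel variant: submit several records at once from the driver
# with a thread pool, so multiple per-record jobs run concurrently
# instead of waiting for each other.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(process_record, main_rows))

assert parallel == sequential  # same results, just computed concurrently
```

This only illustrates the shape of the question (sequential iterator vs. concurrent submission); whether threads, a broadcast join, or restructuring the work as one big join is the right answer is exactly what I am asking.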