Hi,

I need to run a batch job written in Java that executes several SQL statements
against different Hive tables and then processes each result set, partition by
partition, in a foreachPartition() operator.
I'd like to run these actions in parallel.
I have seen two approaches for achieving this:

1. Using the java.util.concurrent package, e.g. Future or ForkJoinPool (see the
sketch below).

2. Converting my Dataset to a JavaRDD<Row> and calling foreachPartitionAsync()
on the RDD.
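
To make option 1 concrete, here is a rough sketch of what I have in mind (the
query strings, table names, class name and per-row processing are placeholders);
my understanding is that the SparkSession is thread-safe, so actions submitted
from separate driver threads should run as concurrent Spark jobs:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParallelHiveBatch {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parallel-hive-batch")
                .enableHiveSupport()
                .getOrCreate();

        // Placeholder queries -- the real job would run its own SQL here.
        List<String> queries = Arrays.asList(
                "SELECT * FROM db.table_a",
                "SELECT * FROM db.table_b",
                "SELECT * FROM db.table_c");

        // One driver thread per query; each thread submits its own Spark action.
        ExecutorService pool = Executors.newFixedThreadPool(queries.size());

        List<CompletableFuture<Void>> futures = queries.stream()
                .map(sql -> CompletableFuture.runAsync(() -> {
                    Dataset<Row> result = spark.sql(sql);
                    // Cast disambiguates the Java overload of foreachPartition.
                    result.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
                        while (rows.hasNext()) {
                            Row row = rows.next();
                            // per-row / per-partition processing goes here
                        }
                    });
                }, pool))
                .collect(Collectors.toList());

        // Wait for all actions to finish; join() rethrows any failure.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        pool.shutdown();
        spark.stop();
    }
}

(I assume that if the concurrent jobs should share executors fairly rather than
queue up FIFO, I would also need to look at spark.scheduler.mode=FAIR and
scheduler pools.)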

Can you please recommend the best way to achieve this using one of these 
options, or suggest a better approach?

Thanks, Guy
