Running several spark actions in parallel

2017-07-31 Thread Guy Harmach
Hi, I need to run a batch job written in Java that executes several SQL statements on different Hive tables, and then processes each partition's result set in a foreachPartition() operator. I'd like to run these actions in parallel. I saw there are two approaches for achieving this: 1. Using
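One common answer to this question is that Spark's scheduler is thread-safe, so independent actions can simply be submitted from separate driver threads (optionally with `spark.scheduler.mode=FAIR` so the jobs share the cluster rather than queueing FIFO). Below is a minimal pure-Java sketch of that pattern; the actual Spark calls (e.g. `sqlContext.sql(query).foreachPartition(...)`) are only indicated in comments, and the table names and `runAction` helper are hypothetical stand-ins:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ParallelActions {
    // Stand-in for one Spark action on a Hive table, e.g.:
    //   sqlContext.sql("SELECT ... FROM " + table)
    //             .foreachPartition(iter -> { /* process partition */ });
    static String runAction(String table) {
        return "processed:" + table;
    }

    public static void main(String[] args) throws Exception {
        List<String> tables = Arrays.asList("t1", "t2", "t3");

        // One driver thread per action; each thread triggers its own Spark job.
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());
        List<Future<String>> futures = tables.stream()
            .map(t -> pool.submit(() -> runAction(t)))
            .collect(Collectors.toList());

        // Wait for all actions to finish before ending the batch job.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

With this layout the driver blocks on `Future.get()` until every action completes, so the batch job still has a single well-defined end point.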

Spark streaming graceful shutdown when running on yarn-cluster deploy-mode

2016-07-12 Thread Guy Harmach
Hi, I'm a newbie to Spark, starting to work with Spark 1.5 using the Java API (about to upgrade to 1.6 soon). I am deploying a Spark Streaming application using spark-submit in yarn-cluster mode. What is the recommended way to perform a graceful shutdown of the Spark job? Already tried
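A pattern often suggested for this is to poll for an external "stop" marker from the driver and, when it appears, call `StreamingContext.stop(true, true)` so in-flight batches finish before the context shuts down. The sketch below shows the poll loop in pure Java; the marker path and helper names are assumptions, the real Spark calls appear only as comments, and on a real yarn-cluster deployment the marker would typically live on HDFS and be checked via the Hadoop FileSystem API rather than `java.nio`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GracefulStop {
    // Hypothetical marker file an operator creates to request shutdown.
    static final Path MARKER =
        Paths.get(System.getProperty("java.io.tmpdir"), "stop_streaming_job");

    // Poll loop around the streaming context. With a real
    // JavaStreamingContext `ssc`, the commented lines replace the stand-ins.
    static boolean runUntilStopped() throws InterruptedException {
        // ssc.start();
        boolean stopped = false;
        while (!stopped) {
            // stopped = ssc.awaitTerminationOrTimeout(10_000);
            Thread.sleep(100);                  // stand-in for the timeout wait
            if (!stopped && Files.exists(MARKER)) {
                // Finish in-flight batches, then stop context and SparkContext:
                // ssc.stop(true, true);        // stopSparkContext, stopGracefully
                stopped = true;
            }
        }
        return stopped;
    }

    public static void main(String[] args) throws Exception {
        Files.deleteIfExists(MARKER);
        // Simulate an operator creating the marker a moment later.
        new Thread(() -> {
            try {
                Thread.sleep(300);
                Files.createFile(MARKER);
            } catch (IOException | InterruptedException ignored) { }
        }).start();
        System.out.println(runUntilStopped() ? "stopped gracefully" : "still running");
        Files.deleteIfExists(MARKER);
    }
}
```

The key detail is the two-argument `stop(true, true)`: the second flag requests graceful shutdown (drain received data) instead of killing receivers immediately, which matters under yarn-cluster mode where the driver cannot simply be Ctrl-C'd.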