Thanks. That seems to work great, except EMR doesn't always copy the logs
to S3. The behavior seems inconsistent and I am debugging it now.
On Fri, Mar 31, 2017 at 7:46 AM, Vadim Semenov
wrote:
> You can provide your own log directory, where Spark log will be saved, and
> that you could replay afterwards.
If you modify spark.eventLog.dir to point to an S3 path, you will encounter
the following exception in the Spark history server log at
/var/log/spark/spark-history-server.out:
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
You can provide your own log directory, where the Spark event log will be
saved and from which you can replay it afterwards.
Set this in your job: `spark.eventLog.dir=s3://bucket/some/directory` and
run it.
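
For illustration, here is a minimal sketch of how that setting might be passed on the command line. It assumes the job is launched with spark-submit and that event logging is switched on via `spark.eventLog.enabled`; the helper name `spark_submit_args` and the script name `my_job.py` are hypothetical placeholders, not something from this thread.

```python
import shlex


def spark_submit_args(event_log_dir, app="my_job.py"):
    """Build a spark-submit argument list that writes event logs to event_log_dir.

    `app` is a hypothetical placeholder for your own job script.
    """
    return [
        "spark-submit",
        # Event logging has to be enabled for spark.eventLog.dir to be used.
        "--conf", "spark.eventLog.enabled=true",
        # The directory must already exist before the job starts.
        "--conf", f"spark.eventLog.dir={event_log_dir}",
        app,
    ]


cmd = spark_submit_args("s3://bucket/some/directory")
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The same two properties could instead go into spark-defaults.conf if you want them applied to every job.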
Note! The path `s3://bucket/some/directory` must exist before you run your
job, it'll not be created automa