Thanks. That seems to work great, except EMR doesn't always copy the logs
to S3. The behavior seems inconsistent and I am debugging it now.

On Fri, Mar 31, 2017 at 7:46 AM, Vadim Semenov wrote:

> You can provide your own log directory where the Spark event log will be
> saved, and replay it afterwards.
> Set this in your job: `spark.eventLog.dir=s3://bucket/some/directory` and
> run it.
> Note! The path `s3://bucket/some/directory` must exist before you run your
> job; it will not be created automatically.
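
A minimal sketch of passing that setting on the command line, in case it helps. The S3 path is the placeholder from the message above, `my_job.py` is a hypothetical application script, and `spark.eventLog.enabled=true` is included on the assumption that event logging is not already turned on in the cluster's defaults. The function only prints the command so it can be checked before anything is run.

```shell
#!/bin/sh
# Placeholder path from the message above; it must already exist.
EVENT_LOG_DIR="s3://bucket/some/directory"

# Print (not run) a spark-submit command that writes event logs to $1.
# my_job.py is a hypothetical application script.
build_submit_cmd() {
  printf 'spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=%s my_job.py\n' "$1"
}

build_submit_cmd "$EVENT_LOG_DIR"
```

Once the printed command looks right, it can be pasted into the shell on the master node.
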
> The Spark HistoryServer on EMR won't show you anything because it
> looks for logs in `hdfs:///var/log/spark/apps` by default.
> After that you can either copy the log files from S3 to the HDFS path
> above, or copy them locally to `/tmp/spark-events` (the default
> directory for Spark event logs) and run the history server like:
> ```
> cd /usr/local/src/spark-1.6.1-bin-hadoop2.6
> sbin/
> ```
> and then open http://localhost:18080
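
A rough sketch of the local-copy option described above, assuming the AWS CLI is installed and has credentials that can read the bucket. The S3 path is again the placeholder from this thread, and `build_copy_cmd` is a hypothetical helper name. It prints the copy command rather than running it, so the source and destination can be checked first.

```shell
#!/bin/sh
# Print the command that would copy event logs from an S3 prefix ($1)
# into a local directory ($2, defaulting to /tmp/spark-events).
build_copy_cmd() {
  src="$1"
  dest="${2:-/tmp/spark-events}"
  printf 'aws s3 cp %s %s --recursive\n' "$src" "$dest"
}

build_copy_cmd "s3://bucket/some/directory"
```

Passing a second argument changes the destination, for example `build_copy_cmd s3://bucket/some/directory /data/events`.
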
> On Thu, Mar 30, 2017 at 8:45 PM, Paul Tremblay wrote:
>> I am looking for tips on evaluating my Spark job after it has run.
>> I know that right now I can look at the history of jobs through the web
>> ui. I also know how to look at the current resources being used by a
>> similar web ui.
>> However, I would like to look at the logs after the job is finished to
>> evaluate such things as how many tasks were completed, how many executors
>> were used, etc. I currently save my logs to S3.
>> Thanks!
>> Henry
>> --
>> Paul Henry Tremblay
>> Robert Half Technology

Paul Henry Tremblay
Robert Half Technology
