[ https://issues.apache.org/jira/browse/SPARK-33088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Souza updated SPARK-33088:
---------------------------------
    Description: 
In [SPARK-24918|https://issues.apache.org/jira/browse/SPARK-24918]'s
[SPIP|https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit#],
it was proposed to add methods to the ExecutorPlugin interface for task start
and end events:

{quote}The basic interface can just be a marker trait, as that allows a plugin 
to monitor general characteristics of the JVM (eg. monitor memory or take 
thread dumps).   Optionally, we could include methods for task start and end 
events.   This would allow more control on monitoring – eg., you could start 
polling thread dumps only if there was a task from a particular stage that had 
been taking too long. But anything task related is a bit trickier to decide the 
right api. Should the task end event also get the failure reason? Should those 
events get called in the same thread as the task runner, or in another thread?
{quote}

The ask is to add exactly that. I've put up a draft PR [in our fork of
Spark|https://github.com/palantir/spark/pull/713] and I'm happy to push it
upstream. I'm also happy to receive comments on the right interface to
expose; I'm not opinionated on that front and have tried to expose the
simplest interface for now.
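
For concreteness, here is a minimal sketch of what the extended interface
could look like. The method names, the default no-op bodies, and the choice of
TaskContext as the argument are my assumptions for illustration, not a final
API; default methods would keep existing plugin implementations
source-compatible:

{code:java}
import org.apache.spark.TaskContext;

// Hypothetical extension of the ExecutorPlugin interface: default no-op
// methods so existing plugins keep compiling unchanged.
public interface ExecutorPlugin {
  /** Called once when the plugin is loaded on an executor. */
  default void init() {}

  /** Called just before a task starts, in the thread that will run the task. */
  default void onTaskStart(TaskContext context) {}

  /** Called after a task finishes, successfully or not, in the same thread. */
  default void onTaskEnd(TaskContext context) {}

  /** Called once when the executor shuts down. */
  default void shutdown() {}
}
{code}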

The main reason for this ask is to propagate tracing information from the
driver to the executors
([SPARK-21962|https://issues.apache.org/jira/browse/SPARK-21962] has some
context). [HADOOP-15566|https://issues.apache.org/jira/browse/HADOOP-15566]
discusses adding tracing across the Apache ecosystem, but my problem is
slightly different: I want to use this interface to propagate tracing
information to my framework of choice. If the Hadoop issue is solved, we'll
have a common framework for communicating tracing information inside the
Apache ecosystem, but it's highly unlikely that all Spark users will adopt
that same framework. Therefore we should still provide plugin interfaces
through which tracing information can be propagated appropriately.

To give more color: in our case the tracing information is [stored in a thread
local|https://github.com/palantir/tracing-java/blob/4.9.0/tracing/src/main/java/com/palantir/tracing/Tracer.java#L61],
so it needs to be set in the same thread that is executing the task. [*]
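
To illustrate, here is a hedged sketch of a plugin that installs tracing state
into a thread local on task start and clears it on task end, using the
hypothetical onTaskStart/onTaskEnd methods from the sketch above. A raw
ThreadLocal and a "traceId" property key stand in for our actual Tracer:

{code:java}
import org.apache.spark.TaskContext;

// Illustrative only: the thread-local trace must be set in the task's own
// thread, which is why the callbacks should run in the task runner thread.
public class TracingExecutorPlugin implements ExecutorPlugin {
  private static final ThreadLocal<String> TRACE = new ThreadLocal<>();

  @Override
  public void onTaskStart(TaskContext context) {
    // Read the value the driver attached via localProperties (see [*] below).
    TRACE.set(context.getLocalProperty("traceId"));
  }

  @Override
  public void onTaskEnd(TaskContext context) {
    TRACE.remove();  // don't leak trace state across tasks on a reused thread
  }
}
{code}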

While our framework is specific, I imagine such an interface could be useful in 
general. Happy to hear your thoughts about it.

[*] Something I did not mention is how to propagate the tracing information
from the driver to the executors. For that I intend to use (1) the driver's
localProperties, which (2) are eventually propagated to the executors'
TaskContext, which (3) I can then access from the methods above.
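
A hedged sketch of the driver side of that flow, with the same hypothetical
"traceId" key:

{code:java}
import org.apache.spark.api.java.JavaSparkContext;

// Step 1: set the trace id as a local property on the driver before
// submitting a job. Spark then (step 2) copies local properties into each
// task's TaskContext, where (step 3) the plugin reads it back via
// context.getLocalProperty("traceId").
public final class TracePropagation {
  public static void runTraced(JavaSparkContext sc, String traceId, Runnable job) {
    sc.setLocalProperty("traceId", traceId);
    try {
      job.run();
    } finally {
      sc.setLocalProperty("traceId", null);  // clear so later jobs aren't mislabeled
    }
  }
}
{code}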

> Enhance ExecutorPlugin API to include methods for task start and end events
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-33088
>                 URL: https://issues.apache.org/jira/browse/SPARK-33088
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Samuel Souza
>            Priority: Major
>


