[ https://issues.apache.org/jira/browse/SPARK-26439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuyi updated SPARK-26439:
-------------------------
    Description: 
Currently, a Barrier TaskSet has a hard requirement that all of its tasks
be launched in a single resourceOffers round with enough slots (or
sufficient resources). However, this cannot be guaranteed even when enough
slots are available, because of task locality delay scheduling. So it is
very likely that a Barrier TaskSet finally obtains a chunk of sufficient
resources, only to give it up because one of its pending tasks cannot be
scheduled. Furthermore, this causes severe resource competition between
TaskSets and jobs, and introduces unclear semantics for DynamicAllocation.

This JIRA tries to introduce a WorkOffer reservation mechanism for Barrier
TaskSet, which allows a Barrier TaskSet to reserve WorkOffers in each
resourceOffers round and launch all of its tasks at the same time once it
has accumulated sufficient resources. In this way, we relax the resource
requirement for the Barrier TaskSet. To avoid the deadlock that may be
introduced by several Barrier TaskSets holding reserved WorkOffers for a
long time, we'll ask Barrier TaskSets to forcibly release part of their
reserved WorkOffers on demand. So it is highly likely that every Barrier
TaskSet will eventually be launched.
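
As a rough illustration of the reservation idea, here is a minimal toy
sketch in Python. It is not Spark code: the WorkOffer and BarrierTaskSet
classes, the offer_round and release methods, and the "slots" field are all
hypothetical names chosen for this example. It assumes offers are reserved
whole and that one slot runs one task.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkOffer:
    """Free slots offered by one executor in a scheduling round (toy model)."""
    executor_id: str
    slots: int


@dataclass
class BarrierTaskSet:
    """Hypothetical barrier task set that accumulates reserved offers."""
    name: str
    num_tasks: int
    reserved: list = field(default_factory=list)

    def reserved_slots(self) -> int:
        return sum(o.slots for o in self.reserved)

    def offer_round(self, offers):
        """Reserve offers from one round; return task placements or None.

        Offers are reserved whole until the reserved slots cover every task.
        Only then are all tasks launched together.
        """
        for offer in offers:
            if self.reserved_slots() >= self.num_tasks:
                break
            self.reserved.append(offer)
        if self.reserved_slots() < self.num_tasks:
            return None  # keep the reservation and wait for the next round
        # Expand reserved offers into one executor id per slot, then assign.
        slots = [o.executor_id for o in self.reserved for _ in range(o.slots)]
        self.reserved.clear()
        return slots[: self.num_tasks]

    def release(self, slots_needed: int):
        """Forcibly give back reserved offers until slots_needed is covered."""
        released = []
        while self.reserved and sum(o.slots for o in released) < slots_needed:
            released.append(self.reserved.pop())
        return released


# Two rounds, each offering only half of the slots the task set needs.
ts = BarrierTaskSet("stage-0", num_tasks=4)
first = ts.offer_round([WorkOffer("exec-1", 2)])   # not enough yet
second = ts.offer_round([WorkOffer("exec-2", 2)])  # all four tasks placed
```

In this sketch the first round returns None and keeps exec-1 reserved; the
second round returns one executor id per task. The release method models
the "force releasing on demand" step that is meant to break deadlocks
between several task sets that each hold a partial reservation.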

To integrate with DynamicAllocation

One possible approach I can imagine is to add new events, e.g.
ExecutorReservedEvent and ExecutorReleasedEvent, which make an executor
behave like a busy executor with running tasks or an idle executor without
running tasks, respectively. Thus, ExecutorAllocationManager would not let
an executor go while it knows that there are reserved resources on that
executor.
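
The effect of the two proposed events on idle-executor tracking can be
sketched as follows. This is a toy Python model, not Spark's allocation
manager: the ToyAllocationManager class and its method names are
hypothetical, and only the two event names come from the description above.

```python
class ToyAllocationManager:
    """Hypothetical stand-in for an allocation manager's idle tracking."""

    def __init__(self, executor_ids):
        self.running_tasks = {e: 0 for e in executor_ids}
        self.reserved = set()

    def on_task_start(self, executor_id):
        self.running_tasks[executor_id] += 1

    def on_task_end(self, executor_id):
        self.running_tasks[executor_id] -= 1

    def on_executor_reserved(self, executor_id):
        # Models ExecutorReservedEvent: treat the executor as busy.
        self.reserved.add(executor_id)

    def on_executor_released(self, executor_id):
        # Models ExecutorReleasedEvent: the executor may become idle again.
        self.reserved.discard(executor_id)

    def removable_executors(self):
        """Executors with no running tasks and no reservation."""
        return sorted(
            e for e, n in self.running_tasks.items()
            if n == 0 and e not in self.reserved
        )


mgr = ToyAllocationManager(["exec-1", "exec-2"])
mgr.on_executor_reserved("exec-1")
while_reserved = mgr.removable_executors()   # exec-1 is protected
mgr.on_executor_released("exec-1")
after_release = mgr.removable_executors()    # exec-1 is removable again
```

The point of the sketch is that a reserved executor is excluded from removal
even though it runs no tasks, which is the behaviour the new events are
meant to provide.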

  was:
Currently, a Barrier TaskSet has a hard requirement that all of its tasks
be launched in a single resourceOffers round with enough slots (or
sufficient resources). However, this cannot be guaranteed even when enough
slots are available, because of task locality delay scheduling. So it is
very likely that a Barrier TaskSet finally obtains a chunk of sufficient
resources, only to give it up because one of its pending tasks cannot be
scheduled. Furthermore, this causes severe resource competition between
TaskSets and jobs, and introduces unclear semantics for DynamicAllocation.

This PR tries to introduce a WorkOffer reservation mechanism for Barrier
TaskSet, which allows a Barrier TaskSet to reserve WorkOffers in each
resourceOffers round and launch all of its tasks at the same time once it
has accumulated sufficient resources. In this way, we relax the resource
requirement for the Barrier TaskSet. To avoid the deadlock that may be
introduced by several Barrier TaskSets holding reserved WorkOffers for a
long time, we'll ask Barrier TaskSets to forcibly release part of their
reserved WorkOffers on demand. So it is highly likely that every Barrier
TaskSet will eventually be launched.

To integrate with DynamicAllocation

One possible approach I can imagine is to add new events, e.g.
ExecutorReservedEvent and ExecutorReleasedEvent, which make an executor
behave like a busy executor with running tasks or an idle executor without
running tasks, respectively. Thus, ExecutorAllocationManager would not let
an executor go while it knows that there are reserved resources on that
executor.


> Introduce WorkOffer reservation mechanism for Barrier TaskSet
> -------------------------------------------------------------
>
>                 Key: SPARK-26439
>                 URL: https://issues.apache.org/jira/browse/SPARK-26439
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: wuyi
>            Priority: Major
>              Labels: performance
>             Fix For: 2.4.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
