Andrew Or created SPARK-8987:
--------------------------------

             Summary: Increase test coverage of DAGScheduler
                 Key: SPARK-8987
                 URL: https://issues.apache.org/jira/browse/SPARK-8987
             Project: Spark
          Issue Type: Bug
          Components: Scheduler, Tests
    Affects Versions: 1.0.0
            Reporter: Andrew Or


DAGScheduler is one of the most monstrous pieces of code in Spark. Every time
someone changes something there, something like the following happens:

(1) Someone pings a committer
(2) The committer pings a scheduler maintainer
(3) Scheduler maintainer correctly points out bugs in the patch
(4) Author of patch fixes those bugs but introduces more bugs
(5) Repeat steps 3 - 4 N times
(6) Other committers / contributors jump in and start debating
(7) The patch goes stale for months

All of this happens because no one, including the committers, has high 
confidence that a particular change doesn't break some corner case in the 
scheduler. I believe one of the main issues is the lack of sufficient test 
coverage, which is not a luxury but a necessity for logic as complex as the 
DAGScheduler.

As of this writing, DAGScheduler has ~1500 lines, while DAGSchedulerSuite has
only ~900 lines. I would argue that the suite should be several times longer
than the code it tests.

If you wish to work on this, let me know and I will assign it to you. Anyone is 
welcome. :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

