[Twisted-Python] resolving release management conflict

Glyph Mon, 01 Feb 2021 13:00:15 -0800

Hello Twisted community!

Recently we've had some emotional language around the release process.  On 
https://github.com/twisted/twisted/pull/1464 
<https://github.com/twisted/twisted/pull/1464>, Craig said several times that 
he found Adi's work and process changes "very aggressive".  Obviously, to have 
suggested this, Craig would have found some things Adi said to be emotionally 
loaded as well.


I think we need to resolve this conflict in order to move forward productively 
and avoid wasting more volunteer energy on negative feelings.  In order to do 
that, I think we need to centralize the discussion here.

At this point I think we need to all begin with a presumption of good faith.  
Adi, Thomas Grainger, Tom Most, Maarten, Povilas and Craig are all trying to 
address big problems in Twisted.  Nobody's trying to be aggressive, even if 
it's coming across that way.

(Personally, I should acknowledge that I don't see Adi's actions as aggressive, 
but I think we have to address Craig's perception that they are.  Part of the 
presumption of good faith is presuming that everyone has valid reasons for 
their reactions.)

Here's the status as I understand it:

Adi, Thomas, and Maarten are trying to address release automation issues to 
make future releases go faster.  They've merged several changes, including 
https://github.com/twisted/twisted/pull/1464 
<https://github.com/twisted/twisted/pull/1464>, to that end.
Craig is trying to get out a release according to the current process, which 
means addressing https://twistedmatrix.com/trac/ticket/10069 
<https://twistedmatrix.com/trac/ticket/10069> as it is a release blocker.  He 
has a fix for this, but that fix is blocked.

While we've been waiting on #2, the changes in #1 proceed apace.  Craig's also 
noted several times that I've given permission for him to do the release and so 
he feels that breaking things mid-way is reneging on a promise from the project.

I believe the proximate cause of the conflict here is that we're dealing with 
this regression incorrectly.  When a regression is introduced, there's a 
process for dealing with it, originally documented here: 
https://twistedmatrix.com/trac/wiki/ReviewProcess#Revertingachange 
<https://twistedmatrix.com/trac/wiki/ReviewProcess#Revertingachange>. [1]  The 
principle we need to outline clearly is this: if we have a regression, the 
change that introduced it should be reverted as soon as possible.  If we can't 
revert it directly because of changes to trunk in the meanwhile that introduce 
conflicts, ideally the change that we would make would be the closest possible 
thing to reverting it, even if that reintroduces a known bug that we have to 
reopen.

In the case where a lot of work has happened in the interim, and it's less work 
to just "fix forward" on top of trunk, it's fine to do a small fix as an end 
run around this process.  The goal of this process is to minimize the work 
required to restore trunk to a releasable state.

Right now we have three competing fixes which have all been flagged as 
incorrect in various ways:

from Tom: https://github.com/twisted/twisted/pull/1499 
<https://github.com/twisted/twisted/pull/1499>
Craig rejected this one by saying it will break "certain versions" of Python, 
but I don't understand why it was closed rather than reviewed / reassigned as 
normal.  (Or for that matter which versions are the problems, or how it breaks 
them.)
from Adi: https://github.com/twisted/twisted/pull/1501 
<https://github.com/twisted/twisted/pull/1501>
Craig rejected this one by saying "please stop working on this" but it also 
doesn't have any technical reason that this solution is incorrect.
from Craig: https://github.com/twisted/twisted/pull/1502 
<https://github.com/twisted/twisted/pull/1502>
This was originally given a passing review by Povilas from the Buildbot 
project, but twm added an additional blocking review shortly thereafter 
explaining that it introduced an additional regression.  It seems as though 
this one is now actively being worked on?

Craig: you closed two of these issues on the basis that you were "already 
working on a fix", but that's not a valid reason to close a PR.  Or, for that 
matter, to give it a failing review.  In the future, please comment on PRs that 
you don't want merged by pointing out their technical insufficiency, so that we 
can take the opportunity to learn for the benefit of future maintenance.  In 
this case, the underlying issue is clearly quite nuanced and requires expertise 
that touches platform differences in Python's core APIs, so having multiple 
people working on different fixes is totally fine; this effort is not wasted if 
everybody learns a little bit about the code in question, and it's definitely 
not wasted if we have to synthesize our approaches in the end anyway because 
they address different cases.

If you are pressed for time and don't have the ability to enumerate a large 
number of subtle problems, it's fine to post a changes-required review saying 
something high-level, such as "this does not address a number of issues that 
I've taken into account in my fix in #NNNN; please see the code on that PR for 
more detail, I can answer specific questions later".

All of these fixes have each been outstanding for weeks, which strongly 
suggests that this is way too hard to fix forward and we should have stopped 
trying a while ago.  We should fall back to the previous one if in any way 
possible, then reintroduce the changes which caused it.  If we're really just 
about to land 1502, then feel free to complete the work there, but in the 
future we should regard this kind of delay in reverting as a process failure.

If there are too many conflicts to address in a revert, and reverting means we 
need to revert 3 or 4 tickets which have built on top of that work, then let's 
just do that and get those re-merged later as well.  It's unfortunate to ship 
without features but it's better than multi-month release hold-ups.

(Note: a straight revert doesn't require a full code review.  If you can revert 
a merge and CI is happy with it, feel free to get an administrator to smash the 
red button to merge immediately.)

I think a secondary problem is that we have not traditionally been very clear 
about the role of "release manager".  My intent for the role was originally 
that it last 1-2 weeks at most, and the responsibility is supposed to begin 
just a bit before the first prerelease is uploaded, and end when the final 
release is completed and the release announcement is written.

Given that it's the release manager's responsibility to ensure no regressions 
are present in the release, this can come along with a little bit of soft power 
where we should all promptly honor the current RM's request to revert things or 
to prioritize review of fixes for release blockers which might not be 
straightforward code regressions and therefore can't be fixed by a revert (for 
example, 3rd party services like readthedocs or github actions breaking our 
integrations or CI due to no change of our own).  However the RM has no special 
authority to circumvent processes, land code, prevent or code from landing, and 
should ideally never be making changes to the process in-line with the manual 
process of making the release.

The release manager really isn't supposed to do a bunch of development on 
trunk.  This has been confounded in the past by the fact that in the past Amber 
has been both a prolific contributor and the release manager, often at the same 
time; and, as a way to make the manual process easier, she often would do a 
bunch of automation work right before a relase.  But the role, inasmuch as it 
exists, ought to be about executing whatever manual steps remain after we've 
automated as much as we can.  Similarly, Amber's faithful execution of this 
role for years was a great benefit to everyone working on the project, but it 
also obscured that the RM is supposed to be a hat that one person wears for one 
to two weeks.  My personal feeling (it certainly would overstate this to call 
it a "process") is that if someone takes on the release manager role and can't 
get a prerelease out within a week, then they should step down and we should 
have a bigger community discussion about why we're stuck. 

My follow-up questions are:

Can we revert the change that introduced 10069 so that we can follow the 
process to re-introduce it after a release has been made, and have the 
conversation about which PR to accept and why post-release?
Do we need to revert the release process changes from 
https://github.com/twisted/twisted/pull/1464 
<https://github.com/twisted/twisted/pull/1464> for this release?  If so, what 
is the specific technical need this addresses?
Is there some problem with the release process changes that ought to make this 
revert more permanent? They look really good to me, but I don't want to 
discount that possibility either.  A reviewer can always request that code be 
pulled back out of trunk if there's a problem that was not forseen during the 
review process?

-g

[1]: I'm not sure if this has been propagated forward to our newer process docs 
in git; it would be good to delete most of this doc and replace it with a bunch 
of links so we don't have duplicate and outdated docs.

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

[Twisted-Python] resolving release management conflict

Reply via email to