On July 14, 2018 3:10 pm, Paul Belanger wrote:
On Sat, Jul 14, 2018 at 09:51:10AM +0000, Tristan Cacqueray wrote:
On July 13, 2018 9:16 pm, Paul Belanger wrote:
> On Fri, Jul 13, 2018 at 09:27:23PM +0200, Matthieu Huin wrote:
> > Hello,
> Thanks for starting this discussion, I believe it to be important to the
> future relationship between SF and Zuul communities.

> > I would like to poke the collective brainpower about the way we ship
> > upstream components to which we also contribute, namely zuul and
> > nodepool, and see what we can do to improve if possible.

> Thank you for starting this thread.

> > As you probably all know, we chose to package these components as
> > RPMs, which allows us to ship zuul and nodepool with extra patches
> > that we call *tech previews*. These patches are always related to
> > features we are either contributing upstream ourselves or closely
> > follow, but that are not merged yet. The important point is also that
> > we have some relative confidence that these patches will eventually
> > make it into upstream. Unfortunately, "relative" and "eventually" must
> > be taken with a grain of salt:
> My main objection to this concept of 'Tech Preview' is that it follows more
> the development processes of OpenShift / Kubernetes and not OpenStack[1]
> (NOTE: zuul isn't actually an OpenStack project anymore, but governance
> likely will follow the 4 opens of OpenStack). We don't want to be releasing
> things ahead of zuul; I point to the issues between the k8s and openshift
> projects.
> I disagree, please see:
> http://crunchtools.com/is-openshift-a-fork-of-kubernetes-short-answer-no-longer-answer-heres-a-ton-of-technical-reasons/
> and:
> http://www.softwarefactory-project.io/what-is-the-difference-between-software-factory-and-zuul.html

> The biggest red flag for me is that downstream Red Hat does not follow this
> process for OSP (Red Hat product for OpenStack). Teams are committed to push
> code into master first, then either make a release from it or in some cases
> backport into a downstream branch. I am 100% confident these teams had the
> same struggles we have highlighted before, and I would like us to query some
> folks about how they solved this issue. Because the latest version of OSP
> does not contain downstream features outside of upstream master branches.
> OSP has many core devs to push changes; SF only has Paul as a core on Zuul.

FWIW: Being core really isn't different from non-core. While I agree there
are relationships formed with other core members, which may give an
advantage, very rarely am I merging code at will. The difference between a +1
and a +2 is very minimal to me.

> By having SoftwareFactory do the opposite of this, it may mean new features
> faster, but as you point out below, it now means more technical debt and,
> worse, that SF is the only one on the hook to support this 'tech preview'
> version.

> SF only does best-effort support to fix bugs in the zuul source code as
> well as in the code the team produces.

> Given the size and workload of the current team, I assert we don't want to
> do this just for this reason. By shipping something different than upstream,
> I also don't believe upstream will be happy with this. In fact, I encourage
> us to take the results of this discussion to the Zuul Discuss mailing list
> to inform them of this topic. This would be a great opportunity to work
> closer together as a team, if we raised our concerns with them.

> To me, Zuul in SF isn't different than upstream. The tech-preview features
> don't actually have to be part of the Zuul code.

If doing tech-preview obstructs the SF relationship with the upstream Zuul
community, then SF can drop the patches. I would prefer to keep the features
integrated in Zuul so that it enables direct Zuul improvements, but if this
is making upstream unhappy, then we can revisit the SF integration.

> > 1. We usually cannot tell how long it'll take for our patches to land
> > upstream. I don't have numbers to support it, but from memory it can
> > vary between days (some bug fixes made it quickly into master
> > upstream) and months (we've had OCI as tech preview for months, and
> > it's still not merged upstream). Fabien or Tristan can certainly
> > provide numbers on that point.
> I took some time to run the numbers using reviewday[2], a tool we have in
> openstack-infra. For today I've compared nova[3] with zuul[4] (using the
> official zuul projects[5]). As you can see, while zuul does have a high open
> review count, it is nowhere near nova. I also ran the numbers comparing
> tripleo[5] with zuul, and again zuul does come out ahead of tripleo here.
> Given the large number of Red Hat developers on tripleo, I think it is fair
> to dive more into why a 'tech preview' release of tripleo is not done.

> I can understand how it feels like zuul takes a long time for patches to
> land, but I believe the fix here is to double down on our commitment
> upstream and encourage more time for SF developers to be free to aid /
> assist with upstream zuul development. This includes code reviews of other
> non-SF-related patches.
> > > 2. We cannot guarantee that the feature in tech preview will land
> > as-is. The upstream reviews are usually being discussed and can be
> > subject to change, meaning users should not consider any of the tech
> > preview features to be stable.
> +1000
>
> This extends my comment above, and I completely agree. This is a very large
> concern, as it means more work for only SF to deal with migration issues.

> It seems like we need to be even more clear about tech previews: their
> migration is also best effort.

> > Looking at our distgit for Zuul, we currently ship 12 extra patches as
> > tech preview (5 of which are merged or about to be merged - thus the spec
> > must be updated), and this is bound to increase if we keep
> > contributing things faster than they can be reviewed and accepted
> > upstream. It can become quite hard to maintain the patch chain as
> > upstream evolves. We also face the very real risk that one of our use
> > cases (and thus our upstream contributions) might contradict
> > upstream's roadmap, leading to rejected patches: do we become a fork
> > then? Are we actually effectively a fork, providing a "Zuul that
> > could be someday" but definitely not the current Zuul?
> For me, the path forward here is straightforward: the patches that we have
> in our zuul RPM file, we keep; there is no point removing them now, as it
> would be very disruptive.

> Actually, I don't think any patches are needed to pass the sf-ci
> integration tests. We would lose some useful features (like pipeline
> listing to generate the grafana dashboard), but there are no blockers that
> can't be worked around.

> We stop working on any new features that SF requires and spend the effort
> working upstream to land these changes. This likely means getting approval
> from management to be allowed to help push on these efforts upstream.

> Let me list the actual patches we are talking about here...

Thanks for doing this! Very helpful. In fact, at one point tripleo wrote a
tool to track patches in RPMs called debtor[6]; I would love for us in SF to
run this so we can keep better track.

[6] https://github.com/redhat-cip/debtor

# 0001-executor-change-execution-log-to-INFO.patch
 https://review.openstack.org/578704
improvement, merged, took 13 days

# 0001-gerrit-use-baseurl-for-change-uris-lookup.patch
 https://review.openstack.org/579086
bugfix, still under review (12 days)

+A

# 0001-Add-tenant-yaml-validation-option-to-scheduler.patch
 https://review.openstack.org/574265
feature, still under review (5 weeks)

+2, we can push to land Monday

# 0001-angular-call-enableProdMode.patch
 https://review.openstack.org/573494
bugfix, no clear solution, reported 5 weeks ago

Has a -1 from zuul (linting error). Blocking reviews?

No blocking review, it's the javascript that is too complex...

# 0001-model-fix-AttributeError-exception-in-freezeJobGraph.patch
 https://review.openstack.org/579428
bugfix, merged, took a day

# 0001-gerrit-add-support-for-report-only-connection.patch
 https://review.openstack.org/568216
improvement (divides merger load by 2 for rdo), still under review (2 months)

-1 from jeblair, but it seems your reply hasn't been followed up on. However,
this one is a surprise to me, I didn't realize it was a large performance
boost. Could you update the commit message to include this, and we can lump
it under the recent discussions I had around 3pci with zuul.

Sure, I can do that.
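
For context, the idea is roughly the following (a hypothetical zuul.conf
sketch; the option name is illustrative, the actual spelling in the review
may differ):

    # Hypothetical second gerrit connection used only to report results,
    # so zuul does not also have to merge changes coming from it.
    [connection gerrit-report]
    driver=gerrit
    server=review.example.org
    user=zuul
    report_only=true  ; illustrative option name, see the review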

# 0001-executor-add-support-for-resource-connection-type.patch
 https://review.openstack.org/570668
improvement, minor modification to enable nodepool resources,
part of the container spec implementation

# 0001-executor-add-log_stream_port-and-log_stream_file-set.patch
 https://review.openstack.org/535538
feature, waiting for zuul_stream to support ssh port forwarding

Depends on logging rework, we likely need to allocate some people to help with
it.

# 0001-zk-use-kazoo-retry-facilities.patch
 https://review.openstack.org/536209 / https://review.openstack.org/535537
feature/bugfix to enable zookeeper restarts and reduce the amount of node
requests, reported 6 months ago, discussed here:
http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000047.html

Agree, I haven't really followed this one in a while; I'll spend a few hours
this week getting back up to speed. However, maybe we can also spend some
time reviewing the issues we've seen on SF.io, as this isn't a problem with
zuul.o.o.

I think tobiash is also having such issues, and I guess you can find many
ConnectionLost tracebacks in nodepool.o.o logs.
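
For reference, kazoo already ships the retry facilities this patch builds on;
a minimal sketch of the mechanism (hosts value illustrative, this is not the
patch itself):

    from kazoo.client import KazooClient
    from kazoo.retry import KazooRetry

    # Retry forever with exponential backoff, so a zookeeper restart does
    # not bubble up to the caller as a connection-loss error.
    retry = KazooRetry(max_tries=-1, delay=0.5, backoff=2, max_delay=60)
    client = KazooClient(
        hosts="zk01:2181",        # illustrative host
        connection_retry=retry,   # re-establish the session transparently
        command_retry=retry,      # re-run individual commands on failure
    )
    client.start()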

# 0001-angular6-dashboard.patch / 0001-web-new-routes.patch
 https://review.openstack.org/#/q/topic:zuul-ui-pages
 https://review.openstack.org/#/q/topic:zuul-web-routes
feature, webui improvements


Then regarding nodepool:

# 0001-config-add-statsd-server-config-parameter.patch
 https://review.openstack.org/535560
feature to remove sysconfig for statsd, proposed 7 months ago.

# 0001-zk-skip-node-already-being-deleted-in-cleanup-leaked.patch
 https://review.openstack.org/576288
bugfix, proposed 1 month ago.

# 0001-builder-do-not-configure-provider-that-doesn-t-manag.patch
 https://review.openstack.org/578642
bugfix, proposed 16 days ago.

These 3 patches above all have -1 from reviewers pending feedback from OP.
TristanC, what do you think about maybe having somebody else on the team working
on these?  They look to be straightforward and a great chance for other SF
members to maybe pick up some XP :)

For the statsd config parameter, the reviewer asked for a significant rework
that makes such a small feature not worth it imo.

I'll update the 2 others.
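
For context on the statsd one: today the settings come from environment
variables, which is why a sysconfig file is needed on the host. A rough
sketch of the idea (the yaml key name is illustrative, not the one from the
review):

    # Current behaviour (sketch): nodepool picks up statsd settings from
    # the environment, hence the sysconfig file.
    import os
    import statsd

    host = os.environ.get("STATSD_HOST")
    port = int(os.environ.get("STATSD_PORT", 8125))
    client = statsd.StatsClient(host, port) if host else None

    # The patch's idea was to read the same settings from nodepool.yaml
    # instead, e.g. (key name illustrative):
    #
    #   statsd-server: statsd.example.com:8125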

# 0001-driver-runc.patch / 0001-driver-k8s.patch / 0001-driver-openshift.patch
 https://review.openstack.org/#/q/topic:nodepool-drivers
drivers, not an actual nodepool modification, proposed 14 months ago.

This is by far the largest one, and I really have some concerns about it,
only because it is something we've agreed to do upstream in zuul. If we can
identify somebody on SF to push on this upstream, I think now is the time to
allocate the time to do it.

I can do that; actually, I'm already waiting for the container spec to be
finalized. But as I said below, these drivers don't have to be part of
Zuul, and they may be shipped through a sf-drivers package or something.
I mostly kept them in the Nodepool package as a way to demonstrate that the
driver API is working as expected.


TL;DR: most are either cosmetic (e.g. the statsd option in nodepool.yaml)
or critical bug fixes that are already approved upstream.

Then there are 2 "tech-previews":

The first one is a better web-ui. At first, it was developed in
managesf to ease the migration from jenkins. Then the code was
contributed to Zuul, and upstream merged the builds and jobs API.
Now SF has added more pages to list the projects, the labels, the pipeline
configuration, etc. If needed, these new pages could be removed from Zuul
and added back to the legacy managesf interface.

The second one is the nodepool drivers. With the new driver API, those
are just folders that do not modify the nodepool logic (see the sketch
below). I keep the upstream reviews in sync, and I'm willing to do the
legwork to get them merged. But on the other hand, those drivers don't have
to be merged upstream.

+1
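
To make the "just folders" point concrete, a driver is roughly shaped like
this (a simplified sketch; the base class is real, but the method names
below are illustrative of the driver API rather than its exact signatures):

    # nodepool/driver/example/__init__.py -- a hypothetical self-contained
    # driver: it plugs into the driver API without touching the core logic.
    from nodepool import driver

    class ExampleProvider(driver.Provider):
        """Manages the resources for one provider section of nodepool.yaml."""

        def start(self):
            # Connect to the backing service (cloud, runc host, k8s, ...).
            pass

        def stop(self):
            # Release connections; called on reconfiguration or shutdown.
            pass

        def listNodes(self):
            # Report existing resources so the cleanup workers can detect
            # leaked nodes.
            return []

        def cleanupNode(self, node_id):
            # Delete a single resource identified by node_id.
            pass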

> At the same time, we agree the SF workflow process will stop including
> these patches in the RPM, and land everything upstream in master first. It
> isn't enough to just submit the patch to review.o.o; we must get it merged
> before including it into the zuul RPM.
> +1

> > Yet the tech preview system has obvious advantages that make it
> > difficult to just drop this model, namely providing much needed
> > features that make Zuul and Nodepool much more serious and versatile
> > contenders in the world of CI, CD and code quality control - this is
> > also why we believe our changes will eventually make it upstream.
> I hoped this tech preview system would benefit other Zuul communities,
> but if this is causing trouble, then we can look into incubating the
> features in managesf.

> This is important feedback for the zuul project, and something I think we
> could discuss upstream on the zuul discuss ML. If we as a team believe this
> is important, then other zuul users upstream should too?

> SF-3.1 is currently waiting for upstream to tag master. Then we could
> report the tech-preview as we did for SF-3.0:
> http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000092.html

Next week I am going to create a new board in storyboard to help track the
3.1.1 release of zuul. I believe that will help set some expectations for
when the next release happens.

> > I guess the question we need to answer is: why is it so hard or long
> > to have upstream land the features we propose? And what can we do to
> > change that? If we can improve on this, the need for patching will
> > decrease until we can ship code as close as possible to master, or
> > even tagged releases.
> > What are your thoughts on this?

> This is not limited to just zuul; I've seen this problem time and time
> again with other opensource projects. Patience is required, and we need to
> adjust our expectations. If we want to influence change, we can, and now is
> the time to do so. However, we need to spend the time and energy doing so.
> This may mean slowing down our feature development work to properly land
> what we have already done.
> I think we need to do more code reviews; how many do you think we need
> to do to gain influence?

Yes, I do think the key here is doing code reviews on non-SF-specific
patches and, additionally, becoming more visible in IRC. I'm happy to help
with pair programming, if that helps anybody. But maybe just asking, "I have
x hours this week to work on something, what can I do?" would be most
welcome, I think.

Could we suggest that Zuul also do reviews the OpenStack way?
 https://docs.openstack.org/project-team-guide/review-the-openstack-way.html

Best regards,
-Tristan


> [1] https://governance.openstack.org/tc/reference/opens.html
> [2] http://git.openstack.org/cgit/openstack-infra/reviewday
> [3] http://paste.openstack.org/raw/725858/
> [4] http://paste.openstack.org/raw/725860/
> [5] https://git.zuul-ci.org/cgit