On July 13, 2018 9:16 pm, Paul Belanger wrote:
On Fri, Jul 13, 2018 at 09:27:23PM +0200, Matthieu Huin wrote:

Hello,

Thanks for starting this discussion, I believe it to be important to the future relationship between SF and Zuul communities.

I would like to poke the collective brainpower about the way we ship upstream components to which we also contribute, namely zuul and nodepool, and see what we can do to improve if possible.
Thank you for starting this thread.
As you probably all know, we choose to package these components as RPMs, which allows us to ship zuul and nodepool with extra patches that we call *tech previews*. These patches are always related to features we are either contributing upstream ourselves or closely follow, but that are not merged yet. The important point is also that we have some relative confidence that these patches will eventually make it into upstream. Unfortunately, "relative" and "eventually" must be taken with a grain of salt:

My main objection to this concept of 'Tech Preview' is that it follows the development processes of OpenShift / Kubernetes more than those of OpenStack[1] (NOTE: zuul isn't actually an OpenStack project anymore, but governance will likely follow the 4 opens of OpenStack). We don't want to be releasing things ahead of zuul; I point to the issues between the k8s and openshift projects.
I disagree, please see: http://crunchtools.com/is-openshift-a-fork-of-kubernetes-short-answer-no-longer-answer-heres-a-ton-of-technical-reasons/ And: http://www.softwarefactory-project.io/what-is-the-difference-between-software-factory-and-zuul.html
The biggest red flag for me is that downstream Red Hat does not follow this process for OSP (the Red Hat product for OpenStack). Teams are committed to pushing code into master first, then either making a release from it or, in some cases, backporting into a downstream branch. I am 100% confident these teams had the same struggles we have highlighted before, and I would like us to query some folks about how they solved this issue, because the latest version of OSP does not contain downstream features outside of upstream master branches.
OSP has many core devs to push changes, SF only has Paul that is core on Zuul.
By having SoftwareFactory do the opposite of this, it may mean new features faster, but as you point out below, it now means more technical debt and, worse, that SF is the only group on the hook to support this 'tech release' version.
SF only does best effort support to fix bugs in zuul source code as well as in the code the team produces.
Given the size and workload of the current team, I assert we don't want to do this for this reason alone. By shipping something different than upstream, I also don't believe upstream will be happy with this. In fact, I encourage us to take the results of this discussion to the Zuul Discuss mailing list, informing them of this topic. This would be a great opportunity to work closer together as a team, if we raised our concerns with them.
To me, Zuul in SF isn't different than upstream. The tech-preview features don't actually have to be part of the Zuul code. If doing tech-preview obstructs the SF relationship with the upstream Zuul communities, then SF can drop the patches. I would prefer to keep the features integrated in Zuul so that it enables direct Zuul improvements, but if this is making upstream unhappy, then we can revisit SF integration.
1. We usually cannot tell how long it'll take for our patches to land upstream. I don't have numbers to support it, but from memory it can vary between days (some bug fixes made it quickly into master upstream) to months (we've had OCI as tech preview for months, and it's still not merged upstream). Fabien or Tristan can certainly provide numbers on that point.

I took some time to run the numbers using reviewday[2], a tool we have in openstack-infra. For today I've compared nova[3] with zuul[4] (using official zuul projects[5]). As you can see, while zuul does have a high openreview number, it is nowhere near nova. I also ran the numbers using tripleo[5] with zuul, and again zuul does come out ahead of tripleo here. Given the large number of Red Hat developers on tripleo, I think it is fair to dive more into why a 'tech preview' release of tripleo is not done. I can understand how it feels like zuul takes a long time for patches to land, but I believe the fix here is to double down on our commitment upstream and encourage more time for SF developers to be free to aid / assist with upstream zuul development. This includes code reviews of other patches not related to SF.

2. We cannot guarantee that the feature in tech preview will land as-is. The upstream reviews are usually being discussed and can be subject to change, meaning users should not consider any of the tech preview features to be stable.

+1000 This extends my comment above, and I completely agree. This is a very large concern as it means more work for only SF to deal with migration issues.
It seems like we need to be even clearer about tech previews: their migration is also best effort.
Looking at our distgit for Zuul, we currently ship 12 extra patches as tech preview (5 of which about to be merged or merged - thus the spec must be updated), and this is bound to increase if we keep contributing things faster than they can be reviewed and accepted upstream. It can become quite hard to maintain the patch chain as upstream evolves. We also face the very real risk that one of our use cases (and thus our upstream contributions) might contradict upstream's roadmap, leading to rejected patches: do we become a fork then? Are we actually effectively a fork, providing a "Zuul that could be someday" but definitely not the current Zuul?

For me, the path forward here is straightforward: the patches that we have in our zuul RPM file, we keep; there is no point removing them now, as it would be very disruptive.
Actually, I don't think any patches are needed to pass sf-ci integration test. We would lose some useful features (like pipeline listing to generate grafana dashboard), but no blockers that can't be worked around.
We stop working on any new features that SF requires and spend the effort working upstream to land these changes. This likely means getting approval from management to be allowed to help push on these efforts upstream.
Let me list the actual patches we are talking about here...

# 0001-executor-change-execution-log-to-INFO.patch
https://review.openstack.org/578704
improvement, merged, took 13 days

# 0001-gerrit-use-baseurl-for-change-uris-lookup.patch
https://review.openstack.org/579086
bugfix, still under review (12 days)

# 0001-Add-tenant-yaml-validation-option-to-scheduler.patch
https://review.openstack.org/574265
feature, still under review (5 weeks)

# 0001-angular-call-enableProdMode.patch
https://review.openstack.org/573494
bugfix, no clear solution, reported 5 weeks ago

# 0001-model-fix-AttributeError-exception-in-freezeJobGraph.patch
https://review.openstack.org/579428
bugfix, merged, took a day

# 0001-gerrit-add-support-for-report-only-connection.patch
https://review.openstack.org/568216
improvement (divide merger load by 2 for rdo), still under review (2 months)

# 0001-executor-add-support-for-resource-connection-type.patch
https://review.openstack.org/570668
improvement, minor modification to enable nodepool resources, part of the container spec implementation

# 0001-executor-add-log_stream_port-and-log_stream_file-set.patch
https://review.openstack.org/535538
feature, waiting for zuul_stream to support ssh port forwarding

# 0001-zk-use-kazoo-retry-facilities.patch
https://review.openstack.org/536209 / https://review.openstack.org/535537
feature/bugfix to enable zookeeper restart and reduce the number of node requests, reported 6 months ago, discussed here: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000047.html

# 0001-angular6-dashboard.patch / 0001-web-new-routes.patch
https://review.openstack.org/#/q/topic:zuul-ui-pages
https://review.openstack.org/#/q/topic:zuul-web-routes
feature, webui improvements

Then regarding nodepool:

# 0001-config-add-statsd-server-config-parameter.patch
https://review.openstack.org/535560
feature to remove sysconfig for statsd, proposed 7 months ago.

# 0001-zk-skip-node-already-being-deleted-in-cleanup-leaked.patch
https://review.openstack.org/576288
bugfix, proposed 1 month ago.

# 0001-builder-do-not-configure-provider-that-doesn-t-manag.patch
https://review.openstack.org/578642
bugfix, proposed 16 days ago.

# 0001-driver-runc.patch / 0001-driver-k8s.patch / 0001-driver-openshift.patch
https://review.openstack.org/#/q/topic:nodepool-drivers
drivers, not actual nodepool modification, proposed 14 months ago.

TL;DR: most are either cosmetic (e.g. statsd option in nodepool.yaml) or critical bug fixes that are already approved upstream.

Then there are 2 "tech-preview":

The first one is a better web-ui. At first, it was developed in managesf to ease the migration from jenkins. Then the code was contributed to Zuul and upstream merged the builds and jobs API. Now SF added more pages to list the projects, the labels, the pipeline conf, etc. If needed, these new pages could be removed from Zuul and added back to the legacy managesf interface.

The second one is the nodepool drivers. With the new driver API, those are just folders that do not modify the nodepool logic. I keep the upstream review in sync, and I'm willing to do the legwork to get them merged. But on the other hand, those drivers don't have to be merged upstream.
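To put rough numbers on the list above, here is a small sketch that tallies the patches for which an age is stated. It only covers entries with an explicit duration, it treats a patch as merged only where the list says so, and it converts weeks and months to days (7 and 30 days respectively), which is an approximation made for this sketch rather than data from Gerrit.

```python
from statistics import median

# (project, patch, stated as merged?, approximate age in days)
# Ages are rounded from the list above; 1 week = 7 days and 1 month = 30 days
# are assumptions made for this sketch.
PATCHES = [
    ("zuul", "executor-change-execution-log-to-INFO", True, 13),
    ("zuul", "gerrit-use-baseurl-for-change-uris-lookup", False, 12),
    ("zuul", "Add-tenant-yaml-validation-option-to-scheduler", False, 35),
    ("zuul", "angular-call-enableProdMode", False, 35),
    ("zuul", "model-fix-AttributeError-exception-in-freezeJobGraph", True, 1),
    ("zuul", "gerrit-add-support-for-report-only-connection", False, 60),
    ("zuul", "zk-use-kazoo-retry-facilities", False, 180),
    ("nodepool", "config-add-statsd-server-config-parameter", False, 210),
    ("nodepool", "zk-skip-node-already-being-deleted-in-cleanup-leaked", False, 30),
    ("nodepool", "builder-do-not-configure-provider-that-doesn-t-manag", False, 16),
    ("nodepool", "driver-runc / driver-k8s / driver-openshift", False, 420),
]


def summarize(patches):
    """Count merged vs. pending patches and compute the median age of each group."""
    merged = [age for _, _, is_merged, age in patches if is_merged]
    pending = [age for _, _, is_merged, age in patches if not is_merged]
    return {
        "merged_count": len(merged),
        "pending_count": len(pending),
        "median_merged_days": median(merged) if merged else None,
        "median_pending_days": median(pending) if pending else None,
    }


summary = summarize(PATCHES)
print(summary)
```

The median is used instead of the mean so that the 14-month-old driver series does not dominate the result.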
At the same time, we agree the SF workflow process will stop including these patches in the RPM, and land everything upstream in master first. It isn't enough to just submit the patch to review.o.o; we must get it merged before including it in the zuul RPM.
+1
Yet the tech preview system has obvious advantages that make it difficult to just drop this model, namely providing much needed features that make Zuul and Nodepool much more serious and versatile contenders in the world of CI, CD and code quality control - this is also why we believe our changes will eventually make it upstream.
I hoped this tech preview system would benefit other Zuul communities, but if this is causing trouble, then we can look into incubating the features in managesf.
This is important feedback for the zuul project, and something I think we could discuss upstream on the zuul discuss ML. If we as a team believe this is important, then other zuul users upstream should too?
SF-3.1 is currently waiting for upstream to tag master. Then we could report the tech-preview as we did for SF-3.0: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000092.html
I guess the question we need to answer is: why is it so hard or long to have upstream land the features we propose? And what can we do to change that? If we can improve on this, the need for patching will decrease until we can ship code as close as possible to master, or even tagged releases. What are your thoughts on this?

This is not limited to just zuul; I've seen this problem time and time again for other opensource projects. Patience is required, and we need to adjust our expectations. If we want to influence change, we can, and now is the time to do so. However, we need to spend the time and energy doing so. This may mean slowing down our feature development work to properly land what we have already done.
I think we need to do more code reviews. How many do you think we need to do to gain influence?

Thanks,
-Tristan
[1] https://governance.openstack.org/tc/reference/opens.html
[2] http://git.openstack.org/cgit/openstack-infra/reviewday
[3] http://paste.openstack.org/raw/725858/
[4] http://paste.openstack.org/raw/725860/
[5] https://git.zuul-ci.org/cgit

_______________________________________________
Softwarefactory-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/softwarefactory-dev
