On Sun, Jul 15, 2018 at 10:47:25PM +0000, Tristan Cacqueray wrote:
> On July 14, 2018 3:10 pm, Paul Belanger wrote:
> > On Sat, Jul 14, 2018 at 09:51:10AM +0000, Tristan Cacqueray wrote:
> > > On July 13, 2018 9:16 pm, Paul Belanger wrote:
> > > > On Fri, Jul 13, 2018 at 09:27:23PM +0200, Matthieu Huin wrote:
> > > > > Hello,

> > > > Thanks for starting this discussion, I believe it to be important to the future relationship between SF and Zuul communities.

> > > > > I would like to poke the collective brainpower about the way we ship upstream components to which we also contribute, namely zuul and nodepool, and see what we can do to improve if possible.

> > > Thank you for starting this thread.

> > > > > As you probably all know, we choose to package these components as RPMs, which allows us to ship zuul and nodepool with extra patches that we call *tech previews*. These patches are always related to features we are either contributing upstream ourselves or closely following, but that are not merged yet. The important point is also that we have some relative confidence that these patches will eventually make it into upstream. Unfortunately, "relative" and "eventually" must be taken with a grain of salt:

> > > > My main objection to this concept of 'Tech Preview' is that it follows the development processes of OpenShift / Kubernetes more than OpenStack[1] (NOTE: zuul isn't actually an OpenStack project anymore, but governance likely will follow the 4 opens of OpenStack). We don't want to be releasing things ahead of zuul; I point to the issues between the k8s and openshift projects.

> > > I disagree, please see:
> > > http://crunchtools.com/is-openshift-a-fork-of-kubernetes-short-answer-no-longer-answer-heres-a-ton-of-technical-reasons/
> > > And:
> > > http://www.softwarefactory-project.io/what-is-the-difference-between-software-factory-and-zuul.html

> > > > The biggest red flag for me is that downstream Red Hat does not do this process for OSP (the Red Hat product for OpenStack). Teams are committed to push code into master first, then either make a release from it or in some cases backport it into a downstream branch. I am 100% confident these teams had the same struggles we have highlighted before, and I would like us to query some folks about how they solved this issue. Because the latest version of OSP does not contain downstream features outside of upstream master branches.

> > > OSP has many core devs to push changes, SF only has Paul that is core on Zuul.

> > FWIW: Being core really isn't different from non-core. While I agree there are relationships formed with other core members, which may give an advantage, very rarely am I merging code at will. The difference between a +1 and +2 is very minimal to me.

> > > > By having SoftwareFactory do the opposite of this, it may mean new features faster, but as you point out below, it now means more technical debt and, worse, that SF is the only one on the hook to support this 'tech release' version.

> > > SF only does best effort support to fix bugs in zuul source code as well as in the code the team produces.

> > > > Given the size and workload of the current team, I assert we don't want to do this just for this reason. By shipping something different than upstream, I also don't believe upstream will be happy with this. In fact, I encourage us to take the results of this discussion to the Zuul Discuss mailing list, informing them of this topic. This would be a great opportunity to work closer together as a team, if we raised our concerns with them.

> > > To me, Zuul in SF isn't different than upstream. The tech-preview features don't actually have to be part of the Zuul code.
> > > If doing tech-preview obstructs the SF relationship with the upstream Zuul communities, then SF can drop the patches. I would prefer to keep the features integrated in Zuul so that it enables direct Zuul improvements, but if this is making upstream unhappy, then we can revisit the SF integration.

> > > > > 1. We usually cannot tell how long it'll take for our patches to land upstream. I don't have numbers to support it, but from memory it can vary between days (some bug fixes made it quickly into master upstream) to months (we've had OCI as tech preview for months, and it's still not merged upstream). Fabien or Tristan can certainly provide numbers on that point.

> > > > I took some time to run the numbers using reviewday[2], a tool we have in openstack-infra; for today I've compared nova[3] with zuul[4] (using the official zuul projects[5]). As you can see, while zuul does have a high open review count, it is nowhere near nova. I also ran the numbers using tripleo[5] with zuul, and again zuul does come out ahead of tripleo here. Given the large number of Red Hat developers on tripleo, I think it is fair to dig more into why a 'tech preview' release of tripleo is not done.

> > > > I can understand how it feels like zuul takes a long time for patches to land, but I believe the fix here is to double down on our commitment upstream and encourage more time for SF developers to be free to aid / assist with upstream zuul development. This includes code reviews of other, non-SF-related patches.

> > > > > 2. We cannot guarantee that the feature in tech preview will land as-is. The upstream reviews are usually being discussed and can be subject to change, meaning users should not consider any of the tech preview features to be stable.

> > > > +1000
> > > > This extends my comment above, and I completely agree. This is a very large concern as it means more work for SF alone to deal with migration issues.

> > > It seems like we need to be even more clear about tech previews; their migration is also best effort.

> > > > > Looking at our distgit for Zuul, we currently ship 12 extra patches as tech preview (5 of which are merged or about to be merged - thus the spec must be updated), and this is bound to increase if we keep contributing things faster than they can be reviewed and accepted upstream. It can become quite hard to maintain the patch chain as upstream evolves.
> > > > > We also face the very real risk that one of our use cases (and thus our upstream contributions) might contradict upstream's roadmap, leading to rejected patches: do we become a fork then? Are we actually effectively a fork, providing a "Zuul that could be someday" but definitely not the current Zuul?

> > > > For me, the path forward here is straightforward: the patches that we have in our zuul RPM file, we keep; there is no point removing them now, as that would be very disruptive.

> > > Actually, I don't think any patches are needed to pass the sf-ci integration tests.
> > > We would lose some useful features (like pipeline listing to generate the grafana dashboard), but there are no blockers that can't be worked around.

> > > > We stop working on any new features that SF requires and spend the effort working upstream to land these changes. This likely means getting approval from management to be allowed to help push on these efforts upstream.

> > > Let me list the actual patches we are talking about here...

> > Thanks for doing this! Very helpful; in fact at one point tripleo wrote a tool to track patches in RPMs called debtor[6], I would love for us in SF to run this so we can keep better track.
> >
> > [6] https://github.com/redhat-cip/debtor

> > > # 0001-executor-change-execution-log-to-INFO.patch
> > > https://review.openstack.org/578704
> > > improvement, merged, took 13 days

> > > # 0001-gerrit-use-baseurl-for-change-uris-lookup.patch
> > > https://review.openstack.org/579086
> > > bugfix, still under review (12 days)

> > +A

> > > # 0001-Add-tenant-yaml-validation-option-to-scheduler.patch
> > > https://review.openstack.org/574265
> > > feature, still under review (5 weeks)

> > +2, we can push to land Monday

> > > # 0001-angular-call-enableProdMode.patch
> > > https://review.openstack.org/573494
> > > bugfix, no clear solution, reported 5 weeks ago

> > Has -1 from zuul, linting error. Blocking reviews?

> No blocking review, it's the javascript that is too complex...

> > > # 0001-model-fix-AttributeError-exception-in-freezeJobGraph.patch
> > > https://review.openstack.org/579428
> > > bugfix, merged, took a day

> > > # 0001-gerrit-add-support-for-report-only-connection.patch
> > > https://review.openstack.org/568216
> > > improvement (divides merger load by 2 for rdo), still under review (2 months)

> > -1 from jeblair, but it seems your reply hasn't been followed up on. However, this one is a surprise to me, I didn't realize it was a large performance boost. Could you update the commit message to include this and we can lump it under the recent discussions I had around 3pci with zuul.

> Sure, I can do that.

> > > # 0001-executor-add-support-for-resource-connection-type.patch
> > > https://review.openstack.org/570668
> > > improvement, minor modification to enable nodepool resources, part of the container spec implementation

> > > # 0001-executor-add-log_stream_port-and-log_stream_file-set.patch
> > > https://review.openstack.org/535538
> > > feature, waiting for zuul_stream to support ssh port forwarding

> > Depends on the logging rework; we likely need to allocate some people to help with it.
> > > # 0001-zk-use-kazoo-retry-facilities.patch
> > > https://review.openstack.org/536209 / https://review.openstack.org/535537
> > > feature/bugfix to enable zookeeper restarts and reduce the number of node requests, reported 6 months ago, discussed here:
> > > http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000047.html

> > Agreed, I haven't really followed this one in a while; I'll spend a few hours this week getting back up to speed. However, maybe we can also spend some time reviewing the issues we've seen in SF.io, as this isn't a problem with zuul.o.o.

> I think tobiash is also having such issues, and I guess you can find many ConnectionLost tracebacks in the nodepool.o.o logs.

No, we don't have any of these tracebacks. Looking at the zookeeper debug logs in sf.io, I think we are having performance issues. Is this server using boot from volume and backed with ceph? I'd actually like us to schedule moving zookeeper to a dedicated instance, backed with SSDs, and see if this helps. We should also discuss a zookeeper cluster while we are at it.
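For anyone who hasn't looked at the retry patch yet, the general idea is to lean on kazoo's own retry helpers instead of giving up as soon as the ZooKeeper session drops. A rough, untested sketch of the kazoo primitives involved (hostname and path below are made up for illustration, this is not the actual patch):

    from kazoo.client import KazooClient
    from kazoo.retry import KazooRetry

    # Retry forever with exponential backoff instead of raising as soon as
    # the session bounces (e.g. while the ZooKeeper server restarts).
    retry = KazooRetry(max_tries=-1, delay=0.5, backoff=2, max_delay=60)

    zk = KazooClient(hosts='zk01.example.com:2181',
                     connection_retry=retry,  # retries the connection itself
                     command_retry=retry)     # retries individual requests
    zk.start()

    # Requests made through the client are now retried transparently, so a
    # scheduler or launcher can ride out a ZooKeeper restart.
    print(zk.get_children('/'))
    zk.stop()

The actual review wires this kind of retry into the nodepool/zuul ZooKeeper client code; the sketch is only meant to show which kazoo facilities it builds on, which is also relevant if we end up restarting zookeeper while moving it to a dedicated instance.
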
> > > # 0001-angular6-dashboard.patch / 0001-web-new-routes.patch
> > > https://review.openstack.org/#/q/topic:zuul-ui-pages
> > > https://review.openstack.org/#/q/topic:zuul-web-routes
> > > feature, webui improvements

> > > Then regarding nodepool:

> > > # 0001-config-add-statsd-server-config-parameter.patch
> > > https://review.openstack.org/535560
> > > feature to remove the sysconfig for statsd, proposed 7 months ago.

> > > # 0001-zk-skip-node-already-being-deleted-in-cleanup-leaked.patch
> > > https://review.openstack.org/576288
> > > bugfix, proposed 1 month ago.

> > > # 0001-builder-do-not-configure-provider-that-doesn-t-manag.patch
> > > https://review.openstack.org/578642
> > > bugfix, proposed 16 days ago.

> > These 3 patches above all have -1 from reviewers pending feedback from the OP. TristanC, what do you think about maybe having somebody else on the team working on these? They look to be straightforward and a great chance for other SF members to maybe pick up some XP :)

> The reviewer of the statsd config parameter asked for a significant rework that makes such a small feature not worth it imo.
> I'll update the 2 others.

> > > # 0001-driver-runc.patch / 0001-driver-k8s.patch / 0001-driver-openshift.patch
> > > https://review.openstack.org/#/q/topic:nodepool-drivers
> > > drivers, no actual nodepool modifications, proposed 14 months ago.

> > This is by far the largest one and I really have some concerns about it, only because it is something we've agreed to do upstream in zuul. If we can identify somebody on SF to push on this upstream, I think now is the time to allocate the time to do it.

> I can do that, actually I'm already waiting for the container spec to be finalized. But as I said below, these drivers don't have to be part of Zuul and they may be shipped through a sf-drivers package or something. I mostly kept them in the Nodepool package as a way to demonstrate the driver API is working as expected.

> > > TL;DR; most are either cosmetic (e.g. the statsd option in nodepool.yaml) or critical bug fixes that are already approved upstream.
> > >
> > > Then there are 2 "tech-previews":
> > >
> > > The first one is a better web-ui. At first, it was developed in managesf to ease the migration from jenkins. Then the code was contributed to Zuul and upstream merged the builds and jobs API. Now SF has added more pages to list the projects, the labels, the pipeline conf, etc... If needed, these new pages could be removed from Zuul and added back to the legacy managesf interface.
> > >
> > > The second one is the nodepool drivers. With the new driver API, those are just folders that do not modify the nodepool logic. I keep the upstream reviews in sync, and I'm willing to do the legwork to get them merged. But on the other hand, those drivers don't have to be merged upstream.

> > +1

> > > > At the same time, we agree the SF workflow process will stop including these patches in the RPM, and land everything upstream in master first. It isn't enough to just submit the patch to review.o.o, we must get it merged before including it in the zuul RPM.

> > > +1

> > > > > Yet the tech preview system has obvious advantages that make it difficult to just drop this model, namely providing much needed features that make Zuul and Nodepool much more serious and versatile contenders in the world of CI, CD and code quality control - this is also why we believe our changes will eventually make it upstream.

> > > I hoped this tech preview system would benefit other Zuul communities, but if this is causing trouble, then we can look into incubating the features in managesf.

> > > > This is important feedback for the zuul project, and something I think we could discuss upstream on the zuul discuss ML. If we as a team believe this is important, then other zuul users upstream should too?

> > > SF-3.1 is currently waiting for upstream to tag master. Then we could report the tech-previews as we did for SF-3.0:
> > > http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000092.html

> > Next week I am going to create a new board in storyboard to help track the 3.1.1 release of zuul. I believe that will help set some expectations when the next release happens.

> > > > > I guess the question we need to answer is: why is it so hard or slow to have upstream land the features we propose? And what can we do to change that? If we can improve on this, the need for patching will decrease until we can ship code as close as possible to master, or even to tagged releases.
> > > > >
> > > > > What are your thoughts on this?

> > > > This is not limited to just zuul, I've seen this problem time and time again for other open source projects. Patience is required, and we need to adjust our expectations. If we want to influence change, we can, and now is the time to do so. However, we need to spend the time and energy doing so. This may mean slowing down our feature development work to properly land what we have already done.

> > > I think we need to do more code reviews; how many do you think we need to do to gain influence?

> > Yes, I do think the key here is doing code reviews on non-SF-specific patches and, additionally, becoming more visible in IRC. I'm happy to help with paired programming, if that helps anybody. But maybe just asking "hey, I have x hours this week to work on something, what can I do?" would be most welcome, I think.

> Could we suggest that Zuul also do reviews the OpenStack way?
> https://docs.openstack.org/project-team-guide/review-the-openstack-way.html

Could you be more specific here? Which parts do we need to review the OpenStack way?

> Best Regards,
> -Tristan

> > > > [1] https://governance.openstack.org/tc/reference/opens.html
> > > > [2] http://git.openstack.org/cgit/openstack-infra/reviewday
> > > > [3] http://paste.openstack.org/raw/725858/
> > > > [4] http://paste.openstack.org/raw/725860/
> > > > [5] https://git.zuul-ci.org/cgit

_______________________________________________
Softwarefactory-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/softwarefactory-dev
