On Sun, Jul 15, 2018 at 10:47:25PM +0000, Tristan Cacqueray wrote:
> On July 14, 2018 3:10 pm, Paul Belanger wrote:
> > On Sat, Jul 14, 2018 at 09:51:10AM +0000, Tristan Cacqueray wrote:
> > > On July 13, 2018 9:16 pm, Paul Belanger wrote:
> > > > On Fri, Jul 13, 2018 at 09:27:23PM +0200, Matthieu Huin wrote:
> > > > > Hello,

> > > > Thanks for starting this discussion, I believe it to be important to the future relationship between SF and Zuul communities.

> > > > > I would like to poke the collective brainpower about the way we ship upstream components to which we also contribute, namely zuul and nodepool, and see what we can do to improve if possible.

> > > Thank you for starting this thread.

> > > > > As you probably all know, we choose to package these components as RPMs, which allows us to ship zuul and nodepool with extra patches that we call *tech previews*. These patches are always related to features we are either contributing upstream ourselves or closely following, but that are not merged yet. The important point is also that we have some relative confidence that these patches will eventually make it into upstream. Unfortunately, "relative" and "eventually" must be taken with a grain of salt:

> > > > My main objection to this concept of 'Tech Preview' is that it follows the development processes of OpenShift / Kubernetes more than OpenStack[1] (NOTE: zuul isn't actually an OpenStack project anymore, but governance likely will follow the 4 opens of OpenStack). We don't want to be releasing things ahead of zuul; I point to the issues between the k8s and openshift projects.

> > > I disagree, please see:
> > > http://crunchtools.com/is-openshift-a-fork-of-kubernetes-short-answer-no-longer-answer-heres-a-ton-of-technical-reasons/
> > > And:
> > > http://www.softwarefactory-project.io/what-is-the-difference-between-software-factory-and-zuul.html

> > > > The biggest red flag for me is that downstream Red Hat does not do this process for OSP (the Red Hat product for OpenStack). Teams are committed to push code into master first, then either make a release from it or in some cases backport it into a downstream branch. I am 100% confident these teams had the same struggles we have highlighted before, and I would like us to query some folks about how they solved this issue. Because the latest version of OSP does not contain downstream features outside of upstream master branches.

> > > OSP has many core devs to push changes, SF only has Paul that is core on Zuul.

> > FWIW: Being core really isn't different from non-core. While I agree there are relationships formed with other core members, which may give an advantage, very rarely am I merging code at will. The difference between a +1 and +2 is very minimal to me.

> > > > By having SoftwareFactory do the opposite of this, it may mean new features faster, but as you point out below, it now means more technical debt and, worse, that SF is the only one on the hook to support this 'tech release' version.

> > > SF only does best effort support to fix bugs in zuul source code as well as in the code the team produces.

> > > > Given the size and workload of the current team, I assert we don't want to do this just for this reason. By shipping something different than upstream, I also don't believe upstream will be happy with this. In fact, I encourage us to take the results of this discussion to the Zuul Discuss mailing list, informing them of this topic. This would be a great opportunity to work closer together as a team, if we raised our concerns with them.

> > > To me, Zuul in SF isn't different than upstream. The tech-preview features don't actually have to be part of the Zuul code.
> > > If doing tech-preview obstructs the SF relationship with the upstream Zuul communities, then SF can drop the patches. I would prefer to keep the features integrated in Zuul so that it enables direct Zuul improvements, but if this is making upstream unhappy, then we can revisit the SF integration.

> > > > > 1. We usually cannot tell how long it'll take for our patches to land upstream. I don't have numbers to support it, but from memory it can vary between days (some bug fixes made it quickly into master upstream) to months (we've had OCI as tech preview for months, and it's still not merged upstream). Fabien or Tristan can certainly provide numbers on that point.

> > > > I took some time to run the numbers using reviewday[2], a tool we have in openstack-infra; for today I've compared nova[3] with zuul[4] (using the official zuul projects[5]). As you can see, while zuul does have a high open review count, it is nowhere near nova. I also ran the numbers using tripleo[5] with zuul, and again zuul does come out ahead of tripleo here. Given the large number of Red Hat developers on tripleo, I think it is fair to dig more into why a 'tech preview' release of tripleo is not done.

> > > > I can understand how it feels like zuul takes a long time for patches to land, but I believe the fix here is to double down on our commitment upstream and encourage more time for SF developers to be free to aid / assist with upstream zuul development. This includes code reviews of other, non-SF-related patches.

> > > > > 2. We cannot guarantee that the feature in tech preview will land as-is. The upstream reviews are usually being discussed and can be subject to change, meaning users should not consider any of the tech preview features to be stable.

> > > > +1000
> > > > This extends my comment above, and I completely agree. This is a very large concern as it means more work for SF alone to deal with migration issues.

> > > It seems like we need to be even more clear about tech previews; their migration is also best effort.

> > > > > Looking at our distgit for Zuul, we currently ship 12 extra patches as tech preview (5 of which are merged or about to be merged - thus the spec must be updated), and this is bound to increase if we keep contributing things faster than they can be reviewed and accepted upstream. It can become quite hard to maintain the patch chain as upstream evolves.
> > > > > We also face the very real risk that one of our use cases (and thus our upstream contributions) might contradict upstream's roadmap, leading to rejected patches: do we become a fork then? Are we actually effectively a fork, providing a "Zuul that could be someday" but definitely not the current Zuul?

> > > > For me, the path forward here is straightforward: the patches that we have in our zuul RPM file, we keep; there is no point removing them now, as that would be very disruptive.

> > > Actually, I don't think any patches are needed to pass the sf-ci integration tests.
> > > We would lose some useful features (like pipeline listing to generate the grafana dashboard), but there are no blockers that can't be worked around.

> > > > We stop working on any new features that SF requires and spend the effort working upstream to land these changes. This likely means getting approval from management to be allowed to help push on these efforts upstream.

> > > Let me list the actual patches we are talking about here...

> > Thanks for doing this! Very helpful; in fact at one point tripleo wrote a tool to track patches in RPMs called debtor[6], I would love for us in SF to run this so we can keep better track.
> >
> > [6] https://github.com/redhat-cip/debtor

> > > # 0001-executor-change-execution-log-to-INFO.patch
> > > https://review.openstack.org/578704
> > > improvement, merged, took 13 days

> > > # 0001-gerrit-use-baseurl-for-change-uris-lookup.patch
> > > https://review.openstack.org/579086
> > > bugfix, still under review (12 days)

> > +A

> > > # 0001-Add-tenant-yaml-validation-option-to-scheduler.patch
> > > https://review.openstack.org/574265
> > > feature, still under review (5 weeks)

> > +2, we can push to land Monday

> > > # 0001-angular-call-enableProdMode.patch
> > > https://review.openstack.org/573494
> > > bugfix, no clear solution, reported 5 weeks ago

> > Has -1 from zuul, linting error. Blocking reviews?

> No blocking review, it's the javascript that is too complex...

> > > # 0001-model-fix-AttributeError-exception-in-freezeJobGraph.patch
> > > https://review.openstack.org/579428
> > > bugfix, merged, took a day

> > > # 0001-gerrit-add-support-for-report-only-connection.patch
> > > https://review.openstack.org/568216
> > > improvement (divides merger load by 2 for rdo), still under review (2 months)

> > -1 from jeblair, but it seems your reply hasn't been followed up on. However, this one is a surprise to me, I didn't realize it was a large performance boost. Could you update the commit message to include this and we can lump it under the recent discussions I had around 3pci with zuul.

> Sure, I can do that.

> > > # 0001-executor-add-support-for-resource-connection-type.patch
> > > https://review.openstack.org/570668
> > > improvement, minor modification to enable nodepool resources, part of the container spec implementation

> > > # 0001-executor-add-log_stream_port-and-log_stream_file-set.patch
> > > https://review.openstack.org/535538
> > > feature, waiting for zuul_stream to support ssh port forwarding

> > Depends on the logging rework; we likely need to allocate some people to help with it.
> > > # 0001-zk-use-kazoo-retry-facilities.patch
> > > https://review.openstack.org/536209 / https://review.openstack.org/535537
> > > feature/bugfix to enable zookeeper restarts and reduce the number of node requests, reported 6 months ago, discussed here:
> > > http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000047.html

> > Agreed, I haven't really followed this one in a while; I'll spend a few hours this week getting back up to speed. However, maybe we can also spend some time reviewing the issues we've seen in SF.io, as this isn't a problem with zuul.o.o.

> I think tobiash is also having such issues, and I guess you can find many ConnectionLost tracebacks in the nodepool.o.o logs.

No, we don't have any of these tracebacks. Looking at the zookeeper debug logs in sf.io, I think we are having performance issues. Is this server using boot from volume and backed with ceph? I'd actually like us to schedule moving zookeeper to a dedicated instance, backed with SSDs, and see if this helps. We should also discuss a zookeeper cluster while we are at it.
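For anyone who hasn't looked at the retry patch yet, the general idea is to lean on kazoo's own retry helpers instead of giving up as soon as the ZooKeeper session drops. A rough, untested sketch of the kazoo primitives involved (hostname and path below are made up for illustration, this is not the actual patch):

    from kazoo.client import KazooClient
    from kazoo.retry import KazooRetry

    # Retry forever with exponential backoff instead of raising as soon as
    # the session bounces (e.g. while the ZooKeeper server restarts).
    retry = KazooRetry(max_tries=-1, delay=0.5, backoff=2, max_delay=60)

    zk = KazooClient(hosts='zk01.example.com:2181',
                     connection_retry=retry,  # retries the connection itself
                     command_retry=retry)     # retries individual requests
    zk.start()

    # Requests made through the client are now retried transparently, so a
    # scheduler or launcher can ride out a ZooKeeper restart.
    print(zk.get_children('/'))
    zk.stop()

The actual review wires this kind of retry into the nodepool/zuul ZooKeeper client code; the sketch is only meant to show which kazoo facilities it builds on, which is also relevant if we end up restarting zookeeper while moving it to a dedicated instance.
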
> > > # 0001-angular6-dashboard.patch / 0001-web-new-routes.patch
> > > https://review.openstack.org/#/q/topic:zuul-ui-pages
> > > https://review.openstack.org/#/q/topic:zuul-web-routes
> > > feature, webui improvements

> > > Then regarding nodepool:

> > > # 0001-config-add-statsd-server-config-parameter.patch
> > > https://review.openstack.org/535560
> > > feature to remove the sysconfig for statsd, proposed 7 months ago.

> > > # 0001-zk-skip-node-already-being-deleted-in-cleanup-leaked.patch
> > > https://review.openstack.org/576288
> > > bugfix, proposed 1 month ago.

> > > # 0001-builder-do-not-configure-provider-that-doesn-t-manag.patch
> > > https://review.openstack.org/578642
> > > bugfix, proposed 16 days ago.

> > These 3 patches above all have -1 from reviewers pending feedback from the OP. TristanC, what do you think about maybe having somebody else on the team working on these? They look to be straightforward and a great chance for other SF members to maybe pick up some XP :)

> The reviewer of the statsd config parameter asked for a significant rework that makes such a small feature not worth it imo.
> I'll update the 2 others.

> > > # 0001-driver-runc.patch / 0001-driver-k8s.patch / 0001-driver-openshift.patch
> > > https://review.openstack.org/#/q/topic:nodepool-drivers
> > > drivers, no actual nodepool modifications, proposed 14 months ago.

> > This is by far the largest one and I really have some concerns about it, only because it is something we've agreed to do upstream in zuul. If we can identify somebody on SF to push on this upstream, I think now is the time to allocate the time to do it.

> I can do that, actually I'm already waiting for the container spec to be finalized. But as I said below, these drivers don't have to be part of Zuul and they may be shipped through a sf-drivers package or something. I mostly kept them in the Nodepool package as a way to demonstrate the driver API is working as expected.

> > > TL;DR; most are either cosmetic (e.g. the statsd option in nodepool.yaml) or critical bug fixes that are already approved upstream.
> > >
> > > Then there are 2 "tech-previews":
> > >
> > > The first one is a better web-ui. At first, it was developed in managesf to ease the migration from jenkins. Then the code was contributed to Zuul and upstream merged the builds and jobs API. Now SF has added more pages to list the projects, the labels, the pipeline conf, etc... If needed, these new pages could be removed from Zuul and added back to the legacy managesf interface.
> > >
> > > The second one is the nodepool drivers. With the new driver API, those are just folders that do not modify the nodepool logic. I keep the upstream reviews in sync, and I'm willing to do the legwork to get them merged. But on the other hand, those drivers don't have to be merged upstream.

> > +1

> > > > At the same time, we agree the SF workflow process will stop including these patches in the RPM, and land everything upstream in master first. It isn't enough to just submit the patch to review.o.o, we must get it merged before including it in the zuul RPM.

> > > +1

> > > > > Yet the tech preview system has obvious advantages that make it difficult to just drop this model, namely providing much needed features that make Zuul and Nodepool much more serious and versatile contenders in the world of CI, CD and code quality control - this is also why we believe our changes will eventually make it upstream.

> > > I hoped this tech preview system would benefit other Zuul communities, but if this is causing trouble, then we can look into incubating the features in managesf.

> > > > This is important feedback for the zuul project, and something I think we could discuss upstream on the zuul discuss ML. If we as a team believe this is important, then other zuul users upstream should too?

> > > SF-3.1 is currently waiting for upstream to tag master. Then we could report the tech-previews as we did for SF-3.0:
> > > http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-March/000092.html

> > Next week I am going to create a new board in storyboard to help track the 3.1.1 release of zuul. I believe that will help set some expectations when the next release happens.

> > > > > I guess the question we need to answer is: why is it so hard or slow to have upstream land the features we propose? And what can we do to change that? If we can improve on this, the need for patching will decrease until we can ship code as close as possible to master, or even to tagged releases.
> > > > >
> > > > > What are your thoughts on this?

> > > > This is not limited to just zuul, I've seen this problem time and time again for other open source projects. Patience is required, and we need to adjust our expectations. If we want to influence change, we can, and now is the time to do so. However, we need to spend the time and energy doing so. This may mean slowing down our feature development work to properly land what we have already done.

> > > I think we need to do more code reviews; how many do you think we need to do to gain influence?

> > Yes, I do think the key here is doing code reviews on non-SF-specific patches and, additionally, becoming more visible in IRC. I'm happy to help with paired programming, if that helps anybody. But maybe just asking "hey, I have x hours this week to work on something, what can I do?" would be most welcome, I think.

> Could we suggest that Zuul also do reviews the OpenStack way?
> https://docs.openstack.org/project-team-guide/review-the-openstack-way.html

Could you be more specific here? Which parts do we need to review the OpenStack way?

> Best Regards,
> -Tristan

> > > > [1] https://governance.openstack.org/tc/reference/opens.html
> > > > [2] http://git.openstack.org/cgit/openstack-infra/reviewday
> > > > [3] http://paste.openstack.org/raw/725858/
> > > > [4] http://paste.openstack.org/raw/725860/
> > > > [5] https://git.zuul-ci.org/cgit

_______________________________________________
Softwarefactory-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/softwarefactory-dev
