Switching to softwarefactory-dev.

On 02/11/2016 09:45 PM, Fabien Boucher wrote:
> On 11/02/2016 21:40, Tristan Cacqueray wrote:
>> On 02/11/2016 04:51 PM, Fabien Boucher wrote:
>>> Here is my explanation :D
>>>
>>> The situation with the post pipeline publishing/building the final artifact:
>>> ----------------------------------------------------------------------
>>>
>>> 1. A change on nova-distgit has been approved and a job (e.g. packstack) is
>>> running. The test succeeds and change "A" will be merged by Zuul on the
>>> rdo-liberty branch of the nova-distgit repo.
>>>
>>> Gate pipeline:
>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>   - packstack job
>>>
>>> Liberty repo:
>>> - nova_12.0.0.rpm
>>> - ceilometer_8.0.0.rpm
>>>
>>> 2. The change has been merged (in git), and a new change on
>>> ceilometer-distgit has been approved (entering the pipeline). At the same
>>> time the post pipeline runs an "artifact export job" for nova.
>>>
>>> Gate pipeline:
>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>   - packstack job
>>>
>>> Post pipeline:
>>> - nova-distgit HEAD
>>>   - artifact export job (build non scratch on koji)
>>>
>>> Liberty repo:
>>> - nova_12.0.0.rpm
>>> - ceilometer_8.0.0.rpm
>>>
>>> -> The packstack job for the ceilometer-distgit change is running. The
>>> packstack installation fetches our packages from the liberty repo, and so
>>> fetches nova_12.0.0.rpm.
>>
>> It seems like the real issue is fetching from the upstream repo during the
>> test. Why can't we request a koji build based on every project's git master
>> instead of the final repository? Didn't you mention a problem with Koji and
>> scratch builds of merged commits?
>
> I don't think it is really realistic to rebuild the whole set of packages
> based on the rdo-liberty branches for a test, it would take more than a
> while :)
>
> Fetching from the upstream repo (the koji liberty repo we gate) during a test
> is easy, we just configure yum to target it. The real issue is having it up
> to date at the right moment. In fact the RPM needs to be available in the
> repo just before the tested change leaves the gate pipeline (shared queue)
> and just before the change is merged in the Git repo/branch.
>
> About the problem you mention, I don't think so. I did mention an issue, but
> it was about jenkins unexpectedly closing the communication channel while the
> pkg-export job was running a "non scratch" build against koji, resulting in
> the package being published while the related git change was not merged in
> the repo :/ But it's not supposed to happen often :) I hope.
>
>>
>>
>>> Packstack has fetched nova 12.0.0 and is validating ceilometer along with
>>> it.
>>>
>>> 3. The post job "artifact export" succeeds in building the artifact (build
>>> against koji) and nova_12.0.1.rpm lands in the repository. Nice.
>>>
>>> 4. The packstack job for ceilometer succeeds. Nice, the post job for the
>>> ceilometer change starts and succeeds, and then we have:
>>>
>>> Liberty repo:
>>> - nova_12.0.1.rpm
>>> - ceilometer_8.0.1.rpm
>>>
>>> /!\ What if nova 12.0.1 introduced a change in the packaging, a new file, a
>>> patch, ... that prevents the ceilometer package from being installed, or
>>> prevents ceilometer from working well ...
>>> Then this results in a broken RPM repository /!\
>>> -> And why did this happen? Because in the meantime the post job for
>>> nova-distgit was run, and we validated a change on ceilometer by testing it
>>> with the previous version of nova!
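
As an aside, the "configure yum to target it" step Fabien mentions is really
just dropping a repo file on the test node. A minimal sketch, where the repo
id, baseurl and file path are illustrative assumptions rather than the actual
job configuration:

    #!/usr/bin/env python
    # Illustrative only: drop a .repo file so the packstack test node installs
    # from the gated liberty repository. The repo id, baseurl and path are
    # assumptions, not the real job configuration.
    lines = [
        "[rdo-liberty-gate]",
        "name=RDO Liberty (gated repository)",
        "baseurl=http://koji.example.com/repos/rdo-liberty/",
        "enabled=1",
        "gpgcheck=0",
    ]
    with open("/etc/yum.repos.d/rdo-liberty-gate.repo", "w") as repo:
        repo.write("\n".join(lines) + "\n")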
>>>
>>>
>>> Furthermore, what if the nova-distgit "artifact export job" failed to build
>>> nova 12.0.1 and we didn't notice it? Other changes entering the gate
>>> pipeline are validated with packstack (+ nova 12.0.0) and built to the
>>> final repo via the post job.
>>> Then later, when we discover the inconsistency, we force the post pipeline
>>> to run for nova 12.0.1 without any tests. 12.0.1 will land in the liberty
>>> repo. Nice!
>>> -> But are we sure 12.0.1 will work with the other changes (new RPMs that
>>> have landed) during the time of the inconsistency? Maybe ... maybe not :D
>>> In that case, instead of running the post job again, we can bump nova to
>>> 12.0.1-1 ...; that will force the tests to re-run against the latest
>>> version of the liberty repo. Well, but what do we put in the changelog of
>>> 12.0.1-1, "force a rebuild"?
>>>
>>>
>>> Another case:
>>> Gate pipeline:
>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>   - packstack job
>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>   - packstack job
>>>
>>> Liberty repo:
>>> - nova_12.0.0.rpm
>>> - ceilometer_8.0.0.rpm
>>>
>>> The packstack job for ceilometer-distgit will request a (scratch) build of
>>> ceilometer against koji, but thanks to Zuul it also knows that a
>>> nova-distgit change is currently being tested and may land in the liberty
>>> repo, so it will also request a build of nova 12.0.1 against koji. So
>>> locally it can build a repo containing nova_12.0.1.rpm and
>>> ceilometer_8.0.1.rpm, and the ceilometer change is tested with packstack in
>>> a correct test environment.
>>> Then the nova-distgit change succeeds, its post job starts and fails to
>>> build the final artifact ("koji build"). The change currently at the top of
>>> the pipeline didn't notice that ... (the failure occurred in the post
>>> pipeline) and succeeds in validating ceilometer (along with nova 12.0.1);
>>> its post job starts and ceilometer 8.0.1 lands in the liberty repo. Nice!
>>> What if the change bumping ceilometer to 8.0.1 was in fact unable to work
>>> with nova 12.0.0 ... then a broken liberty repo!
>>>
>>> Furthermore, if we want a post job (for publishing) then we need an
>>> additional node, static and usually with 1 jenkins worker. Indeed, if we
>>> use nodepool to spawn nodes for the post pipeline, or if we use more than 1
>>> executor, then we cannot be sure RPMs land in the RPM repo serially ... and
>>> do we really want someone to checkout the liberty repo with an RPM that was
>>> supposed to land before another one, while that other one has not landed
>>> yet ...
>>>
>>> ----
>>>
>>> So if you reached this point, cool \o/. Now take the previous examples and
>>> think about starting the final build on koji (non scratch build) inside a
>>> gate job.
>>>
>>> Gate pipeline:
>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>   - packstack job (SUCCEED)
>>>   - artifact export job (build non scratch on koji) (RUNNING)
>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>   - packstack job (RUNNING)
>>>   - artifact export job (build non scratch on koji) (RUNNING)
>>>
>>> * The ceilometer-distgit final RPM cannot land yet in the final liberty RPM
>>>   repo because the nova-distgit one has not landed yet in the liberty RPM
>>>   repo.
>>>
>>> * If the "artifact export job" for nova-distgit fails:
>>>   - the change is not merged
>>>   - the RPM won't land inside the RPM repo
>>> -> the ceilometer-distgit jobs will restart (since a dependent change
>>> failed and all distgit projects share the same job, hence the same shared
>>> queue) and the packstack job for ceilometer will be run with nova 12.0.0
>>> (which is in the liberty RPM repo).
>>> So it will land in the RPM repo, but it has been tested with the right
>>> version of the nova RPM.
>>>
>>> ----> Remember, with the post job doing the final build, at this point we
>>> did not know the status of the post job (it failed), so the workflow never
>>> figured out that the export failed: ceilometer was tested with nova 12.0.1,
>>> but nova 12.0.1 has not landed ... while ceilometer has landed ... :/
>>>
>>>
>>> The "artifact export job" uses:
>>> https://github.com/redhat-cip/software-factory/blob/master/tools/slaves/wait_for_other_jobs.py
>>> and here is an example of it (for rpmfactory):
>>> https://github.com/redhat-cip/rpmfactory/blob/master/gating/pkg-export.sh
>>>
>>>
>>> -----
>>>
>>> So let me know if building artifacts in the gate pipeline is relevant for
>>> you (at least for RPM Factory)?
>>>
>> Well, definitely; most RDO packages are tightly integrated and there is a
>> non-negligible risk of breakage.
>>
>>
>>> Note that in the RPM Factory context we gate "a repository" ... so we use
>>> that repository to test other changes ...
>>>
>>> Now imagine that we configured the gating in zuul to not submit git changes
>>> ... and did it in the post pipeline instead ... do you think Zuul would
>>> still be valuable? I'm not so sure.
>>>
>>
>>
>> Alright, thank you Fabien for starting this discussion.
>> If I understand correctly, the problem is really the window between a gate
>> success and a post-job success. Any changes that enter the gate during that
>> window won't be tested against changes that are still being
>> post-processed... Otherwise both approaches seem to be identical.
>
> Yes, this is related to the delay between the moment the change leaves the
> gate pipeline and the moment the post pipeline finishes executing the job.
> But not only that: when the export job is executed in the gate pipeline, the
> gate pipeline knows the job status and is able to react to a failure:
> - the Git change is not merged in the git repo
> - the following changes in the gate shared queue will stop their jobs and
>   restart, skipping this broken change.
> If the export job is executed in the post pipeline, neither of the two things
> above happens ...
>
>>
>> Your initial proposition works for sure, but I question the need for a job
>> that waits to be at the tip of the gate before it can actually SUCCEED. This
>> seems to severely limit the capability of zuul and its speculative-merge
>> based design...
>
> I don't get that, why would it limit this capability?
> For me nothing changes in this area, but I may be missing something, please
> clarify.
>
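For readers who have not opened the two links above, my rough understanding of
the wait_for_other_jobs idea can be sketched like this. This is only a sketch:
the status URL, the JSON layout and the job/change identifiers are assumptions
on my side; the real implementation is the script linked above.

    #!/usr/bin/env python
    # Rough sketch of the "wait for the other jobs" idea: before starting the
    # non-scratch koji build, poll the Zuul status page until every other job
    # attached to this change has finished, and bail out if one of them
    # failed. The URL and the JSON layout below are assumptions.
    import json
    import sys
    import time
    try:
        from urllib2 import urlopen          # Python 2
    except ImportError:
        from urllib.request import urlopen   # Python 3

    STATUS_URL = "http://zuul.example.com/status.json"   # assumed endpoint
    EXPORT_JOB = "pkg-export"                             # this job's name
    CHANGE_ID = sys.argv[1]                               # e.g. "1234,1"

    def other_job_results():
        # Collect the result of every job except ours for this change.
        status = json.load(urlopen(STATUS_URL))
        results = []
        for pipeline in status.get("pipelines", []):
            for queue in pipeline.get("change_queues", []):
                for head in queue.get("heads", []):
                    for change in head:
                        if change.get("id") != CHANGE_ID:
                            continue
                        for job in change.get("jobs", []):
                            if job.get("name") != EXPORT_JOB:
                                results.append(job.get("result"))
        return results

    while True:
        results = other_job_results()
        if any(r not in (None, "SUCCESS") for r in results):
            sys.exit("another job failed, not publishing")
        if not results or all(r == "SUCCESS" for r in results):
            break    # safe to trigger the non-scratch koji build now
        time.sleep(30)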
I believe it limits zuul's capability since the pkg-export job is serial in
the gate. If that pkg-export task takes 1 hour, then we can't merge more than
24 changes per day.

Here are the pros and cons I understood so far, please correct me if I'm
wrong:

"Wait_for_other_jobs"
Pros:
 * If the publish fails, then the change isn't merged.
 * Gate tests are fast because they use upstream packages.
Cons:
 * We lose the ability to merge changes in parallel.
 * This might lock the gate if all the resources are somehow allocated to
   pkg-export tasks (since the job will wait for jobs stuck in the queued
   state).
 * It can be a source of random failures that affect the gate (e.g. when the
   rdo mirror times out).

"Publish in post"
Pros:
 * Similar workflow to openstack-infra's publish-to-pypi job (which is a big
   pro imo since it works).
Cons:
 * The post job needs to be monitored for failures.
 * Test time may be longer.

TL;DR: As already mentioned during the last sprint review, wait_for_other_jobs
seems like the most trivial and effective way to ensure the RDO repositories
are fully validated. However, since it's a risky decision, I'd like us to
really consider other solutions, at least to demonstrate this is the superior
approach. Perhaps we should discuss these solutions with the RDO folks too.

Basically, wait_for_other_jobs will guarantee that the repos are stable, at
the cost of a special zuul gate job that may slow down development. Otherwise
we can publish in a post job, with the risk of having a desync between git and
the rpm repo.

Back to the original issue of not being able to build all RDO packages for
each change: instead of rebuilding only what is currently in the gate, can't
we also rebuild what is in the post pipeline? That way we don't rebuild
everything each time and we still have something identical to what will be
upstream. That may be another approach... (assuming a failing post job also
stays in the post pipeline until it succeeds).

Regards,
-Tristan
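
P.S. To make that last idea a bit more concrete, here is a rough sketch of
"rebuild what is currently in the gate and in the post pipeline". The status
endpoint, the pipeline names and the build_scratch() helper are assumptions,
not existing code:

    #!/usr/bin/env python
    # Sketch: collect the distgit projects that currently have changes in the
    # gate or post pipelines and scratch-build only those, instead of
    # rebuilding the whole distribution for every change.
    import json
    try:
        from urllib2 import urlopen          # Python 2
    except ImportError:
        from urllib.request import urlopen   # Python 3

    STATUS_URL = "http://zuul.example.com/status.json"   # assumed endpoint
    PIPELINES = ("gate", "post")                          # assumed names

    def in_flight_projects():
        # Gather every project with an in-flight change in gate or post.
        status = json.load(urlopen(STATUS_URL))
        projects = set()
        for pipeline in status.get("pipelines", []):
            if pipeline.get("name") not in PIPELINES:
                continue
            for queue in pipeline.get("change_queues", []):
                for head in queue.get("heads", []):
                    for change in head:
                        projects.add(change.get("project"))
        return projects

    def build_scratch(project):
        # Hypothetical helper: trigger a koji scratch build of the project's
        # rdo-liberty branch and return the resulting local repo path.
        raise NotImplementedError

    for project in sorted(in_flight_projects()):
        build_scratch(project)

This only stays consistent if, as assumed above, a failing post job remains
visible in the post pipeline until it succeeds.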
