Re: Non-image blocker process change proposal

Dennis Gilmore Fri, 04 Dec 2015 10:24:15 -0800

On Wednesday, December 02, 2015 06:42:09 AM Kamil Paral wrote:
> > > Taking all of this into account, would this be a reasonable idea?
> > > 1. At Go/No-Go voting time, all updates which block F-N release but
> > > belong to F-M (M<N) release, must be already pushed stable. If this
> > > is not the case and it's the last blocking issue, selected tasks
> > > (like copying compose trees into appropriate places) can be
> > > performed, and Go/No-Go will be rescheduled to the day and time when
> > > it is expected that those updates will have been pushed.
> > 
> > I think that's not a great idea. It gets back to why we only slip in one
> > week increments. If, say, we have a go/no-go on a Thursday and the only
> > thing blocking it is some update that's not pushed stable all the way
> > yet, we reschedule for Friday and if it's not done then we schedule for
> > Saturday? This means everyone has to work extra hours without even
> > being sure when the release will be.
> 
> If the update is pending stable and just not pushed, it might make sense to
> move it one day, yes (most probably skipping weekends, though). If it needs
> more testing, we might decide to postpone it several days. If it's not
> available at all yet, waiting an extra week might be the right choice. So
> it would depend on the situation and best guess of folks at Go/No-Go.


I am with Kevin here. We have things tightly coupled with mirrors and 
mirroring, so making changes by a day or two throws timings way off, purely 
because we have a built-in sync buffer of the weekend. To slip the go/no-go 
decision to Monday we would need to push out the ship date from Tuesday to 
Friday to give mirrors syncing time, and that is making things somewhat tight. 
We really need to slip a week for any slip.

> > Leaves less time to sync mirrors,
> > update common bugs, etc etc.
> 
> I would say the opposite - all of that can start happening right away, it's
> not blocked on waiting for the FN-x push. So in case the announcement gets
> out on Tuesday as usual, it's the same time, but if it gets pushed back to
> Wednesday or later day, it's more time for these tasks to happen. The only
> exception is that FN-x updates repo, which will get shorter sync time
> because we want to make sure people download the fixed packages, not old
> ones. Currently that behavior is undefined.

We cannot put the bits onto the mirrors until we are sure they are the final 
bits; otherwise we cause the mirrors lots of churn, wasted IOPS and bandwidth, 
and we lose mirrors.

> > So, the alternative there would be to slip a week to get it pushed, but
> > some people may find that excessive.
> 
> That's why I wanted to propose something more flexible, but hey, it's just
> an idea.

To be more flexible here we would really need to fundamentally change how we 
push bits to the mirrors. If we had a CDN of our own under our control we 
would have more options available, but the cost of that would be massive.
 
> > > 2. We will create a new mirrormanager script which will go through
> > > the specified metalink(s) and remove all metadata hashes which are
> > > older than the provided timestamp/hash.
> > 
> > Something like that should be pretty easy to do I would think.
> > (Although I am not a mm developer)
> 
> Looking into existing MM scripts, I have the same opinion, but I can contact
> Adrian to confirm. If we want to make it even simpler, we can drop all
> alternative metadata and leave just the current hash (that script would be
> run once the push containing that critical update is performed).

I am okay with having a way to say "ship only the latest metadata".
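
For illustration only, the pruning discussed in point 2 might look roughly
like the sketch below. This is not MirrorManager code: RepomdEntry,
drop_older_than and keep_latest_only are hypothetical names, and the sketch
assumes each metadata version listed in a metalink can be reduced to a
generation timestamp plus a checksum.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepomdEntry:
    """One repomd.xml version listed in a metalink (hypothetical model)."""
    timestamp: int  # seconds since the epoch when this metadata was generated
    sha256: str     # checksum of that repomd.xml


def drop_older_than(entries: List[RepomdEntry], cutoff: int) -> List[RepomdEntry]:
    """Keep only the entries generated at or after `cutoff`."""
    return [e for e in entries if e.timestamp >= cutoff]


def keep_latest_only(entries: List[RepomdEntry]) -> List[RepomdEntry]:
    """Simpler variant: drop every alternate and keep just the newest entry."""
    if not entries:
        return []
    return [max(entries, key=lambda e: e.timestamp)]


if __name__ == "__main__":
    # Made-up history of three pushes; the blocker fix landed in the second.
    history = [
        RepomdEntry(1448900000, "aaa"),
        RepomdEntry(1449000000, "bbb"),
        RepomdEntry(1449100000, "ccc"),
    ]
    print(drop_older_than(history, cutoff=1449000000))
    print(keep_latest_only(history))
```

The first function matches the "older than provided timestamp" proposal, with
the cutoff being the push that contained the blocker fix; the second matches
the simpler "leave just the current hash" variant.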

> > > 3. If there are such updates as mentioned in
> > > point 1., RelEng will use this script to remove old metadata
> > > alternatives from the metalink, which means only metadata from the
> > > day this update was pushed or newer will be kept. In order to not
> > > increase mirror strain too much, this doesn't need to be used
> > > immediately, but just shortly before the release announcement (so
> > > that mirrors have time to sync latest packages, and the user load is
> > > distributed among more mirrors including those with current-1 or
> > > current-2 trees as long as possible).
> > > 4. Once the script is run in point 3., we can post the release
> > > announcement in 6 hours.
> > > 
> > > I know there is still one manual step involved (figuring out in which
> > > push the blocker update was included), but I don't know how to better
> > > solve it, especially if we don't want to wait for too long.
> > > 
> > > I would be interested in Infra/RelEng feedback for the technical part
> > > of this (CCing Kevin and Dennis). Do you think this is a reasonable
> > > solution, or am I completely off the track here? Do you see any
> > > better options?
> > 
> > So, looking back, we had the case of that dnf-system-upgrade. Are there
> > any others in the past, or are we making a bigger than life deal out of
> > one case?
> 
> I don't want to exaggerate the topic, but I'd also like to find and describe
> a process for avoiding it next time. It will be needed twice a year at
> maximum.
> 
> I believe there were a few similar issues in the past, but I can't really
> point to any other examples. In the majority of cases, this is likely to be
> related to system upgrade (system-upgrade, dnf, plymouth, systemd, gpg
> keys).

There is always the potential for hitting issues with upgrades: an older 
release gets a higher NVR and things get messy. It is not an issue just at 
release time.

> > Also, that case could have been solved by dropping the alternates in
> > metalink as you suggest above at 2 right?
> 
> Yes.
> 
> > One thing that perhaps we could improve is to somehow note these sorts
> > of things to releng. I just checked irc logs and I didn't see any
> > mention of that dnf-system-upgrade plugin update being important until
> > nov 3rd. Would a tracker ticket help this?
> 
> In the future, these issues should be tracked by blocker bugs app using
> bugzilla tracker and a specific keyword, so we should not lose track of
> this. But as mentioned, pushing to stable is not enough, we also need to
> make sure old content is not served to users. That's why the "dropping
> alternative metadata from metalink" idea. We can file a releng ticket for
> this, and either include a description of what needs to be done, or link to
> some wiki SOP. QA can take care of all of that. The only thing that we need
> to ensure is that it really is handled before the announcement goes live,
> so it needs to be listed somewhere in RelEng/Infra "new release" SOP.

We have no way of always ensuring that people are getting the latest data, or 
that they have the latest bits installed. People can always shoot themselves 
in the foot: they can and will do a distro update without updating the 
running OS first. I would suggest not filing a rel-eng ticket that tells us 
what to do, as that will not go over well. We should now sit down and work out 
a process; then a ticket likely needs to be filed asking that the process be 
followed.

Dennis


--
test mailing list
test@lists.fedoraproject.org
To unsubscribe:
http://lists.fedoraproject.org/admin/lists/test@lists.fedoraproject.org
