Re: Current handling of failed upgrades is screwy

2014-07-16 Thread roger peppe
On 16 July 2014 13:12, David Cheney wrote:
> +1<<32

... might be +0 on a 32 bit architecture :-)

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev

Re: Current handling of failed upgrades is screwy

2014-07-16 Thread David Cheney
+1<<32

On Wed, Jul 16, 2014 at 9:49 PM, Menno Smits wrote:
>
> On 16 July 2014 22:36, David Cheney wrote:
>>
>> If that is the blocker, can we introduce a major version which does
>> not change the schema at all? Then we know that everyone running Juju
>> has a functional backup system. As I und

Re: Current handling of failed upgrades is screwy

2014-07-16 Thread Menno Smits
On 16 July 2014 22:36, David Cheney wrote:
> If that is the blocker, can we introduce a major version which does
> not change the schema at all? Then we know that everyone running Juju
> has a functional backup system. As I understand it, we require people
> to upgrade in order, without skipping

Re: Current handling of failed upgrades is screwy

2014-07-16 Thread David Cheney
If that is the blocker, can we introduce a major version which does not change the schema at all? Then we know that everyone running Juju has a functional backup system. As I understand it, we require people to upgrade in order, without skipping versions.

On Wed, Jul 16, 2014 at 8:32 PM, Menno Smi

Re: Current handling of failed upgrades is screwy

2014-07-16 Thread Menno Smits
On 16 July 2014 18:55, William Reade wrote:
> On Wed, Jul 16, 2014 at 3:46 AM, Menno Smits wrote:
>
>> OK - points taken.
>>
>> So taking your ideas and extending them a little, I'm thinking:
>>
>>    - retry upgrade steps on failure (with inter-attempt delay)
>>    - indicate when there's upg

Re: Current handling of failed upgrades is screwy

2014-07-16 Thread William Reade
On Wed, Jul 16, 2014 at 3:46 AM, Menno Smits wrote:
> OK - points taken.
>
> So taking your ideas and extending them a little, I'm thinking:
>
>    - retry upgrade steps on failure (with inter-attempt delay)
>    - indicate when there's upgrade problems by setting the machine agent status
>

Re: Current handling of failed upgrades is screwy

2014-07-15 Thread Menno Smits
OK - points taken.

So taking your ideas and extending them a little, I'm thinking:

- retry upgrade steps on failure (with inter-attempt delay)
- indicate when there's upgrade problems by setting the machine agent status
- if despite the retries the upgrade won't complete, report this

Re: Current handling of failed upgrades is screwy

2014-07-15 Thread William Reade
FWIW, we could set some error status on the affected agent (so users can see there's a problem) and make it return 0 (so that upstart doesn't keep hammering it); but as jam points out that's not helpful when it's a transient error. I'd suggest retrying a few times, with some delay between attempts,

Re: Current handling of failed upgrades is screwy

2014-07-13 Thread Menno Smits
On 10 July 2014 20:57, John Meinel wrote:
> I think it fundamentally comes down to "is the reason upgrade failed
> transient or permanent", if we can try again later, do so, else log at
> Error level, and keep on with your life, because that is the only chance of
> recovery (from what you've said

Re: Current handling of failed upgrades is screwy

2014-07-10 Thread John Meinel
I think it fundamentally comes down to "is the reason upgrade failed transient or permanent", if we can try again later, do so, else log at Error level, and keep on with your life, because that is the only chance of recovery (from what you've said, at least).

John =:->

On Thu, Jul 10, 2014 at 11

Current handling of failed upgrades is screwy

2014-07-10 Thread Menno Smits
So I've noticed that the way we currently handle failed upgrades in the machine agent doesn't make a lot of sense. Looking at cmd/jujud/machine.go:821, an error is created if PerformUpgrade() fails but nothing is ever done with it. It's not returned and it's not logged. This means that if upgrade
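The pattern described here, an error that is built but neither logged nor returned, can be reproduced in a few lines. This is a minimal sketch and not the contents of cmd/jujud/machine.go: `performUpgrade`, `runUpgradeDropped`, and `runUpgradeReported` are hypothetical stand-ins written for this example.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// performUpgrade is a hypothetical stand-in for the PerformUpgrade()
// call mentioned in the message; here it always fails.
func performUpgrade() error {
	return errors.New("upgrade step failed")
}

// runUpgradeDropped mirrors the described problem: the error value is
// created but then discarded, so the caller sees success.
func runUpgradeDropped() error {
	if err := performUpgrade(); err != nil {
		_ = fmt.Errorf("cannot perform upgrade: %v", err)
	}
	return nil
}

// runUpgradeReported logs the failure and hands it back to the caller.
func runUpgradeReported() error {
	if err := performUpgrade(); err != nil {
		err = fmt.Errorf("cannot perform upgrade: %v", err)
		log.Printf("error: %v", err)
		return err
	}
	return nil
}

func main() {
	fmt.Println("dropped: ", runUpgradeDropped())
	fmt.Println("reported:", runUpgradeReported())
}
```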