Re: Controlled shutdown

Andrzej Kaczmarek Sun, 30 Sep 2018 11:28:12 -0700

Hi Chris,

The overall idea looks good to me. I just have some suggestions as below.


On Sat, Sep 29, 2018 at 1:37 AM Christopher Collins <ch...@runtime.io>
wrote:

> On Fri, Sep 28, 2018 at 04:18:08PM -0700, will sanfilippo wrote:
> > Some comments:
> >
> > 1) Are there are any other cases you can see for a controlled shutdown?
> I get the reset command. I am trying to think of others.
>
> I think the newtmgr reset command is the main use case (as well as the
> shell "reset" command).  It seems plausible that a device would want to
> reset itself for other reasons, but I can't think of an actual use case!
>

I guess device can be powered off simply as a power saving option or
shipping mode so perhaps 'reason' parameter could be added to shutdown
call. I do not know if it really matters to any package whether we shutdown
to reset device or just to power it off, but single extra parameter won't
hurt.


> > 2) I am curious: how do you know how many of these functions, or which
> ones, return in progress? Curious to know how you were going to implement
> that specifically.
>
> I was thinking the sysdown module would maintain a counter representing
> the number of in-progress shutdown subprocedures.  When a subprocedure
> completes, it calls `sysdown_complete()`, which decrements the counter.
> When the counter reaches 0, the system restarts.
>
> There is something else I should mention.  I now think it is a bad idea
> to use return codes to indicate whether the subprocedure is complete.
> Each subprocedure is called by generated code, and branching on return
> codes makes the generated code too complicated in my opinion.  I think
> it is better to add a new function, `sysdown_in_progress()`.  If a
> subprocedure needs to continue beyond the initial callback, it calls
> this new function before returning.
>

Return codes feel like a bit more intuitive way to do this. You don't
really need to use branching in shutdown code, I think of two other ways to
do this:
1. create array of pointers to functions and then main shutdown code can
call each function and examine return codes (or, a bit more complicated
way, create new section where these pointers are places and shutdown code
can also iterate them as array)
2. just return logical "or" of all return codes from all shutdown functions
- zero means shutdown is complete, non-zero means it's in progress (you can
make it a bitmask in case there's need to distinguish few status codes)


> Chris
>
> > Otherwise, sounds good and seems like a good addition to mynewt!
> >
> >
> > > On Sep 28, 2018, at 1:08 PM, Christopher Collins <ch...@runtime.io>
> wrote:
> > >
> > > Hello all,
> > >
> > > I have been looking into implementing a graceful shutdown for Mynewt.
> > > The system may want to perform a cleanup procedure immediately before
> it
> > > resets, and I wanted to allow this procedure to be configured.  I am
> > > calling this shutdown facility "sysdown", as a counterpart to
> "sysinit".
> > >
> > > ### BASIC IDEA:
> > >
> > > My idea is to allow each Mynewt package to specify a sequence of
> > > shutdown function calls, similar to a package's `pkg.init` function
> call
> > > list.  The newt tool would generate a C shutdown function called
> > > `sysdown()`.  This function would consist of calls to each package's
> > > shutdown functions.  When a controlled shutdown is initiated,
> > > `sysdown()` would be called prior to the final call to
> > > `hal_system_reset()`.
> > >
> > > To clarify, this procedure would only be performed for a controlled
> > > shutdown.  It would be executed when the system processes a newtmgr
> > > "reset" command, for example.  It would not be executed when the system
> > > crashes, browns out, or restarts due to the hardware watchdog.
> > >
> > > I think this scheme is pretty straightforward and I see no issues so
> far
> > > (but please pipe up if this doesn't seem right!).
> > >
> > > ### PROBLEM:
> > >
> > > Then I tried applying this to an actual use case, and of course I
> > > immediately encountered some problems :).
> > >
> > > My actual use case is this: when I reset the Mynewt device, I would
> like
> > > the nimble stack to gracefully terminate all open Bluetooth
> connections.
> > > This isn't strictly necessary; the connected peer will eventually
> > > realize that the connection has dropped some time after the reset.  The
> > > problem is that Android centrals take a really long time to realize
> that
> > > the connection has dropped, as described here:
> > >
> https://blog.classycode.com/a-short-story-about-android-ble-connection-timeouts-and-gatt-internal-errors-fa89e3f6a456
> .
> > > So, I wanted to explicitly terminate the connections to speed up the
> > > process.
> > >
> > > Ideally, I could configure the nimble host package with a shutdown
> > > callback that just performs a blocking terminate of each open
> > > connection in sequence.  Unfortunately, the nimble host is likely
> > > running in the same task as the one that initiated the shutdown, so it
> > > is not possible to perform a blocking operation.  Instead, the running
> > > task needs to terminate each connection asynchronously: enqueue a GAP
> > > terminate procedure, then return so that the task can process its event
> > > queue.  Eventually, the BLE terminate procedure will complete, and the
> > > result of the procedure will be indicated via an event on this event
> > > queue.  The sysdown mechanism I described earlier in this email doesn't
> > > allow for asynchronous procedures.  It reboots the system immediately
> > > after executing all the shutdown callbacks.
> > >
> > > I think this will be a common issue for other packages, so I am
> > > trying to come up with a general solution.
> > >
> > > ### SOLUTION:
> > >
> > > Each shutdown callback returns one of the following codes:
> > >    * SYSDOWN_COMPLETE
> > >    * SYSDOWN_IN_PROGRESS
> > >
> > > When a controlled reset is initiated, the shutdown facility executes
> > > every confgured callback.  If all callbacks return `SYSDOWN_COMPLETE`,
> > > then the procedure is done; the system completes the reset with a call
> > > to `hal_system_reset()`.
> > >
> > > If one or more callbacks returns `SYSDOWN_IN_PROGRESS`, then the
> > > shutdown facility needs to wait for those subprocedures to complete.
> > > Each in-progress shutdown subprocedure indicates completion
> > > asynchronously via a call to `sysdown_complete()`.  When the last
> > > remaining entry signifies completion, the shutdown facility finishes
> the
> > > shutdown procedure with a call to `hal_system_reset()`.  To defend
> > > against hanging subprocedures, the system can be configured to crash if
> > > the shutdown procedure takes too long.
> > >
> > > Does that sound reasonable?  All comments welcome.
> > >
> > > Thanks,
> > > Chris
> >
>

Best,
Andrzej

Re: Controlled shutdown

Reply via email to