Re: Controlled shutdown

2018-10-09 Thread Christopher Collins
Hi Vipul,

On Tue, Oct 09, 2018 at 12:01:46PM -0700, Vipul Rahane wrote:
> Sorry for the late reply. 

No problem!

> I really like the idea. Thank you for doing this Chris. A much needed
> feature. A possible use case just came to my mind.
> 
> One module might have to be shutdown before shutting down others for
> example: Sensors using I2C/SPI would have to be shut down before
> shutting down the underlying interfaces.
> 
> This is kind of similar to pkg.init levels. I wanted to understand if
> you had any kind of priority in mind.

That is exactly what I had in mind.  I have submitted the relevant PRs
for this feature here:

Newt: https://github.com/apache/mynewt-newt/pull/218
Core: https://github.com/apache/mynewt-core/pull/1447
Nimble: https://github.com/apache/mynewt-nimble/pull/216

The newt PR describes the syntax for configuring a package's sysdown
functions:

pkg.down:
: 
e.g.,

pkg.down:
ble_hs_shutdown: 200

As with sysinit, sysdown functions are excuted in ascending order of
stage number.  When there are two or more identical stage numbers, the
functions are executed in lexicographic order according to their C
function name.

Thanks,
Chris


Re: Controlled shutdown

2018-10-09 Thread Vipul Rahane
Hey,

Sorry for the late reply. I really like the idea. Thank you for doing this 
Chris. A much needed feature. A possible use case just came to my mind.

One module might have to be shutdown before shutting down others for example: 
Sensors using I2C/SPI would have to be shut down before shutting down the 
underlying interfaces.

This is kind of similar to pkg.init levels. I wanted to understand if you had 
any kind of priority in mind.

Regards,
Vipul Rahane

> On Oct 1, 2018, at 10:44 AM, Andrzej Kaczmarek 
>  wrote:
> 
> Hi Chris,
> 
> On Mon, Oct 1, 2018 at 7:39 PM Christopher Collins  wrote:
> 
>> Hi Andrzej,
>> 
>> On Sun, Sep 30, 2018 at 08:27:19PM +0200, Andrzej Kaczmarek wrote:
>>> I guess device can be powered off simply as a power saving option or
>>> shipping mode so perhaps 'reason' parameter could be added to shutdown
>>> call. I do not know if it really matters to any package whether we
>> shutdown
>>> to reset device or just to power it off, but single extra parameter won't
>>> hurt.
>> 
>> Good idea.
>> 
>>> Return codes feel like a bit more intuitive way to do this. You don't
>>> really need to use branching in shutdown code, I think of two other ways
>> to
>>> do this:
>>> 1. create array of pointers to functions and then main shutdown code can
>>> call each function and examine return codes (or, a bit more complicated
>>> way, create new section where these pointers are places and shutdown code
>>> can also iterate them as array)
>> 
>> Another good idea.  I will try it out and report back!
>> 
>>> 2. just return logical "or" of all return codes from all shutdown
>> functions
>>> - zero means shutdown is complete, non-zero means it's in progress (you
>> can
>>> make it a bitmask in case there's need to distinguish few status codes)
>> 
>> I am having a hard time seeing how this would work.  A logical-or of the
>> return codes would only tell us if at least one subprocedure is still in
>> progress.  This is not enough information; we need to know exactly how
>> many are still in progress so that we can determine when they have all
>> completed (i.e., when the counter reaches 0).  Are you envisioning a
>> solution that does not require a counter?
>> 
> 
> Ah, you're absolutely right. I forgot that we need to know how many
> subprocedures are pending. So this option is not a solution here.
> 
> Chris
>> 
> 
> Best,
> Andrzej



Re: Controlled shutdown

2018-10-01 Thread Christopher Collins
Hi Martin,

On Fri, Sep 28, 2018 at 04:30:17PM -0700, Martin Turon wrote:
> +1 for graceful shutdown.
> 
> It could also be useful for coarse, multi-protocol use cases such as using
> BLE for commissioning, shutting down that stack, and then starting a 15.4
> stack such as Thread.
> 
> In general, it would be great if nimble could support high level stack
> start / stop commands such as the following shell API imply:
> 
> ble start# start nimble stack
> ble scan start   # start BLE scan ...
> ble scan stop# stop BLE scan
> *ble stop *# NEW: gracefully shutdown nimble
> 
> 
> All of the above can be supported right now, except "ble stop", so your
> proposal is welcome.
> 
> I would also suggest keeping the concepts of graceful shutdown and reset
> separable.

I agree with all of the above.  Unfortunately, the ability to stop the
stack absent a reset, which implies the ability to reenable it, adds a
bit of complexity to this feature.  I think this functionality is
important enough to justify the added complexity.

Chris


Re: Controlled shutdown

2018-10-01 Thread Kevin Townsend
+1 to the idea of being able to handle this at the module/package level to
subscribe to 'shutdown' or 'sleep' events. The package level is the right
place to handle this since the package itself is aware of the transport
being used for drivers (I2C/SPI/etc.), and the pins that can be switched to
an appropriate power-saving mode (disable internal pullups, etc.) and
properly restored on wakeup. It does bring up the question of SRAM
retention at the HAL level, though, and sharing access to whatever block of
SRAM is being maintained in lower powder mode(s) if the scope here is
larger than just 'shutting down' and 'starting up'?


> I guess device can be powered off simply as a power saving option or
> shipping mode so perhaps 'reason' parameter could be added to shutdown
> call. I do not know if it really matters to any package whether we shutdown
> to reset device or just to power it off, but single extra parameter won't
> hurt.
>

I think this is a good idea in situations where you might want to log the
shutdown reason and perhaps check it on the next startup. For example, if
we shutdown because the battery is critically low, and this is detected at
the next startup, you can run through a config memory verification process,
etc.

Kevin


Re: Controlled shutdown

2018-09-30 Thread Andrzej Kaczmarek
Hi Chris,

The overall idea looks good to me. I just have some suggestions as below.

On Sat, Sep 29, 2018 at 1:37 AM Christopher Collins 
wrote:

> On Fri, Sep 28, 2018 at 04:18:08PM -0700, will sanfilippo wrote:
> > Some comments:
> >
> > 1) Are there are any other cases you can see for a controlled shutdown?
> I get the reset command. I am trying to think of others.
>
> I think the newtmgr reset command is the main use case (as well as the
> shell "reset" command).  It seems plausible that a device would want to
> reset itself for other reasons, but I can't think of an actual use case!
>

I guess device can be powered off simply as a power saving option or
shipping mode so perhaps 'reason' parameter could be added to shutdown
call. I do not know if it really matters to any package whether we shutdown
to reset device or just to power it off, but single extra parameter won't
hurt.


> > 2) I am curious: how do you know how many of these functions, or which
> ones, return in progress? Curious to know how you were going to implement
> that specifically.
>
> I was thinking the sysdown module would maintain a counter representing
> the number of in-progress shutdown subprocedures.  When a subprocedure
> completes, it calls `sysdown_complete()`, which decrements the counter.
> When the counter reaches 0, the system restarts.
>
> There is something else I should mention.  I now think it is a bad idea
> to use return codes to indicate whether the subprocedure is complete.
> Each subprocedure is called by generated code, and branching on return
> codes makes the generated code too complicated in my opinion.  I think
> it is better to add a new function, `sysdown_in_progress()`.  If a
> subprocedure needs to continue beyond the initial callback, it calls
> this new function before returning.
>

Return codes feel like a bit more intuitive way to do this. You don't
really need to use branching in shutdown code, I think of two other ways to
do this:
1. create array of pointers to functions and then main shutdown code can
call each function and examine return codes (or, a bit more complicated
way, create new section where these pointers are places and shutdown code
can also iterate them as array)
2. just return logical "or" of all return codes from all shutdown functions
- zero means shutdown is complete, non-zero means it's in progress (you can
make it a bitmask in case there's need to distinguish few status codes)


> Chris
>
> > Otherwise, sounds good and seems like a good addition to mynewt!
> >
> >
> > > On Sep 28, 2018, at 1:08 PM, Christopher Collins 
> wrote:
> > >
> > > Hello all,
> > >
> > > I have been looking into implementing a graceful shutdown for Mynewt.
> > > The system may want to perform a cleanup procedure immediately before
> it
> > > resets, and I wanted to allow this procedure to be configured.  I am
> > > calling this shutdown facility "sysdown", as a counterpart to
> "sysinit".
> > >
> > > ### BASIC IDEA:
> > >
> > > My idea is to allow each Mynewt package to specify a sequence of
> > > shutdown function calls, similar to a package's `pkg.init` function
> call
> > > list.  The newt tool would generate a C shutdown function called
> > > `sysdown()`.  This function would consist of calls to each package's
> > > shutdown functions.  When a controlled shutdown is initiated,
> > > `sysdown()` would be called prior to the final call to
> > > `hal_system_reset()`.
> > >
> > > To clarify, this procedure would only be performed for a controlled
> > > shutdown.  It would be executed when the system processes a newtmgr
> > > "reset" command, for example.  It would not be executed when the system
> > > crashes, browns out, or restarts due to the hardware watchdog.
> > >
> > > I think this scheme is pretty straightforward and I see no issues so
> far
> > > (but please pipe up if this doesn't seem right!).
> > >
> > > ### PROBLEM:
> > >
> > > Then I tried applying this to an actual use case, and of course I
> > > immediately encountered some problems :).
> > >
> > > My actual use case is this: when I reset the Mynewt device, I would
> like
> > > the nimble stack to gracefully terminate all open Bluetooth
> connections.
> > > This isn't strictly necessary; the connected peer will eventually
> > > realize that the connection has dropped some time after the reset.  The
> > > problem is that Android centrals take a really long time to realize
> that
> > > the connection has dropped

Re: Controlled shutdown

2018-09-28 Thread Christopher Collins
On Fri, Sep 28, 2018 at 04:18:08PM -0700, will sanfilippo wrote:
> Some comments:
> 
> 1) Are there are any other cases you can see for a controlled shutdown? I get 
> the reset command. I am trying to think of others.

I think the newtmgr reset command is the main use case (as well as the
shell "reset" command).  It seems plausible that a device would want to
reset itself for other reasons, but I can't think of an actual use case!

> 2) I am curious: how do you know how many of these functions, or which ones, 
> return in progress? Curious to know how you were going to implement that 
> specifically.

I was thinking the sysdown module would maintain a counter representing
the number of in-progress shutdown subprocedures.  When a subprocedure
completes, it calls `sysdown_complete()`, which decrements the counter.
When the counter reaches 0, the system restarts.

There is something else I should mention.  I now think it is a bad idea
to use return codes to indicate whether the subprocedure is complete.
Each subprocedure is called by generated code, and branching on return
codes makes the generated code too complicated in my opinion.  I think
it is better to add a new function, `sysdown_in_progress()`.  If a
subprocedure needs to continue beyond the initial callback, it calls
this new function before returning.

Chris

> Otherwise, sounds good and seems like a good addition to mynewt!
> 
> 
> > On Sep 28, 2018, at 1:08 PM, Christopher Collins  wrote:
> > 
> > Hello all,
> > 
> > I have been looking into implementing a graceful shutdown for Mynewt.
> > The system may want to perform a cleanup procedure immediately before it
> > resets, and I wanted to allow this procedure to be configured.  I am
> > calling this shutdown facility "sysdown", as a counterpart to "sysinit".
> > 
> > ### BASIC IDEA:
> > 
> > My idea is to allow each Mynewt package to specify a sequence of
> > shutdown function calls, similar to a package's `pkg.init` function call
> > list.  The newt tool would generate a C shutdown function called
> > `sysdown()`.  This function would consist of calls to each package's
> > shutdown functions.  When a controlled shutdown is initiated,
> > `sysdown()` would be called prior to the final call to
> > `hal_system_reset()`.
> > 
> > To clarify, this procedure would only be performed for a controlled
> > shutdown.  It would be executed when the system processes a newtmgr
> > "reset" command, for example.  It would not be executed when the system
> > crashes, browns out, or restarts due to the hardware watchdog.
> > 
> > I think this scheme is pretty straightforward and I see no issues so far
> > (but please pipe up if this doesn't seem right!).
> > 
> > ### PROBLEM:
> > 
> > Then I tried applying this to an actual use case, and of course I
> > immediately encountered some problems :).
> > 
> > My actual use case is this: when I reset the Mynewt device, I would like
> > the nimble stack to gracefully terminate all open Bluetooth connections.
> > This isn't strictly necessary; the connected peer will eventually
> > realize that the connection has dropped some time after the reset.  The
> > problem is that Android centrals take a really long time to realize that
> > the connection has dropped, as described here:
> > https://blog.classycode.com/a-short-story-about-android-ble-connection-timeouts-and-gatt-internal-errors-fa89e3f6a456.
> > So, I wanted to explicitly terminate the connections to speed up the
> > process.
> > 
> > Ideally, I could configure the nimble host package with a shutdown
> > callback that just performs a blocking terminate of each open
> > connection in sequence.  Unfortunately, the nimble host is likely
> > running in the same task as the one that initiated the shutdown, so it
> > is not possible to perform a blocking operation.  Instead, the running
> > task needs to terminate each connection asynchronously: enqueue a GAP
> > terminate procedure, then return so that the task can process its event
> > queue.  Eventually, the BLE terminate procedure will complete, and the
> > result of the procedure will be indicated via an event on this event
> > queue.  The sysdown mechanism I described earlier in this email doesn't
> > allow for asynchronous procedures.  It reboots the system immediately
> > after executing all the shutdown callbacks.
> > 
> > I think this will be a common issue for other packages, so I am
> > trying to come up with a general solution.
> > 
> > ### SOLUTION:
> > 
> > Each shutdown callback returns one of the following codes:
> >

Re: Controlled shutdown

2018-09-28 Thread Martin Turon
+1 for graceful shutdown.

It could also be useful for coarse, multi-protocol use cases such as using
BLE for commissioning, shutting down that stack, and then starting a 15.4
stack such as Thread.

In general, it would be great if nimble could support high level stack
start / stop commands such as the following shell API imply:

ble start# start nimble stack
ble scan start   # start BLE scan ...
ble scan stop# stop BLE scan
*ble stop *# NEW: gracefully shutdown nimble


All of the above can be supported right now, except "ble stop", so your
proposal is welcome.

I would also suggest keeping the concepts of graceful shutdown and reset
separable.

Martin

_
Martin Turon  |  Google


On Fri, Sep 28, 2018 at 4:18 PM, will sanfilippo  wrote:

> Some comments:
>
> 1) Are there are any other cases you can see for a controlled shutdown? I
> get the reset command. I am trying to think of others.
> 2) I am curious: how do you know how many of these functions, or which
> ones, return in progress? Curious to know how you were going to implement
> that specifically.
>
> Otherwise, sounds good and seems like a good addition to mynewt!
>
>
> > On Sep 28, 2018, at 1:08 PM, Christopher Collins 
> wrote:
> >
> > Hello all,
> >
> > I have been looking into implementing a graceful shutdown for Mynewt.
> > The system may want to perform a cleanup procedure immediately before it
> > resets, and I wanted to allow this procedure to be configured.  I am
> > calling this shutdown facility "sysdown", as a counterpart to "sysinit".
> >
> > ### BASIC IDEA:
> >
> > My idea is to allow each Mynewt package to specify a sequence of
> > shutdown function calls, similar to a package's `pkg.init` function call
> > list.  The newt tool would generate a C shutdown function called
> > `sysdown()`.  This function would consist of calls to each package's
> > shutdown functions.  When a controlled shutdown is initiated,
> > `sysdown()` would be called prior to the final call to
> > `hal_system_reset()`.
> >
> > To clarify, this procedure would only be performed for a controlled
> > shutdown.  It would be executed when the system processes a newtmgr
> > "reset" command, for example.  It would not be executed when the system
> > crashes, browns out, or restarts due to the hardware watchdog.
> >
> > I think this scheme is pretty straightforward and I see no issues so far
> > (but please pipe up if this doesn't seem right!).
> >
> > ### PROBLEM:
> >
> > Then I tried applying this to an actual use case, and of course I
> > immediately encountered some problems :).
> >
> > My actual use case is this: when I reset the Mynewt device, I would like
> > the nimble stack to gracefully terminate all open Bluetooth connections.
> > This isn't strictly necessary; the connected peer will eventually
> > realize that the connection has dropped some time after the reset.  The
> > problem is that Android centrals take a really long time to realize that
> > the connection has dropped, as described here:
> > https://blog.classycode.com/a-short-story-about-android-ble-
> connection-timeouts-and-gatt-internal-errors-fa89e3f6a456.
> > So, I wanted to explicitly terminate the connections to speed up the
> > process.
> >
> > Ideally, I could configure the nimble host package with a shutdown
> > callback that just performs a blocking terminate of each open
> > connection in sequence.  Unfortunately, the nimble host is likely
> > running in the same task as the one that initiated the shutdown, so it
> > is not possible to perform a blocking operation.  Instead, the running
> > task needs to terminate each connection asynchronously: enqueue a GAP
> > terminate procedure, then return so that the task can process its event
> > queue.  Eventually, the BLE terminate procedure will complete, and the
> > result of the procedure will be indicated via an event on this event
> > queue.  The sysdown mechanism I described earlier in this email doesn't
> > allow for asynchronous procedures.  It reboots the system immediately
> > after executing all the shutdown callbacks.
> >
> > I think this will be a common issue for other packages, so I am
> > trying to come up with a general solution.
> >
> > ### SOLUTION:
> >
> > Each shutdown callback returns one of the following codes:
> >* SYSDOWN_COMPLETE
> >* SYSDOWN_IN_PROGRESS
> >
> > When a controlled reset is initiated, the shutdown facility executes
> > every confgured call

Re: Controlled shutdown

2018-09-28 Thread will sanfilippo
Some comments:

1) Are there are any other cases you can see for a controlled shutdown? I get 
the reset command. I am trying to think of others.
2) I am curious: how do you know how many of these functions, or which ones, 
return in progress? Curious to know how you were going to implement that 
specifically.

Otherwise, sounds good and seems like a good addition to mynewt!


> On Sep 28, 2018, at 1:08 PM, Christopher Collins  wrote:
> 
> Hello all,
> 
> I have been looking into implementing a graceful shutdown for Mynewt.
> The system may want to perform a cleanup procedure immediately before it
> resets, and I wanted to allow this procedure to be configured.  I am
> calling this shutdown facility "sysdown", as a counterpart to "sysinit".
> 
> ### BASIC IDEA:
> 
> My idea is to allow each Mynewt package to specify a sequence of
> shutdown function calls, similar to a package's `pkg.init` function call
> list.  The newt tool would generate a C shutdown function called
> `sysdown()`.  This function would consist of calls to each package's
> shutdown functions.  When a controlled shutdown is initiated,
> `sysdown()` would be called prior to the final call to
> `hal_system_reset()`.
> 
> To clarify, this procedure would only be performed for a controlled
> shutdown.  It would be executed when the system processes a newtmgr
> "reset" command, for example.  It would not be executed when the system
> crashes, browns out, or restarts due to the hardware watchdog.
> 
> I think this scheme is pretty straightforward and I see no issues so far
> (but please pipe up if this doesn't seem right!).
> 
> ### PROBLEM:
> 
> Then I tried applying this to an actual use case, and of course I
> immediately encountered some problems :).
> 
> My actual use case is this: when I reset the Mynewt device, I would like
> the nimble stack to gracefully terminate all open Bluetooth connections.
> This isn't strictly necessary; the connected peer will eventually
> realize that the connection has dropped some time after the reset.  The
> problem is that Android centrals take a really long time to realize that
> the connection has dropped, as described here:
> https://blog.classycode.com/a-short-story-about-android-ble-connection-timeouts-and-gatt-internal-errors-fa89e3f6a456.
> So, I wanted to explicitly terminate the connections to speed up the
> process.
> 
> Ideally, I could configure the nimble host package with a shutdown
> callback that just performs a blocking terminate of each open
> connection in sequence.  Unfortunately, the nimble host is likely
> running in the same task as the one that initiated the shutdown, so it
> is not possible to perform a blocking operation.  Instead, the running
> task needs to terminate each connection asynchronously: enqueue a GAP
> terminate procedure, then return so that the task can process its event
> queue.  Eventually, the BLE terminate procedure will complete, and the
> result of the procedure will be indicated via an event on this event
> queue.  The sysdown mechanism I described earlier in this email doesn't
> allow for asynchronous procedures.  It reboots the system immediately
> after executing all the shutdown callbacks.
> 
> I think this will be a common issue for other packages, so I am
> trying to come up with a general solution.
> 
> ### SOLUTION:
> 
> Each shutdown callback returns one of the following codes:
>* SYSDOWN_COMPLETE
>* SYSDOWN_IN_PROGRESS
> 
> When a controlled reset is initiated, the shutdown facility executes
> every confgured callback.  If all callbacks return `SYSDOWN_COMPLETE`,
> then the procedure is done; the system completes the reset with a call
> to `hal_system_reset()`.
> 
> If one or more callbacks returns `SYSDOWN_IN_PROGRESS`, then the
> shutdown facility needs to wait for those subprocedures to complete.
> Each in-progress shutdown subprocedure indicates completion
> asynchronously via a call to `sysdown_complete()`.  When the last
> remaining entry signifies completion, the shutdown facility finishes the
> shutdown procedure with a call to `hal_system_reset()`.  To defend
> against hanging subprocedures, the system can be configured to crash if
> the shutdown procedure takes too long.
> 
> Does that sound reasonable?  All comments welcome.
> 
> Thanks,
> Chris