Hello all,

The Mynewt BLE stack is called Nimble.  Nimble consists of two packages:
    * Controller (link-layer) [net/nimble/controller]
    * Host (upper layers)     [net/nimble/host]

This email concerns the Nimble host.  

As I indicated in an email a few weeks ago, the code size of the Nimble
host had increased beyond what I considered a reasonable level.  When
built for the ARM cortex-M4, with security enabled and the log level set
to INFO, the host code size was about 48 kB.  In recent days, I came up
with a few ideas for reducing the host code size.  As I explored these
ideas, I realized that they open the door for some major improvements in
the fundamental design of the host.  Making these changes would
introduce some backwards-compatibility issues, but I believe it is
absolutely the right thing to do.  If we do this, it needs to be done
now while Mynewt is still in its beta phase.  I have convinced myself
that this is the right way forward; now I would like to see what the
community thinks.  As always, all feedback is greatly appreciated.

There are two major changes that I am proposing:

1. All HCI command/acknowledgement exchanges are blocking.

Background: The host and controller communicate with one another via the
host-controller-interface (HCI) protocol.  The host sends _commands_ to
the controller; the controller sends _events_ to the host.  Whenever the
controller receives a command from the host, it immediately responds
with an acknowledgement event.  In addition, the controller sends
unsolicited events to the host to indicate state changes or to request
information that the host then supplies in a subsequent command.

In the current host, all HCI commands are sent asynchronously
(non-blocking).  When the host wants to send an HCI command, it
schedules a transmit operation by putting an OS event on its own event
queue.  The event points to a callback which does the actual HCI
transmission.  The callback also configures a second callback to be
executed when the expected acknowledgement is received from the
controller.  Each time the host receives an HCI event from the
controller, an OS event is put on the host's event queue.  Processing of
this OS event ultimately calls the configured callback (if it is an
acknowledgement), or a hardcoded callback (if it is an unsolicited HCI
event).
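
To make that shape concrete, here is a hypothetical sketch of the
two-callback dance described above.  Every name in it is illustrative
only, not the actual host internals:

    #include <stdint.h>

    /* Illustrative stand-ins for the host's internal machinery. */
    typedef int host_hci_tx_fn(void *arg);
    typedef void host_hci_ack_fn(uint8_t status, void *arg);

    extern int host_hci_sched_enqueue(host_hci_tx_fn *fn, void *arg);
    extern int host_hci_cmd_send(uint16_t opcode);
    extern void host_hci_set_ack_cb(host_hci_ack_fn *fn, void *arg);

    static void
    example_ack_cb(uint8_t status, void *arg)
    {
        /* Step 3 (host task, later): inspect the controller's status
         * and advance this operation's state machine. */
    }

    static int
    example_tx_cb(void *arg)
    {
        int rc;

        /* Step 2 (host task): the actual HCI send. */
        rc = host_hci_cmd_send(0x0c03);     /* e.g., HCI Reset */
        if (rc == 0) {
            /* Arrange for example_ack_cb() to run when the
             * acknowledgement event is processed. */
            host_hci_set_ack_cb(example_ack_cb, NULL);
        }
        return rc;
    }

    static int
    example_start_op(void)
    {
        /* Step 1: enqueue an OS event on the host's own event queue;
         * example_tx_cb() runs later, in the host task. */
        return host_hci_sched_enqueue(example_tx_cb, NULL);
    }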

This design works, but it introduces a number of problems.  First, it
requires the host code to maintain quite complex state machines for
what seem like simple HCI exchanges.  This FSM machinery translates into
a lot of extra code.  There is also a lot of ugliness involved in
canceling scheduled HCI transmits.

Another complication with non-blocking HCI commands is that they require
the host to jump through a lot of hoops to provide feedback to the
application.  Since all the work is done in parallel by the host task,
the host has to notify the application of failures by executing
callbacks configured by the application.  I did not want to place any
restrictions on what the application is allowed to do during these
callbacks, which means the host has to ensure that it is in a valid
state whenever a callback gets executed (no mutexes are locked, for
example).  This requires the code to use a large number of mutexes and
temporary copies of host data structures, resulting in a lot of
complicated code.

Finally, non-blocking HCI operations complicate the API presented to
the application.  A single return code from a blocking operation is
easier to manage than a return code plus the possibility of a callback
being executed sometime in the future from a different task.  A blocking
operation collapses several failure scenarios into a single function
return.

Making HCI command/acknowledgement exchanges blocking addresses all of
the above issues:
    * FSM machinery goes away; controller response is indicated in the
      return code of the HCI send function.
    * Nearly all HCI failures are indicated to the application
      immediately, so there is no need for lots of mutexes and temporary
      copies of data structures.
    * API is simplified; operation results are indicated via a simple
      function return code.
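
For comparison, here is a hypothetical sketch of what the same
operation could look like with a blocking send.  Again, the function
name and signature are illustrative only, not a committed API:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical blocking send: does not return until the
     * controller's acknowledgement arrives (or a timeout expires). */
    extern int host_hci_cmd_tx_blocking(uint16_t opcode,
                                        const void *cmd, uint8_t cmd_len,
                                        void *rsp, uint8_t rsp_cap,
                                        uint8_t *out_rsp_len);

    static int
    example_reset_controller(void)
    {
        int rc;

        rc = host_hci_cmd_tx_blocking(0x0c03, NULL, 0, NULL, 0, NULL);

        /* Local failure, timeout, or controller error status all
         * collapse into this single return code, delivered in the
         * caller's own context -- no FSM, no deferred callback. */
        return rc;
    }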

2. The Nimble host is "taskless".

Currently the Nimble host runs in its own OS task.  This is not
necessarily a bad thing, but in the case of the host, I think the costs
outweigh the benefits.  I can think of three benefits to running a
library in its own task:
    * Guarantee that timing requirements are met; just configure the
      task with an appropriate priority.
    * (related to the above point) The library task can continue to work
      while the application task is blocked.
    * Facilitates stack sizing. Since the library performs its
      operations in its own stack, it is easier to predict stack usage
      of both the library task and the application task.

I don't think any of these benefits are very compelling in the case of
the Nimble host for the following reasons:
    * The host has nothing resembling real-time timing requirements.
      There should be absolutely no problem with running the host task
      at the lowest priority, unless the hardware is simply
      overburdened, in which case there is no way to avoid issues no
      matter what you do.

    * The host code makes heavy use of application callbacks, making it
      quite difficult to estimate stack usage.  Since the host stack
      requirements depend on what the application does during these
      callbacks, the application would need to specify the host stack
      size during initialization anyway.

My proposal is to turn the Nimble host into a "flat" library that runs
in an application task.  When the application initializes the host, it
indicates which OS event queue should be used for host-related events.
Host operations would be captured in OS_EVENT_TIMER events that the
application task would need to handle generically, as it most likely
already does.  Note that these events would not be produced by an actual
timer; the events would be placed on the event queue immediately.  The
OS_EVENT_TIMER event type would just be used because it provides a basic
callback structure.
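
For illustration, this is the sort of generic dispatch an application
task typically has today.  The sketch assumes the existing
os_callout_func convention for timer-style events (the OS_EVENT_TIMER
events mentioned above); the exact member names may differ:

    #include <assert.h>
    #include "os/os.h"

    static struct os_eventq example_app_evq;    /* passed to host init */

    static void
    example_app_task_handler(void *arg)
    {
        struct os_callout_func *cf;
        struct os_event *ev;

        while (1) {
            ev = os_eventq_get(&example_app_evq);
            switch (ev->ev_type) {
            case OS_EVENT_T_TIMER:
                /* Host (and other library) work arrives here; just
                 * invoke the callback carried by the event. */
                cf = (struct os_callout_func *)ev;
                assert(cf->cf_func != NULL);
                cf->cf_func(cf->cf_arg);
                break;

            default:
                assert(0);
                break;
            }
        }
    }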

I should also note that it is fairly trivial for an application to turn
such a flat library into its own task if that is desired.  The
application developer would just need to create a simple task that
handles the OS_EVENT_TIMER events.
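
A minimal sketch of that, reusing the dispatch loop above (the task
priority and stack size are placeholders):

    /* Hypothetical wrapper task that gives the flat host its own task
     * again; example_app_task_handler() and example_app_evq are from
     * the previous sketch. */
    #define EXAMPLE_HOST_TASK_PRIO      10
    #define EXAMPLE_HOST_STACK_SIZE     OS_STACK_ALIGN(256)

    static struct os_task example_host_task;
    static os_stack_t example_host_stack[EXAMPLE_HOST_STACK_SIZE];

    static void
    example_host_task_init(void)
    {
        os_eventq_init(&example_app_evq);
        os_task_init(&example_host_task, "ble_host",
                     example_app_task_handler, NULL,
                     EXAMPLE_HOST_TASK_PRIO, OS_WAIT_FOREVER,
                     example_host_stack, EXAMPLE_HOST_STACK_SIZE);
    }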

I think these two changes will have the following implications:
    1. Simpler API.
    2. Less RAM usage (no more FSM state, no parallel stacks).
    3. More RAM usage (larger stack).
    4. Major reduction in code size (I estimate a total size of 35 kB).

Hopefully points 2 and 3 will cancel each other out.

Thanks for reading,
Chris
