Aurelien's problem with the new alsa driver has forced me to
try to understand not only the buffering which takes place in
the alsa driver, but the complete chain from network => sound.
Below are my thoughts. I'm sending this to the list partly for
the record, but I would of course appreciate it if anybody had
the time to read the thing and comment.
I'm still confused, but now on a higher level ;-)

For the impatient, there is a "Conclusion" in the end.

Eventually, something like this doc might find its way into
some source dir or the wiki?

Handling of playback seems to be the critical thing with
respect to sound. Recording isn't really hard: just take the
samples and send them over the network. But playback *is*
complicated, and all the problems we have had and still have
with sound are related to this.



Output buffering
================


-----------                 -----------               -----------
| network |                 |  jitter |->- audio -->- | alsa hw |->- sound
|         |-->-- phapi -->--|  buffer |    driver     | buffer  |    card
-----------                 -----------               -----------

The network delivers packets, which phapi stores in its
jitter buffer.

The audio driver fetches packets from phapi's jitter buffer
and stores them in the alsa driver's hw buffer. From that
point, the alsa driver takes care of moving the data to the
soundcard, which eventually produces the sound.
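The chain above can be sketched as a toy loop. All names here
are made up for illustration; the real phapi and alsa APIs
look different:

```python
# Toy model of the playback chain: network -> jitter buffer -> hw buffer.
# Function and variable names are invented for this sketch; the real
# phapi/alsa interfaces differ.
from collections import deque

jitter_buffer = deque()   # filled by the network side
hw_buffer = []            # drained by the alsa driver / sound card
last_packet = None        # remembered in case the jitter buffer runs dry

def on_network_packet(packet):
    """Network side: store an incoming packet in the jitter buffer."""
    jitter_buffer.append(packet)

def audio_driver_tick():
    """Audio driver: every ~20 ms, move one packet to the hw buffer."""
    global last_packet
    if jitter_buffer:
        last_packet = jitter_buffer.popleft()
        hw_buffer.append(last_packet)
    else:
        # Jitter buffer empty: the driver must decide what to play,
        # e.g. replay the previous packet.
        hw_buffer.append(last_packet)

for seq in range(3):
    on_network_packet(("audio", seq))
audio_driver_tick()
audio_driver_tick()
print(hw_buffer)   # two packets have been moved to the hw buffer
```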


The tradeoffs.
==============

All buffering introduces delay, a.k.a. latency. For voip
applications the general idea is to keep this latency down to
something like 50-150 ms. This is an overall constraint on
all buffering.

The audio driver should ideally move one data packet every
20 ms. Since we can't use static priorities, CPU load and
scheduling will prevent the audio driver from doing its task
at precise 20 ms intervals. The role of the hw buffer is to
buffer enough data to be able to play the stream despite
these scheduling delays.
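To put rough numbers on this, assuming 16 kHz mono 16-bit
audio (the actual stream parameters may differ):

```python
# Rough buffer arithmetic. The sample format (16 kHz mono, 16-bit)
# is an assumption for illustration, not taken from the driver.
RATE_HZ = 16000
BYTES_PER_FRAME = 2          # 16-bit mono
PACKET_MS = 20
HW_BUFFER_MS = 60

frames_per_packet = RATE_HZ * PACKET_MS // 1000     # frames in one packet
packets_in_hw_buffer = HW_BUFFER_MS // PACKET_MS    # packets the hw buffer holds
hw_buffer_bytes = RATE_HZ * HW_BUFFER_MS // 1000 * BYTES_PER_FRAME

print(frames_per_packet, packets_in_hw_buffer, hw_buffer_bytes)
# 320 frames per packet; a 60 ms hw buffer holds 3 packets (1920 bytes),
# so the driver can be scheduled up to ~40-60 ms late before an underrun.
```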

If the hw buffer is too small there will be an underrun,
i.e. no data is available at the very moment it should be
played. So, simply stated, it should be as small as possible,
but big enough to avoid underruns. A hw buffer of 40-60 ms
seems to be accepted best practice.

The role of the jitter buffer is to buffer enough data to
smooth out the random delays and reordering of packets
introduced by the network.

If the jitter buffer is too small, no data will be available
when the audio driver tries to fetch the next packet. How to
handle this situation is the audio driver's task, but it has
a negative impact on sound quality. Generally speaking, the
jitter buffer should be as small as possible, but big enough
to achieve decent sound quality.

Today's standard jitter buffer setting is 60 ms. This seems
to work on some networks, whereas others seem to require
more.

A sidenote: streaming audio players don't care about latency
and use jitter buffers of several seconds to create really
good sound...

Current driver
==============

This driver does not set the alsa hw buffer size explicitly.
The result is a buffer size based on alsa defaults, often a
setup for streaming players. In my case the hw buffer is
about 500 ms.

When the jitter buffer is empty, the driver just makes an
empty write to the hw buffer. This means that the large hw
buffer is combined with the phapi jitter buffer into one very
large buffer, capable of handling large network delays but
also introducing a large system latency. Also, if/when the
jitter buffer becomes empty despite the large buffer, there
is no logic to rebuffer.

Unfortunately, the current driver has no counters indicating
the quality of the stream.

New driver.
===========
The new driver explicitly sets the size of the hw buffer to
60 ms.

When the jitter buffer is empty, the driver immediately
decides to replay the previous packet. This means that the
system doesn't use the hw buffer's capacity for jitter mgmt;
it relies solely on the phapi jitter buffer. This creates a
much better latency, 120 ms, but the limited jitter buffer
seems to fail on congested networks. (Aurelien...)
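The latency difference between the two drivers is basically
just additive buffer depth, using the figures mentioned above
(500 ms alsa default, 60 ms explicit hw buffer, 60 ms jitter
buffer):

```python
# End-to-end buffering latency is roughly the sum of the buffer depths.
# Figures are the ones quoted in this post.
jitter_ms = 60     # standard phapi jitter buffer setting

old_hw_ms = 500    # alsa default observed with the current driver
new_hw_ms = 60     # explicitly configured by the new driver

print(jitter_ms + old_hw_ms)   # 560 ms worst case with the current driver
print(jitter_ms + new_hw_ms)   # 120 ms with the new driver
```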

This driver has counters for e.g. underruns and rebuffered
data.

What to do?
===========

There seems to be a basic strategic choice: to use today's
system with separate jitter and hw buffers, or to use the
alsa hw buffer as a combined jitter and hw buffer.

Using the combined buffer would create a system with a small
latency (the need for an extra 40-60 ms hw buffer would
disappear). This is the approach twinkle uses.

Another solution is to increase the phapi jitter buffer. This
is likely to work, but at the price of a quite large overall
delay. A 120 ms jitter buffer combined with a 60 ms hw buffer
is not that nice, but still better than the current driver.

I don't really know how ekiga and skype work, but they both
use small hw buffers, which means that they are using
separate jitter buffers.

In the long run the jitter buffer really ought to be adjusted
to the network conditions. Handling a bad network *requires*
buffering, but using the same buffering on a good network
creates a worse experience there than necessary.

So there really isn't any 'one size fits all' for the jitter
buffer. Ideally, the jitter buffer should be adapted to the
network during the call. This rules out using the combined
hw and jitter buffer, since this buffer can't be changed
without restarting the device (and we don't want that).
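One hedged sketch of "adapted during the call": keep a running
estimate of interarrival jitter, using the smoothing formula
from RTP (RFC 3550), and derive a buffer target from it. The
multiplier and clamping bounds below are guesses, not tuned
values:

```python
# Sketch of a dynamic jitter buffer target using the RFC 3550 running
# jitter estimate: J += (|D| - J) / 16, where D is the change in
# (arrival time - timestamp) between consecutive packets.
# The k multiplier and the clamping bounds are guesses, not tuned values.

class JitterEstimator:
    def __init__(self):
        self.jitter = 0.0
        self.prev_transit = None

    def on_packet(self, arrival_ms, timestamp_ms):
        transit = arrival_ms - timestamp_ms
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self.prev_transit = transit

    def buffer_target_ms(self, k=3, lo=40, hi=200):
        # Scale the estimate by k and clamp to sane bounds.
        return min(hi, max(lo, k * self.jitter))

est = JitterEstimator()
# Simulated network delays for packets sent every 20 ms.
for i, delay in enumerate([10, 15, 40, 12, 80, 11]):
    est.on_packet(arrival_ms=i * 20 + delay, timestamp_ms=i * 20)
print(est.buffer_target_ms())
```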

So I think we should stick with separate hw and jitter
buffers.

The size of the hw buffer isn't really a problem, although it
might be trimmed down to 40 ms.

OTOH, the size of the jitter buffer *is* a problem. We are
not likely to find one value which fits the needs of all
users or, more precisely, all calls - the same user will face
very different needs when making an international call than
when calling within the same LAN.

We need more experience to judge this. To get this
experience, a way to configure the jitter buffer is needed.
However, in the long run users shouldn't have to configure
the jitter buffer; this isn't really something a user should
have to be concerned about.

Conclusion
==========

The strategy would be to keep the separate hw and jitter
buffers of today, and eventually add logic to dynamically
adapt the jitter buffer size to the network delays.

In the short term, this would mean:

- Keep the separate hw and jitter buffers of today.

- Verify that increasing the jitter buffer resolves
  Aurelien's problems.

- Trim down the hw buffer as far as possible.

- Make the jitter buffer size configurable; the
  first dirty approach would be an environment variable.

- Use the software like this, gaining experience about the
  required sizes for the jitter buffer.

- Make a decision on either a manual or automatic way
  to dynamically adjust the jitter buffer to the needs.

- Implement the dynamic adjustment.
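The "first dirty approach" from the list could look roughly
like this. The variable name WENGO_JITTER_MS is invented for
this sketch, not an existing setting:

```python
# Hypothetical env-var override for the jitter buffer size; the
# variable name WENGO_JITTER_MS is made up for this sketch.
import os

DEFAULT_JITTER_MS = 60   # today's standard setting

def jitter_buffer_ms():
    """Return the configured jitter buffer size in ms, or the default."""
    try:
        value = int(os.environ.get("WENGO_JITTER_MS", DEFAULT_JITTER_MS))
    except ValueError:
        return DEFAULT_JITTER_MS   # ignore garbage values
    return value if value > 0 else DEFAULT_JITTER_MS

os.environ["WENGO_JITTER_MS"] = "120"
print(jitter_buffer_ms())   # 120
```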


_______________________________________________
Wengophone-devel mailing list
[email protected]
http://dev.openwengo.com/mailman/listinfo/wengophone-devel
