Aurelien's problem with the new alsa driver has forced me to try to understand not only the buffering which takes place in the alsa driver, but the complete chain from network => sound. Below are my thoughts. I'm sending this to the list partly for the record, but I would of course appreciate it if anybody had the time to read the thing and comment. I'm still confused, but now on a higher level ;-)
For the impatient, there is a "Conclusion" at the end. Eventually, something like this doc might find its way into some source dir or the wiki?

Handling of playback seems to be the critical thing with respect to sound. Recording isn't really hard: take the samples and send them through the network. But playback *is* complicated, and all the problems we have had and still have with sound are related to it.

Output buffering
================

  -----------             -----------               -----------
  | network |             |  jitter |->-- audio -->-| alsa hw |->-- sound
  |         |->- phapi ->-|  buffer |    driver     | buffer  |     card
  -----------             -----------               -----------

The network delivers packets, which phapi stores in its jitter buffer. The audio driver fetches packets from phapi's jitter buffer and stores them in the alsa driver's hw buffer. From that point, the alsa driver takes care of moving the data to the sound card, which eventually produces the sound.

The tradeoffs
=============

All buffering introduces delay, a.k.a. latency. For VoIP applications the general idea is to keep this latency down to something like 50-150 ms. This is an overall constraint on all buffering.

The audio driver should ideally move one data packet every 20 ms. Since we can't use static priorities, CPU load and scheduling will prevent the audio driver from doing its task at precise 20 ms intervals. The role of the hw buffer is to hold enough data to keep the stream playing despite these scheduling delays. If the hw buffer is too small there will be an underrun, i.e. no data is available at the very moment it should be played. Simply stated, the hw buffer should be as small as possible, but big enough to avoid underruns. A hw buffer of 40-60 ms seems to be an accepted best practice.

The role of the jitter buffer is to hold enough data to smooth out the random delays and reordering of data introduced by the network. If the jitter buffer is too small, no data will be available when the audio driver tries to fetch the next packet.
How to handle that situation is the audio driver's task, but it has a negative impact on sound quality. Generally speaking, the jitter buffer should be as small as possible, but big enough to achieve decent sound quality. Today's standard jitter buffer setting is 60 ms. This seems to work on some networks, whereas others seem to require more.

A side note: streaming audio players don't care about latency and use jitter buffers of several seconds to create really good sound...

Current driver
==============

This driver does not set the alsa hw buffer size explicitly. The result is a buffer size based on alsa defaults, often a setup intended for streaming players. In my case the hw buffer is about 500 ms. When the jitter buffer is empty, the driver just makes an empty write to the hw buffer. This means that the large hw buffer is combined with the phapi jitter buffer into one very large buffer, capable of handling large network delays but also introducing a large system latency. Also, if/when the jitter buffer becomes empty despite the large buffer, there is no logic to rebuffer. Unfortunately, the current driver has no counters indicating the quality of the stream.

New driver
==========

The new driver explicitly sets the size of the hw buffer to 60 ms. When the jitter buffer is empty, the driver immediately decides to resend the previous packet. This means that the system doesn't use the hw buffer's capacity for jitter management; it relies solely on the phapi jitter buffer. This gives a much better latency, around 120 ms, but the limited jitter buffer seems to fail on congested networks. (Aurelien...) This driver has counters for e.g. underruns and rebuffered data.

What to do?
===========

There seems to be a basic strategic choice: use today's system with separate jitter and hw buffers, or use the alsa hw buffer as a combined jitter and hw buffer. Using the combined buffer would create a system with a smaller latency (the need for an extra 40-60 ms hw buffer would disappear).
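The new driver's behaviour on an empty jitter buffer (repeat the previous packet rather than let the hw buffer drain) can be sketched roughly like this. The names and the single-slot toy jitter buffer are my inventions for illustration, not the actual driver code:

```c
#include <string.h>

#define FRAMES_PER_PACKET 160   /* assumed: 20 ms at 8 kHz */

/* Toy stand-in for phapi's jitter buffer: a single slot that the
 * network side fills and the audio driver drains. */
static short jb_slot[FRAMES_PER_PACKET];
static int   jb_full;

static int jitter_buffer_fetch(short *out)
{
    if (!jb_full)
        return 0;
    memcpy(out, jb_slot, sizeof jb_slot);
    jb_full = 0;
    return 1;
}

/* Last packet played, kept so it can be repeated on underrun. */
static short last_packet[FRAMES_PER_PACKET];
static int   have_last;

/* Pick the next packet to hand to the alsa hw buffer. */
static void next_playback_packet(short *out)
{
    if (jitter_buffer_fetch(out)) {
        memcpy(last_packet, out, sizeof last_packet);
        have_last = 1;
    } else if (have_last) {
        /* Jitter buffer empty: repeat the previous packet at once
         * instead of letting the small hw buffer run dry. */
        memcpy(out, last_packet, sizeof last_packet);
    } else {
        memset(out, 0, sizeof last_packet);  /* nothing played yet: silence */
    }
}
```

The point of the repeat is that the hw buffer never starves, so an empty jitter buffer degrades sound quality (a repeated 20 ms chunk) instead of causing an alsa underrun.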
Twinkle uses this combined-buffer approach.

Another solution is to increase the phapi jitter buffer. This is likely to work, but at the price of a quite large overall delay. A 120 ms jitter buffer combined with a 60 ms hw buffer is not that nice, but still better than the current driver. I don't really know how ekiga and skype work, but they both use small hw buffers, which means that they are using separate jitter buffers.

In the long run the jitter buffer really ought to be adjusted to the network conditions. Handling a bad network *requires* buffering, but applying that same buffering on a good network creates a worse experience there than necessary. So there really isn't any 'one size fits all' for the jitter buffer. Ideally, the jitter buffer should be adapted to the network during the call. This rules out the combined hw and jitter buffer, since that buffer can't be resized without restarting the device (and we don't want that).

So I think we should stick with separate hw and jitter buffers. The size of the hw buffer isn't really a problem, although it might be trimmed down to 40 ms. OTOH, the size of the jitter buffer *is* a problem. We are not likely to find one value which fits the needs of all users or, more precisely, all calls - the same user will face very different needs when making an international call and when calling on the same LAN. We need more experience to judge this, and to get that experience a way to configure the jitter buffer is needed. However, in the long run users shouldn't have to configure the jitter buffer; this isn't really something a user should have to be concerned about.

Conclusion
==========

The strategy would be to keep the separate hw and jitter buffers of today, and eventually add logic to dynamically adapt the jitter buffer size to the network delays. In the short term, this would mean:

- Keep the separate hw and jitter buffers of today.
- Verify that increasing the jitter buffer resolves Aurelien's problems.
- Trim down the hw buffer as far as possible.
- Make the jitter buffer size configurable; a first dirty approach would be an environment variable.
- Use the sw like this, gaining experience about the required sizes for the jitter buffer.
- Decide on either a manual or an automatic way to dynamically adjust the jitter buffer to the needs.
- Implement the dynamic adjustment.

_______________________________________________
Wengophone-devel mailing list
[email protected]
http://dev.openwengo.com/mailman/listinfo/wengophone-devel
