Hello,

SEMS' audio engine underwent a change and simplification quite some time ago, but with new requirements coming up (precoded audio, stereo, wideband, connecting to audio sources other than file/conference, etc.), I feel that the current model is no longer sufficient.

I created the file
https://svn.berlios.de/svnroot/repos/sems/branches/wb/doc/media_processing_flow.txt
to show how the current processing works.

Basically, audio is pushed from an AmAudio 'device' that serves as input to one that serves as output: the processing loop reads from the input device in its native format and converts it to SEMS' internal format (PCM16 mono at SYSTEM_SAMPLERATE, which usually was 8 kHz) in AmAudio::get, then converts it to the format of the output device and writes it in AmAudio::put. This happens e.g. when AmSession::output is read and written to AmSession::rtp_str (AmRTPAudio). The other direction, rtp_str -> input, works the same way, with the difference that decoding is done before buffering (because buffering uses PLC and voice processing), so AmAudio::get just reads from the buffer.

For most combinations of AmAudio devices in an application (playlist, echo, conference connector, filter, mixing, etc.), this model works fine, because they usually all operate on SEMS' internal format, so no conversion is necessary there.

Now, with the current wideband implementation, this is suboptimal: if e.g. an NB RTP stream is recorded to an NB file, it must be upsampled to WB and then downsampled to NB again (if SYSTEM_SAMPLERATE is WB).

There are other related problems, e.g. with precoded RTP or relaying. All of this could be fixed with some hack, or is not really too urgent, as 99.9% of usage is still plain old 8 kHz mono, and processing power for that is pretty cheap. Nevertheless, I would like to look a bit to the future.

I see two possibilities how the audio processing system could be improved:

1) In a conversion stage from one AmAudio to another, give the processing functions the target format (the format of the sink). The processing functions then need to check whether conversion makes sense, and the format of the current sample buffer must somehow be saved explicitly (at the moment it is implicit: at every stage of processing, the audio is assumed to be in the correct format).

2) Make the audio interface more generic, and create an audio processing graph from simpler components (resampler, stereo2mono, encoder, decoder). By default the whole chain would be constructed (e.g. for sending: read(AmAudio::output) -> decode(AmAudio::fmt) -> resample(RTPStream::fmt) -> encode(RTPStream::fmt) -> write(RTPStream)), but applications would be free to construct their own combinations. E.g. an application using precoded audio would use read(AmAudio::output) -> write(RTPStream). The format of the audio would still be implicit; if the application constructs a wrong graph, the audio is corrupted.

I think the second is more like the GStreamer approach.




--
Stefan Sayer
VoIP Services

[email protected]
www.iptego.com

IPTEGO GmbH
Wittenbergplatz 1
10789 Berlin
Germany

Amtsgericht Charlottenburg, HRB 101010
Geschaeftsfuehrer: Alexander Hoffmann
_______________________________________________
Semsdev mailing list
[email protected]
http://lists.iptel.org/mailman/listinfo/semsdev