Hello,

SEMS' audio engine underwent a change and simplification quite some time ago, but with new requirements coming up (precoded audio, stereo, wideband, connecting to audio sources other than file/conference, etc.), I feel that the current model is no longer sufficient.

I created the file
https://svn.berlios.de/svnroot/repos/sems/branches/wb/doc/media_processing_flow.txt
to show how the current processing works.

Basically, audio is pushed from an AmAudio 'device' that serves as input to one that serves as output: the processing loop reads from the input device in its native format and converts it to SEMS' internal format (PCM16 mono at SYSTEM_SAMPLERATE, which usually was 8 kHz) in AmAudio::get, then converts it to the format of the output device and writes it in AmAudio::put. This happens e.g. when AmSession::output is read and written to AmSession::rtp_str (AmRTPAudio). The other direction, rtp_str -> input, works the same way, with the difference that decoding is done before buffering (because buffering uses PLC and voice processing), so AmAudio::get just reads from the buffer.

For most combinations of AmAudio devices in an application (playlist, echo, conference connector, filter, mixing, etc.), this model works fine, because they usually all operate on SEMS' internal format, so no conversion is necessary there.

Now, with the current wideband implementation, this is suboptimal: if e.g. an NB RTP stream is recorded to an NB file, it must be upsampled to WB and then downsampled to NB again (if SYSTEM_SAMPLERATE is WB).

There are other related problems, e.g. with precoded RTP or relaying. All of this could be fixed with some hack, or is not really too urgent, as 99.9% of usage is still plain old 8 kHz mono, and processing power for that is pretty cheap. Nevertheless, I would like to look a bit to the future.

I see two possibilities how the audio processing system could be improved:

1) In a conversion stage from one AmAudio to another, give the processing functions the target format (the format of the sink). The processing functions then need to check whether conversion makes sense, and the format of the current sample buffer must somehow be saved explicitly (at the moment it is implicit: at every stage of processing, the audio is assumed to be in the correct format).

2) Make the audio interface more generic, and create an audio processing graph from simpler components (resampler, stereo2mono, encoder, decoder). By default the whole chain would be constructed (e.g. for sending: read(AmAudio::output) -> decode(AmAudio::fmt) -> resample(RTPStream::fmt) -> encode(RTPStream::fmt) -> write(RTPStream)), but applications would be free to construct their own combinations. E.g. an application using precoded audio would use read(AmAudio::output) -> write(RTPStream). The format of the audio would still be implicit; if the application constructs a wrong graph, the audio is corrupted.

I think the second is more like the GStreamer approach.




--
Stefan Sayer
VoIP Services

[email protected]
www.iptego.com

IPTEGO GmbH
Wittenbergplatz 1
10789 Berlin
Germany

Amtsgericht Charlottenburg, HRB 101010
Geschaeftsfuehrer: Alexander Hoffmann
_______________________________________________
Semsdev mailing list
[email protected]
http://lists.iptel.org/mailman/listinfo/semsdev