Hello,
SEMS' audio engine underwent a change and simplification quite some
time ago, but I feel that with new requirements coming up (precoded
audio, stereo, wideband, connecting to audio sources other than
file/conference, etc.) the current model is no longer sufficient.
I created the file
https://svn.berlios.de/svnroot/repos/sems/branches/wb/doc/media_processing_flow.txt
to show how the current processing works. Basically, audio is pushed
from an input AmAudio 'device' to an output one: AmAudio::get reads
from the input device in its own format and converts it to SEMS'
internal format (PCM16 mono at SYSTEM_SAMPLERATE, which usually was
8 kHz), and AmAudio::put then converts it to the format of the output
device and writes it. This happens e.g. when AmSession::output is read
and the result is written to AmSession::rtp_str (AmRTPAudio). The other
direction, rtp_str -> input, works the same way, with the difference
that decoding is done before buffering (because buffering uses PLC and
voice processing), so AmAudio::get just reads from the buffer. For most
combinations of AmAudio devices in an application (playlist, echo,
conference connector, filter, mixing etc.), this model works fine
because they usually all operate on SEMS' internal format, so
no conversion is necessary there.
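To make the get/put model above concrete, here is a minimal sketch of the pump between two devices. All class and function names (AmAudioLike, ToneSource, CaptureSink, process_frame) are illustrative stand-ins, not the real SEMS API; the point is only that get() converts into the internal PCM16 mono format and put() converts out of it:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an AmAudio device (names are illustrative,
// not the real SEMS classes). Internally everything is PCM16 mono at
// SYSTEM_SAMPLERATE.
using Pcm16 = std::vector<int16_t>;

struct AmAudioLike {
    virtual ~AmAudioLike() = default;
    // get(): read from the device in its own format, converted to the
    // internal format on the way out
    virtual Pcm16 get(std::size_t nsamples) = 0;
    // put(): convert from the internal format to the device's format
    // and write
    virtual void put(const Pcm16& samples) = 0;
};

// Toy source that produces a constant sample value (file/tone stand-in)
struct ToneSource : AmAudioLike {
    int16_t value;
    explicit ToneSource(int16_t v) : value(v) {}
    Pcm16 get(std::size_t n) override { return Pcm16(n, value); }
    void put(const Pcm16&) override {}
};

// Toy sink that records what it receives (RTP stream stand-in)
struct CaptureSink : AmAudioLike {
    Pcm16 received;
    Pcm16 get(std::size_t) override { return {}; }
    void put(const Pcm16& s) override {
        received.insert(received.end(), s.begin(), s.end());
    }
};

// One processing cycle: pump a frame from input to output, passing
// through the internal format in between
void process_frame(AmAudioLike& in, AmAudioLike& out, std::size_t nsamples) {
    Pcm16 frame = in.get(nsamples);  // input format -> internal PCM16 mono
    out.put(frame);                  // internal format -> output format
}
```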
Now with the current wideband implementation, this is suboptimal: if,
e.g., an NB RTP stream is recorded to an NB file, it must be upsampled
to WB and then downsampled to NB again (if SYSTEM_SAMPLERATE is WB).
There are other related problems, e.g. with precoded RTP or relaying.
All of this could be fixed with some hack, or is not really too urgent,
as 99.9% of applications are still plain old 8 kHz mono, and processing
power for this is pretty cheap. Nevertheless, I would like to look a
bit to the future.
I see two possibilities for how the audio processing system could be improved:
1) In a conversion stage from one AmAudio to another, pass the
processing functions the target format (the format of the sink). The
processing functions then need to check whether conversion makes sense,
and the format of the current samples buffer must somehow be saved
explicitly (ATM it is implicit: at every stage of processing, the audio
is assumed to be in the correct format).
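A small sketch of what option 1 could look like, with an explicit per-buffer format so a conversion stage can compare against the sink's format and skip needless resampling. AudioFmt, Buffer and needs_conversion are hypothetical names for illustration, not existing SEMS types:

```cpp
// Illustrative only: the format travels with the buffer instead of
// being assumed, so a stage can decide whether conversion makes sense.
struct AudioFmt {
    unsigned sample_rate; // e.g. 8000 (NB) or 16000 (WB)
    unsigned channels;    // 1 = mono, 2 = stereo
    bool operator==(const AudioFmt& o) const {
        return sample_rate == o.sample_rate && channels == o.channels;
    }
};

struct Buffer {
    AudioFmt fmt;  // explicit format of the samples currently held
    // ... sample data would go here ...
};

// A conversion stage would call this with the sink's format: only
// convert when the buffer's format actually differs from the target.
bool needs_conversion(const Buffer& buf, const AudioFmt& sink_fmt) {
    return !(buf.fmt == sink_fmt);
}
```

With this, recording an NB RTP stream to an NB file would leave the samples untouched even when SYSTEM_SAMPLERATE is WB, because source and sink formats already match.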
2) Make the audio interface more generic, and create an audio
processing graph from simpler components (resampler, stereo2mono,
encoder, decoder). By default the whole chain would be constructed
(e.g. for sending: read(AmAudio::output) -> decode(AmAudio::fmt) ->
resample(RTPStream::fmt) -> encode(RTPStream::fmt) -> write(RTPStream)),
but applications would be free to construct their own combinations.
E.g. an application using precoded audio would use read(AmAudio::output)
-> write(RTPStream). The format of the audio would still be implicit:
if the application constructs a wrong graph, the audio is corrupted.
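A toy sketch of option 2, where a chain is assembled from simple stages. The stages here only record their names so the wiring is visible; in a real implementation they would be resampler, stereo2mono, encoder, decoder, etc. Chain, Stage, default_chain and precoded_chain are all hypothetical names:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A Frame just records which stages have touched it, to keep the
// graph-wiring idea visible without real DSP code.
using Frame = std::vector<std::string>;
using Stage = std::function<void(Frame&)>;

struct Chain {
    std::vector<Stage> stages;
    void add(Stage s) { stages.push_back(std::move(s)); }
    void run(Frame& f) { for (auto& s : stages) s(f); }
};

// Placeholder stage that only tags the frame with its name
Stage tag(const std::string& name) {
    return [name](Frame& f) { f.push_back(name); };
}

// Default chain for sending: read -> decode -> resample -> encode -> write
Chain default_chain() {
    Chain c;
    c.add(tag("read"));
    c.add(tag("decode"));
    c.add(tag("resample"));
    c.add(tag("encode"));
    c.add(tag("write"));
    return c;
}

// An application using precoded audio wires read directly to write
Chain precoded_chain() {
    Chain c;
    c.add(tag("read"));
    c.add(tag("write"));
    return c;
}
```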
I think the second is more like the GStreamer approach.
--
Stefan Sayer
VoIP Services
[email protected]
www.iptego.com
IPTEGO GmbH
Wittenbergplatz 1
10789 Berlin
Germany
Amtsgericht Charlottenburg, HRB 101010
Geschaeftsfuehrer: Alexander Hoffmann
_______________________________________________
Semsdev mailing list
[email protected]
http://lists.iptel.org/mailman/listinfo/semsdev