Thanks for that, Johan - I hadn't thought about that aspect. It's all theoretical
at the moment, but IBM Voice Gateway, at least, does claim to be able to
handle it using SIPREC - so maybe they are confident about their ability to
differentiate between caller and callee in a single stream?...
"The voice g
The issue with siprec (based on rtpproxy) is that you get only one stream
containing the voice in both directions, caller to callee and callee to
caller. That will give the ASR a hard time :-). I do know that rtpengine has
something similar to siprec, but I don't know the details.
Bottom line, in my opinio
I'm just starting to look at Speech-to-Text (STT) processing for calls -
initially recordings but moving on to real-time. I would see this working
along the lines of either:
- a call is recorded, and when the call ends an event is triggered to
initiate transcription of the recording
- a call start
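
For what it's worth, the first option above (transcribe once the call has ended) could be sketched roughly as below. This is only an illustration under assumptions: the `CallEnded` event, its field names, and the `fake_stt` function are hypothetical placeholders, not part of any SIP server or STT product mentioned in this thread. The event handler only queues the recording, so call teardown never waits on the transcription itself.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CallEnded:
    """Hypothetical 'call ended' event; the field names are assumptions."""
    call_id: str
    recording_path: str


class TranscriptionWorker:
    """Runs a caller-supplied STT function on recordings, one at a time."""

    def __init__(self, transcribe: Callable[[str], str]):
        self._transcribe = transcribe
        self._jobs = queue.Queue()
        self.results: List[Tuple[str, str]] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_call_ended(self, event: CallEnded) -> None:
        # Event handler: only enqueue, so the call path is not blocked by STT.
        self._jobs.put(event)

    def close(self) -> None:
        # A None sentinel stops the worker after the queued jobs are done.
        self._jobs.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            event = self._jobs.get()
            if event is None:
                break
            text = self._transcribe(event.recording_path)
            self.results.append((event.call_id, text))


def fake_stt(path: str) -> str:
    # Stand-in for a real STT engine; replace with an actual client call.
    return f"transcript of {path}"


if __name__ == "__main__":
    worker = TranscriptionWorker(fake_stt)
    worker.on_call_ended(CallEnded("c1", "/tmp/c1.wav"))
    worker.close()
    print(worker.results)
```

In a real deployment, `on_call_ended` would be wired to whatever end-of-call event the SIP server emits, and `fake_stt` would be replaced by the chosen STT client.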