On 5/19/21 5:48 AM, Richard Owlett wrote:
> On 05/16/2021 01:00 PM, Aaron wrote:
>> On 5/16/21 8:19 AM, Richard Owlett wrote:
>
> [I'm subscribed to the list ;]
>
>>
>>> I notice PocketSphinx in the Debian repositories.
>>> How suitable is it for dictation by a single speaker?
>>> I realize it is designed to be speaker independent.
>>> TIA
>>>
>> I wouldn't say it is designed to be speaker independent.
>
> When I read the description I may have "seen" what I wanted to see.
> I haven't investigated speech recognition since I was using Windows a
> decade ago.
>
> I'm assuming training to my voice and speaking style. I want
> continuous speech and as large a vocabulary as possible.
>

Thank you for getting in touch. I feel like I have a somewhat better idea of what you are trying to do.
Kaldi, DeepSpeech, and FlashlightASR all recommend Linux as the environment; I'm not sure whether any of them run on Windows or OSX. PocketSphinx is definitely not going to work for taking dictation for letters.

The easiest way to get speech recognition working is to use an online service such as Google Cloud Speech-to-Text. Its language model benefits from the scale of Google's data, and all the optimization is handled on their side automatically. I think there is still a free tier for this service. The main reason to avoid it is, of course, privacy; the second is that it requires an internet connection. I only mention it because it is so much easier to set up right now, and you didn't explicitly state what your requirements are.

Kaldi, Mozilla DeepSpeech, and FlashlightASR are all viable options. They are free and open source, run locally, and interface well with Python for scripting the training and recognition processes (they also have C++ interfaces, but I'd prototype in Python first). Kaldi and FlashlightASR are currently aimed at researchers, so they are not easy to set up and the documentation is full of intimidating formulas and technical jargon. Mozilla DeepSpeech is somewhat gentler to work with and seems to have better support, plus it can be installed with a simple "pip install deepspeech". Mozilla DeepSpeech and FlashlightASR both use KenLM language models by default, while Kaldi supports a variety of language models.

I'm currently working on a research project comparing the current state of different speech recognition engines and classifying them by strengths and weaknesses. If I can be helpful, please let me know.
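In case concrete code helps you judge the effort involved, here is a rough, untested sketch of the cloud route from Python. It assumes you have installed the google-cloud-speech package and exported a service-account credential; the WAV file name is just a placeholder.

    # Sketch: transcribe a short 16 kHz mono WAV with Google Cloud Speech-to-Text.
    # Assumes "pip install google-cloud-speech" and GOOGLE_APPLICATION_CREDENTIALS set.
    from google.cloud import speech

    client = speech.SpeechClient()

    with open("letter_dictation.wav", "rb") as f:  # placeholder file name
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)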
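And for comparison, this is roughly what the offline DeepSpeech route looks like, sketched against the 0.9.x Python API. The model and scorer file names are the ones shipped with the 0.9.3 release and are assumptions on my part; the audio needs to be 16 kHz, 16-bit, mono.

    # Sketch: offline recognition with Mozilla DeepSpeech 0.9.x.
    # Assumes "pip install deepspeech numpy" and the pre-trained model files
    # downloaded from the project's release page.
    import wave
    import numpy as np
    from deepspeech import Model

    model = Model("deepspeech-0.9.3-models.pbmm")
    model.enableExternalScorer("deepspeech-0.9.3-models.scorer")  # KenLM scorer

    with wave.open("letter_dictation.wav", "rb") as w:  # 16 kHz, 16-bit, mono
        frames = w.readframes(w.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)

    print(model.stt(audio))

Training on your own voice is a separate, longer process in any of these tools, but getting basic recognition running first is a good way to see whether the accuracy is even in the right ballpark for dictation.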