Chris Hayes wrote:
> Hi - I was wondering whether anyone here might know about what voice 
> recognition software is currently available for Linux.

(warning, I am an unrepentant curmudgeon and negative filter.  Interpret 
the following accordingly.  If I'm wrong on any points, and someone 
wants to correct me, I will gladly learn.)

In a nutshell, not much.  With Sphinx 4 and others of its family, you have
some fairly decent recognition systems.  However, they are not ready for
prime time; if they were, people would be using them for desktop
recognition.  While the recognition engines may work well, a lot of the
ancillary pieces, such as training, microphone switching, and
dictionary management, are not quite there yet.  On the other hand,
the same shortcomings can be laid at the feet of the Linux and Windows audio
subsystems.

From my perspective, the only usable speech recognition for end users is
NaturallySpeaking.  There may be something on the Macintosh, but I don't
have any experience there.  I say NaturallySpeaking is the
only usable one because it's a large-vocabulary continuous speech
recognition system that people use to get work done.  Its recognition engine,
language model, sound system interface, etc. have had many years
to evolve.  Nuance has had a couple of years to screw it up, and they've
done a wonderful job of it.  I think the only positive contribution they
have made during their stewardship of the product is the addition of a
Bluetooth microphone audio model.

The only way to get good speech recognition on Linux is for someone to 
drop a small number of millions of dollars into Nuance's lap and pray.
Not a good solution.

I've been thinking about an alternative model for a couple of years, in
between other projects, and I believe the best solution (best defined
as getting handicapped people working) would be to make use of both Windows
and Linux via virtual machines.  Since virtual machines do horrible
things to sound systems, I would recommend using Windows as the host OS
running speech recognition, plus a mediator to transfer
characters/commands/keystrokes to the Linux environment and a mediator
to return window state information (screen content, running
application, etc.) back to the speech recognition side.
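To make the forward half of that mediator concrete, here is a minimal sketch, not an existing tool.  It assumes a hypothetical wire format of one JSON object per line (the names Event, encode, decode, and to_xdotool_argv are all made up for illustration).  The Windows side would encode dictated text or keystroke commands; the Linux side would decode them and inject them into the X session with xdotool.  The return path for window state is not shown.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event kinds: "text" = dictated characters, "key" = keystroke/command
    kind: str
    value: str


def encode(event: Event) -> bytes:
    """Windows side: serialize one event as a newline-terminated JSON line."""
    return (json.dumps({"kind": event.kind, "value": event.value}) + "\n").encode("utf-8")


def decode(line: bytes) -> Event:
    """Linux side: parse one JSON line back into an Event."""
    obj = json.loads(line.decode("utf-8"))
    return Event(obj["kind"], obj["value"])


def to_xdotool_argv(event: Event) -> list[str]:
    """Linux side: build (but do not run) an xdotool command line for the event."""
    if event.kind == "text":
        # "--" ends option parsing so dictated text starting with "-" is typed literally
        return ["xdotool", "type", "--", event.value]
    if event.kind == "key":
        # Key names use xdotool syntax, e.g. "ctrl+s"
        return ["xdotool", "key", event.value]
    raise ValueError(f"unknown event kind: {event.kind!r}")
```

On the Linux guest, each decoded event could then be run with subprocess.run(to_xdotool_argv(ev), check=True).  How the lines travel between host and guest (a TCP socket, a shared folder, etc.) is left open here.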

There has been a primitive implementation (since taken off the
net) that showed the technique is fundamentally sound.  A full-function
mediator, while difficult, is a couple of orders of magnitude or more
easier to build than porting a large and complicated Windows application
to Linux.

In the short term, run Linux in a virtual machine, display apps via an X11
server on Windows, and use something like natpython and one of its macro packages
to build commands for Linux applications.  NatText will still bite you in the
ass with all the random characters and insertions in applications, but
that's Nuance's contribution.

---eric

-- 
Speech-recognition in use.  It makes mistakes, I correct some.

-- 
Ubuntu-accessibility mailing list
Ubuntu-accessibility@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-accessibility
