Howdy,

As some folks may have noticed, we're working on a voice input feature
in HUD.  Part of what that requires is acoustic models to be available
to understand the speech coming in.  Ubuntu currently ships a couple of
these, but we need to get to the point of providing models for various
languages and having a way to update them continuously as the data
improves.

So that leads to the question: How do we want these to look in Ubuntu?

The best open source of training data appears to be Voxforge, a
collection of speech samples recorded from known text.  These samples
can then be used to compile the acoustic models that the various speech
libraries need, which takes a significant amount of CPU time.  Their
most complete language is English, which has about 100 hours of audio
and takes about 10 CPU hours to compile the models that Sphinx needs.
While English is the most complete, I think it's important to realize
that building models that support all languages well could easily
require over a thousand hours of CPU time.
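To make that scale concrete, here's a back-of-envelope sketch.  The only
real numbers are the ones above (100 audio hours, 10 CPU hours for
English); the assumption that the same ratio holds for other languages
is mine:

```python
# Back-of-envelope estimate of model-compilation cost.
# Known figures: English has ~100 hours of audio and takes
# ~10 CPU hours to compile Sphinx models.
AUDIO_HOURS_ENGLISH = 100
CPU_HOURS_ENGLISH = 10

# Assumption: the CPU-per-audio-hour ratio carries over to other languages.
cpu_per_audio_hour = CPU_HOURS_ENGLISH / AUDIO_HOURS_ENGLISH  # 0.1

def total_cpu_hours(num_languages, audio_hours_each=AUDIO_HOURS_ENGLISH):
    """CPU hours to compile models for num_languages at the given coverage."""
    return num_languages * audio_hours_each * cpu_per_audio_hour

# Around a hundred languages with English-level coverage already
# crosses the thousand-CPU-hour mark.
print(total_cpu_hours(100))  # 1000.0
```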

So if we think of things in the classic source vs. binary split, it
seems like the Voxforge data is the source and we should make a source
package that builds these binary models.  But at some level we're just
exchanging one kind of binary data (sound files) for another (acoustic
models).  Would it make more sense to package something like the
Voxforge nightly builds for use in Ubuntu?
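For the source-package route, the build step would roughly be "train the
model during the package build".  A minimal sketch of what a
debian/rules could look like (the package layout, file names, and
sphinxtrain invocation here are all illustrative, not an existing
package):

```makefile
#!/usr/bin/make -f
# Hypothetical voxforge-models-en source package: the "source" is the
# Voxforge audio corpus plus transcripts, and the "binary" package
# ships the compiled acoustic model.
%:
	dh $@

override_dh_auto_build:
	# This is the expensive step: compiling the acoustic model from
	# the bundled audio (roughly 10 CPU hours for English on Sphinx).
	sphinxtrain -t voxforge-en setup
	sphinxtrain run
```

The archive-rebuild concern below is exactly this override: every
no-change rebuild of the source package would re-run the training.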

I'd love to hear people's thoughts on this.  I'm leaning towards
packaging the Voxforge data as a source package, since it is our
source, but I'm worried about the impact that could have on rebuilding
the archive.

Thanks,
Ted


-- 
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
