Howdy,

As some folks may have noticed, we're working on a voice input feature in HUD. Part of what that requires is acoustic models that can understand the incoming speech. Ubuntu currently has a couple of these, but we need to get to the point of providing models for various languages and having a way to update them continuously as the data gets better.
So that leads to the question: how do we want these to look in Ubuntu? The best open source of training data appears to be Voxforge, a collection of speech samples recorded from known text. These samples can then be used to compile the acoustic models that the various speech recognition libraries need, and that takes a significant amount of CPU time. Voxforge's most complete language is English, which has about 100 hours of audio and takes about 10 CPU hours to compile into the models that Sphinx needs. While English is the most complete, it's important to realize that the best/worst case scenario, where all languages are well supported, could easily require over a thousand hours of CPU time.

If we think of this in terms of the classic source vs. binary split, the Voxforge data looks like the source, and we should make a source package that then builds these binary models. But at some level we're just exchanging binary data (sound files) for different binary files (acoustic models). Would it make more sense to package something like the Voxforge nightly builds for use in Ubuntu?

I'd love to hear people's thoughts on this. I'm leaning towards packaging the Voxforge data as a source package, since it is our source, but I'm worried about the impact it may have on rebuilding the archive.

Thanks,
Ted
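A rough back-of-the-envelope sketch of how the build cost scales, using only the English figures above (about 100 hours of audio, about 10 CPU hours to build). It assumes, without confirmation, that build time grows linearly with the amount of training audio; the 100-language scenario and the helper names are hypothetical and only illustrate how a total in the thousand-CPU-hour range could arise.

```python
# Back-of-the-envelope estimate of acoustic model build cost.
# ASSUMPTION: CPU build time scales linearly with hours of training audio.
ENGLISH_AUDIO_HOURS = 100.0     # approximate size of the English Voxforge corpus
ENGLISH_BUILD_CPU_HOURS = 10.0  # approximate CPU time to build the Sphinx models


def build_cpu_hours(audio_hours: float) -> float:
    """Estimate CPU hours to build models from `audio_hours` of audio (hypothetical helper)."""
    return audio_hours * ENGLISH_BUILD_CPU_HOURS / ENGLISH_AUDIO_HOURS


def archive_cpu_hours(languages: int, audio_hours_per_language: float) -> float:
    """Estimate total CPU hours if every language had the same amount of audio (hypothetical)."""
    return languages * build_cpu_hours(audio_hours_per_language)


if __name__ == "__main__":
    print(build_cpu_hours(100))         # English today
    print(archive_cpu_hours(100, 100))  # hypothetical: 100 languages at English's size
```

Under the linear assumption, every additional 10 hours of audio costs roughly one more CPU hour on each archive rebuild, which is why the total grows quickly as more languages reach English's level.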
-- ubuntu-devel mailing list ubuntu-devel@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel