satya komaragiri wrote:
> I can showcase one of its potential usages by integrating speech
> capabilities to the 'Listen and Spell' activity where the child can
> spell out the word verbally. I want to let the children speak out the
> spelling rather than type it out. As the alphabet of any language is
> limited (26 in the case of English, extension to any language would
> just mean getting a few people to read out the alphabet of that
> language).

I think this is a very good idea. General speech recognition is
error-prone and computationally intensive, but recognizing letter-names
is a much easier problem. It also fits very well with our emphasis on
young children who may still be learning to spell.

I must admit that I cannot say exactly what this is "useful" for. In my
experience, there is at most a very narrow age window in which children
can spell, but not type. It would certainly be a very nifty demo, and
might help us to "engage" users. Your proposal will have a better chance
of being accepted if you can give a compelling use case example.

> Having a generic library will make system-wide integration easier by
> abstracting the interactions with the speech engine via DBUS etc. All
> the activities can use the speech capability as they see fit (spoken
> commands to control the activity is the most straightforward
> application that strikes me).

My advice is to focus on letters, not commands. Letters are universal,
and can be applied in any application that involves typing. Commands
would have to be different for every activity, requiring endless new
training data. (You could, however, include words like "Control" and
"Shift", which would allow users to access commands by speaking the
shortcuts. Commands to Sugar itself, like "Frame" or "Neighborhood view"
would also be appropriate.)

I suggest you work in two stages:

1. Take an existing activity (perhaps Listen and Spell) and add the
   ability to do voice recognition of letters. This does not require
   any DBUS magic. In this way, you can show that your speech
   recognition is actually working (preferably on an XO, which can be
   provided to you).

2. Convert this into a system service that listens to the microphone
   and synthesizes keystrokes via XInput. (The effect, then, is just as
   if the letter had been typed on the keyboard.) Add a device to the
   Sugar frame to activate and deactivate this service. (This frame
   device might also have to mute the speakers, to avoid interference.)
   Note that, apart from the switch in the Sugar frame, this system
   would be applicable to any Linux (or even Linux-like) desktop.

Extra credit:

3. Provide an interface for users to record a new set of voices for
   their own alphabet and language.

--Ben