Phonetic encoding might help. A phonetic encoder reduces a word to a
canonical sequence of consonant codes. Check out the Double Metaphone
implementation in Lucene. Once your word is encoded as consonants, you
can try making bigrams of the consonants.
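
A rough sketch of the bigram step, assuming you already have the encoded
consonant string from a phonetic encoder such as Double Metaphone. The
class name ConsonantBigrams and the sample code "SLBL" are made up for
illustration; "SLBL" is a hand-written placeholder, not real encoder
output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an already-encoded consonant string
// into overlapping two-character windows (bigrams).
class ConsonantBigrams {
    static List<String> bigrams(String encoded) {
        List<String> out = new ArrayList<>();
        // Slide a window of width 2 across the string. Strings shorter
        // than two characters produce an empty list.
        for (int i = 0; i + 1 < encoded.length(); i++) {
            out.add(encoded.substring(i, i + 2));
        }
        return out;
    }
}
```

For example, ConsonantBigrams.bigrams("SLBL") gives [SL, LB, BL]. Note
that this only produces consonant pairs; it does not by itself mark
syllable boundaries.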



On Thu, Jul 12, 2012 at 11:49 AM, Adam Goodkind <[email protected]> wrote:
> It would be for typed English.
>
> On Tue, Jul 10, 2012 at 11:25 PM, Lance Norskog <[email protected]> wrote:
>
>> Is this in the general case or for specific speech? For example, it
>> should be possible to create an HMM that breaks medical jargon, based
>> on work in splitting Simplified Chinese language text. The average
>> Simplified Chinese "word" is 1.5 ideograms, and you need a
>> well-trained HMM (or similar) to split Simplified Chinese well. The
>> language is very context-specific with both prefixes and suffixes that
>> alter the meaning of "interior" words.
>>
>> On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
>> > That's right, better use a lexical database.  CELEX2, available fairly
>> > inexpensively from the Linguistic Data Consortium, has syllable
>> > boundaries in its phonological representations.
>> >
>> > http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>> >
>> > jds
>> >
>> > On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>> >> Adam,
>> >>
>> >> Sorry, OpenNLP doesn't detect syllables.  What you probably need is more
>> >> of a dictionary with pronunciation syllables.
>> >> It could be trained to do it maybe; but, would be very language specific
>> >> and not very useful.  The dictionary approach would be best.  Though
>> >> OpenNLP could help parse the words/tokens for you to use in the
>> >> dictionary.
>> >>
>> >> James
>> >>
>> >> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>> >>> Hi all,
>> >>>
>> >>> Does OpenNLP have the ability to detect syllables? If not, could you
>> >>> point me to a java toolkit that can do this?
>> >>>
>> >>> Thanks,
>> >>> Adam
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>
>
>
> --
> *Adam Goodkind *
> *w*  adamgoodkind.com <http://www.adamgoodkind.com>
> *t*   @adamgreatkind <https://twitter.com/#%21/adamgreatkind>



-- 
Lance Norskog
[email protected]
