Syllable segmentation in English is notoriously difficult. I'd use a lexical resource rather than trying to do it algorithmically.
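As a rough illustration of the lexical-resource approach, here is a minimal Java sketch of a dictionary lookup. The class name `SyllableLexicon`, the tab-separated `word<TAB>syl-la-bles` line format, and the sample entries are all assumptions made up for this example. They are not the actual CELEX2 layout, so a real loader would have to be adapted to whatever resource you license.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch of a dictionary-based syllable lookup.
// The "word<TAB>syl-la-bles" line format is a hypothetical one chosen for
// illustration; it is NOT the real CELEX2 file format.
class SyllableLexicon {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Build a lexicon from lines such as "syllable\tsyl-la-ble".
    static SyllableLexicon fromLines(List<String> lines) {
        SyllableLexicon lexicon = new SyllableLexicon();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                continue; // skip lines that do not match the assumed format
            }
            // Keys are lowercased so that lookups are case-insensitive.
            lexicon.entries.put(fields[0].toLowerCase(Locale.ROOT),
                    Arrays.asList(fields[1].split("-")));
        }
        return lexicon;
    }

    // Return the syllables for a word, or empty if the word is not listed.
    Optional<List<String>> syllables(String word) {
        return Optional.ofNullable(entries.get(word.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Made-up sample entries, for demonstration only.
        SyllableLexicon lexicon = SyllableLexicon.fromLines(Arrays.asList(
                "syllable\tsyl-la-ble",
                "dictionary\tdic-tion-ar-y"));
        System.out.println(lexicon.syllables("Syllable"));
        System.out.println(lexicon.syllables("opennlp"));
    }
}
```

Words missing from the lexicon come back as an empty `Optional`, so the caller can decide how to handle out-of-vocabulary tokens instead of receiving a guessed segmentation.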
jds

On Thu, Jul 12, 2012 at 5:39 PM, Lance Norskog <[email protected]> wrote:
> Phonetic encoding might help. This essentially creates a canonical
> stream of consonants from a word. Check out the Double Metaphone
> implementation in Lucene. Once you have your word encoded in
> consonants, you can try making bigrams of the consonants.
>
> On Thu, Jul 12, 2012 at 11:49 AM, Adam Goodkind <[email protected]> wrote:
>> It would be for typed English.
>>
>> On Tue, Jul 10, 2012 at 11:25 PM, Lance Norskog <[email protected]> wrote:
>>
>>> Is this in the general case or for specific speech? For example, it
>>> should be possible to create an HMM that breaks medical jargon, based
>>> on work in splitting Simplified Chinese language text. The average
>>> Simplified Chinese "word" is 1.5 ideograms, and you need a
>>> well-trained HMM (or similar) to split Simplified Chinese well. The
>>> language is very context-specific with both prefixes and suffixes that
>>> alter the meaning of "interior" words.
>>>
>>> On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
>>> > That's right, better use a lexical database. CELEX2, available fairly
>>> > inexpensively from the Linguistic Data Consortium, has syllable
>>> > boundaries in its phonological representations.
>>> >
>>> > http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>>> >
>>> > jds
>>> >
>>> > On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>>> >> Adam,
>>> >>
>>> >> Sorry, OpenNLP doesn't detect syllables. What you probably need is more
>>> >> of a dictionary with pronunciation syllables.
>>> >> It could be trained to do it maybe; but, would be very language specific
>>> >> and not very useful. The dictionary approach would be best. Though
>>> >> OpenNLP could help parse the words/tokens for you to use in the dictionary.
>>> >>
>>> >> James
>>> >>
>>> >> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>>> >>> Hi all,
>>> >>>
>>> >>> Does OpenNLP have the ability to detect syllables? If not, could you point
>>> >>> me to a java toolkit that can do this?
>>> >>>
>>> >>> Thanks,
>>> >>> Adam
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>>
>> --
>> *Adam Goodkind *
>> *w* adamgoodkind.com <http://www.adamgoodkind.com>
>> *t* @adamgreatkind <https://twitter.com/#%21/adamgreatkind>
>
> --
> Lance Norskog
> [email protected]
