CMUdict, the CMU Pronouncing Dictionary, is a free lexical resource:
https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
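
A rough Java sketch of how the dictionary could be used, since the original question asked for Java. It relies on the CMUdict convention that vowel phonemes carry a stress digit (0, 1, or 2), so counting those phonemes gives a syllable count per word. Note that this yields counts, not syllable boundaries. The class and method names are made up for illustration, and the handling of alternate entries such as "READ(1)" assumes the plain-text layout of the dictionary file; check it against the version you download.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: syllable counts from CMUdict-style lines. */
public class CmuSyllables {

    /** Counts syllables in one pronunciation, e.g. "S IH1 L AH0 B AH0 L". */
    static int countSyllables(String phonemes) {
        int count = 0;
        for (String p : phonemes.trim().split("\\s+")) {
            // Vowel phonemes end in a stress digit (0, 1, or 2);
            // consonants have no digit.
            if (!p.isEmpty() && Character.isDigit(p.charAt(p.length() - 1))) {
                count++;
            }
        }
        return count;
    }

    /** Maps each lower-cased word to the count of its first listed pronunciation. */
    static Map<String, Integer> load(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Skip blank lines and ";;;" comment lines.
            if (line.isEmpty() || line.startsWith(";;;")) {
                continue;
            }
            // Split into the headword and the phoneme string.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            // Assumed layout: alternates are marked like "READ(1)".
            String word = parts[0].replaceAll("\\(\\d+\\)$", "").toLowerCase();
            counts.putIfAbsent(word, countSyllables(parts[1]));
        }
        return counts;
    }
}
```

To use it, read the dictionary file into a list of lines (for example with java.nio.file.Files.readAllLines), pass the list to load(), and look up each lower-cased token. Words missing from the dictionary would still need a fallback.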

On Thu, Jul 12, 2012 at 3:30 PM, John Stewart <[email protected]> wrote:
> Syllable segmentation in English is notoriously difficult.  I'd use a
> lexical resource rather than trying to do it algorithmically.
>
> jds
>
> On Thu, Jul 12, 2012 at 5:39 PM, Lance Norskog <[email protected]> wrote:
>> Phonetic encoding might help. This essentially creates a canonical
>> stream of consonants from a word. Check out the Double Metaphone
>> implementation in Lucene. Once you have your word encoded in
>> consonants, you can try making bigrams of the consonants.
>>
>>
>>
>> On Thu, Jul 12, 2012 at 11:49 AM, Adam Goodkind <[email protected]> wrote:
>>> It would be for typed English.
>>>
>>> On Tue, Jul 10, 2012 at 11:25 PM, Lance Norskog <[email protected]> wrote:
>>>
>>>> Is this in the general case or for specific speech? For example, it
>>>> should be possible to create an HMM that breaks medical jargon, based
>>>> on work in splitting Simplified Chinese language text. The average
>>>> Simplified Chinese "word" is 1.5 ideograms, and you need a
>>>> well-trained HMM (or similar) to split Simplified Chinese well. The
>>>> language is very context-specific with both prefixes and suffixes that
>>>> alter the meaning of "interior" words.
>>>>
>>>> On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
>>>> > That's right; better to use a lexical database.  CELEX2, available fairly
>>>> > inexpensively from the Linguistic Data Consortium, has syllable
>>>> > boundaries in its phonological representations.
>>>> >
>>>> > http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>>>> >
>>>> > jds
>>>> >
>>>> > On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>>>> >> Adam,
>>>> >>
>>>> >> Sorry, OpenNLP doesn't detect syllables.  What you probably need is a
>>>> >> dictionary that includes syllabified pronunciations.
>>>> >> OpenNLP could perhaps be trained to do it, but the model would be very
>>>> >> language-specific and not very useful.  The dictionary approach would be
>>>> >> best, though OpenNLP could help split the text into words/tokens for you
>>>> >> to look up in the dictionary.
>>>> >>
>>>> >> James
>>>> >>
>>>> >> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>>>> >>> Hi all,
>>>> >>>
>>>> >>> Does OpenNLP have the ability to detect syllables? If not, could you point
>>>> >>> me to a Java toolkit that can do this?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Adam
>>>> >>>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> [email protected]
>>>>
>>>
>>>
>>>
>>> --
>>> *Adam Goodkind *
>>> *w*  adamgoodkind.com <http://www.adamgoodkind.com>
>>> *t*   @adamgreatkind <https://twitter.com/#%21/adamgreatkind>
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]



-- 
Lance Norskog
[email protected]