-------- Original Message --------
Subject:        Re: Some questions about Dictionary and DictionaryNameFinder
Date:   Thu, 23 Feb 2012 23:00:29 -0500
From:   James Kosin <[email protected]>
To:     [email protected]



On 2/23/2012 1:56 PM, Jim - FooBar(); wrote:
> On 23/02/12 15:50, Jörn Kottmann wrote:
>> On 02/23/2012 04:41 PM, Jim wrote:
>>> If you could help me sort out the dictionary problem it would be
>>> amazing!!!
>>
>> +1 to work on better gazetteer support! There are many people
>> who need a feature like this.
>>
>> The stuff we have right now in OpenNLP was created to have a simple
>> lookup dictionary for feature generation. I am sure there is lot of room
>> to improve it.
>>
>> Jörn
>
> I'm not sure i understood...Can the Dictionary find multi-word
> entities? I would say it can't simply because it expects tokenized
> text and then it tries to find an exact match. So an entry like "Folic
> acid" is being split into 2 tokens just before the Dictionary sees
> it...so it will try to first  match "folic" and  "acid" separately!
>
> Any ideas? Apparently someone else had the same problem and solved
> it!!! I'd donate my left hand to know how!!!!
>
> Thanks,
> Jim
Jim,

Maybe the problem is how you created the dictionary.  The
DictionaryNameFinder's find() method is greedy: it matches
as many tokens as possible.
If it isn't matching more than one token, then each dictionary entry
probably contains only a single token.
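
To illustrate what "greedy" means here, below is a minimal, self-contained
sketch in plain Java. It is not OpenNLP's code: the class name
GreedyDictionaryMatch and its find() helper are hypothetical, and the
lower-casing of tokens is an assumption made for the example. It only shows
the general idea that an entry stored as a sequence of tokens ("folic",
"acid") can match a multi-word name in already tokenized text, with the
longest match tried first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class GreedyDictionaryMatch {

    // Hypothetical helper: returns token spans as {start, end}, end exclusive.
    // Each dictionary entry is a list of lower-cased tokens, not a single string.
    public static List<int[]> find(Set<List<String>> entries, String[] tokens) {
        // Length of the longest entry bounds how far ahead we need to look.
        int maxLen = 0;
        for (List<String> entry : entries) {
            maxLen = Math.max(maxLen, entry.size());
        }

        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            int limit = Math.min(tokens.length, start + maxLen);
            // Greedy: try the longest candidate first, then shrink.
            for (int end = limit; end > start; end--) {
                List<String> candidate = new ArrayList<>();
                for (int i = start; i < end; i++) {
                    candidate.add(tokens[i].toLowerCase(Locale.ROOT));
                }
                if (entries.contains(candidate)) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd > start) {
                spans.add(new int[] {start, matchedEnd});
                start = matchedEnd; // continue after the matched name
            } else {
                start++; // no entry starts at this token
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> entries = new HashSet<>();
        entries.add(Arrays.asList("folic", "acid")); // one entry, two tokens
        entries.add(Arrays.asList("aspirin"));       // one entry, one token

        String[] tokens = {"Folic", "acid", "and", "aspirin"};
        for (int[] span : find(entries, tokens)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

With the two entries above, "Folic acid" is reported as a single span
covering both tokens. If the dictionary held "folic" and "acid" as two
separate one-token entries instead, the same input would yield two
one-token spans, which matches the symptom described in the question.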

Look at DictionaryNameFinderTest.java in the opennlp.tools.namefind test
package of the source distribution.

It has a good, simple example.

James
