Re: Dictionary name finder question

Jim - FooBar(); Sat, 17 Mar 2012 05:06:46 -0700

Hey James,

On 17/03/12 02:49, James Kosin wrote:

If you two could test the latest out, I'd appreciate it.
Especially any performance issues, if possible.  I'm trying to be sure I
haven't turned this into an N^2 type problem again.  If so, I'll need to
re-open the JIRA and fix the Index class to handle case sensitivity.

Well, the dictionary name finder is indeed a bit slower now, but that is fine with me... I tested it with a relatively big test corpus and the dictionary lookup took 2-3 seconds more than the maxent model. Now, even though that is counter-intuitive (you expect the iterative search to be extremely fast), I simply don't care at this point - it is not a problem for me! The fact that it finds multi-word tokens is the most important fix, with case-sensitivity coming second for me (I can always uncapitalize my dictionary)...

(b)  I tried to fix the DictionaryNameFinder.... woops, I refactored
incorrectly.  Unfortunately, you two are but a few that use the
DictionaryNameFinder.

Maybe we are just a few because the DictionaryNameFinder never quite worked as advertised... I do expect more people to start using it, especially if it can be integrated with the maxent model (from the evaluator's point of view)... It is the easiest way to improve one's results without cheating!

Thanks for your patience, testing and posting to the list.


Don't mention it man... Thank YOU for addressing it! :-)

I can see clearly now my mistakes.

That is what this is all about! Good stuff...

The code currently in SVN trunk has sort-of a compromise until we get
the Index working again properly with case sensitivity.  It contains
code that will keep trying longer token entries as long as the current
length is less than the maximum held in the dictionary.  This allows the
DictionaryNameFinder's find() method to work; but, we have a small
performance penalty due to the way the find() method isn't caring what
words it adds to the token strings.

So, by 'performance penalty' you mean runtime performance as opposed to accuracy performance, yes? This is exactly what I confirmed above... It takes roughly twice the time for the dictionary to do its job than it takes the maxent model... Nevertheless, what comes back are the correct, case-insensitive named entities, so all is good (at least for me!)...
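
Just to check that I understand the compromise, here is a rough sketch of the lookup as I read your description. It is NOT the code in SVN trunk and it does not use the OpenNLP classes; the class name DictionaryLookupSketch, its add() and find() methods, and the lower-cased string keys are all my own assumptions for illustration. At every start position it keeps trying longer token windows up to the longest entry in the dictionary, and remembers the longest one that matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the real dictionary lookup; all names here are assumptions.
class DictionaryLookupSketch {

    // Entries stored as lower-cased, space-joined token strings (case-insensitive match).
    private final Set<String> entries = new HashSet<>();

    // Length, in tokens, of the longest entry added so far.
    private int maxTokens = 0;

    void add(String... tokens) {
        entries.add(key(tokens, 0, tokens.length));
        maxTokens = Math.max(maxTokens, tokens.length);
    }

    // Builds the lookup key for tokens[start, end).
    private static String key(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end))
                     .toLowerCase(Locale.ROOT);
    }

    // Returns half-open token spans {start, end} of the longest match at each position.
    List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int bestEnd = -1;
            // Keep trying longer windows while the length is within the dictionary maximum.
            // Every window is built and looked up, even when no entry could start this way;
            // that blind rebuilding is where I assume the runtime penalty comes from.
            for (int len = 1; len <= maxTokens && i + len <= tokens.length; len++) {
                if (entries.contains(key(tokens, i, i + len))) {
                    bestEnd = i + len;
                }
            }
            if (bestEnd > 0) {
                spans.add(new int[] {i, bestEnd});
                i = bestEnd;   // continue after the matched name
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

If that is roughly right, the work grows with (number of tokens) x (longest entry length) rather than N^2, which would fit with the slowdown being noticeable but bearable on my corpus.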

I'm going to look at possible solutions to getting the Index working
again properly with the DictionaryNameFinder... but, it will take some time.

Excellent... is there any way for me to find out whenever you fix it? I mean, will you post anything here or is there a JIRA I can start "watching"?

Also, if I manage to 'hack' the evaluator to take into account both the maxent model and dictionary findings to improve the statistics, is that something you would consider adding to OpenNLP? There is a JIRA for it from last year, which I voted for and commented on... I'm not sure if you've seen it...


Thanks again for the patch (regardless of the compromise)...:-)

Jim
