Re: Name Finder tool vs API

Jörn Kottmann Tue, 17 Jul 2012 02:18:23 -0700

On 07/17/2012 10:27 AM, Chi Dat Nguyen wrote:

After a while I figured out that the result provided by the pretrained
tokenizer causes this problem.
If "Mr. Vinken" is tokenized into 3 tokens "Mr", ".", "Vinken",
instead of 2 tokens, the Name Finder works perfectly.
It seems that the SimpleTokenizer is better than the pretrained
tokenizer in these cases.


Exactly, the English NER models on the SourceForge page were trained with
the SimpleTokenizer, so you need to use that to get good results.
In particular, important context words like "Mr." are tokenized differently
than by the maxent-based English tokenizer.
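
To illustrate why "Mr." comes out as two tokens, here is a minimal sketch of tokenizing by character class (whitespace, letters, digits, everything else). It is a simplified stand-in written for this example, not the actual SimpleTokenizer source; the class name CharClassTokenizerSketch and its methods are made up for illustration, and the real tokenizer may treat runs of punctuation differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: approximates character-class tokenization,
// it is NOT the OpenNLP SimpleTokenizer implementation.
class CharClassTokenizerSketch {

    // Character classes used to decide where a token ends.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Splits the text wherever the character class changes; whitespace is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none is open
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            if (current != previous) {
                // Class changed: close the open token (if any) and maybe open a new one.
                if (start >= 0) tokens.add(text.substring(start, i));
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) tokens.add(text.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        // The period is a different character class than the letters before it,
        // so "Mr." is split into "Mr" and ".".
        System.out.println(tokenize("Mr. Vinken is chairman."));
    }
}
```

Because the period belongs to a different class than the letters, "Mr. Vinken" yields the three tokens "Mr", ".", "Vinken", which is the tokenization the name finder models expect.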

May I ask how we can use the optional parameters of
opennlp.uima.namefind.NameFinder: opennlp.uima.ProbabilityFeature,
opennlp.uima.BeamSize, opennlp.uima.DocumentConfidenceType?
I'm sorry for asking these kinds of questions. I just started to use
OpenNLP recently and there is nearly no documentation for OpenNLP UIMA
at all.


These are parameters of our UIMA integration. Do you use that?

You need to specify these parameters in the Analysis Engine descriptor
and assign an appropriate value to each. The beam size needs an integer. The
probability feature is the name of the feature to which the probability of a
name (aka confidence) can be assigned. The DocumentConfidenceType is the type
of an FS which is created to contain the confidence for a document.

Jörn
