Hey guys,

I am currently trying to build my own NameFinderME model. The corpus (from
LCC) has around 20K sentences, and every sentence contains at least one
(one-term) forename.
I have a dictionary of the names that occur in the corpus and split it
into an 80% set and a 20% set. I then tagged the sentences with a
dictionary tagger using only the 80% set and trained the NameFinderME
model on the tagged sentences.
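
A minimal sketch of the split and tagging steps described above, for illustration only. The class and method names are hypothetical, the entity type "person" is an assumption, and matching on whitespace-separated tokens is a simplification. The only part taken from OpenNLP is the <START:type> ... <END> markup of its name finder training data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.StringJoiner;

public class DictionaryTaggingSketch {

    /** Shuffles the names with a fixed seed; part 0 is the 80% set, part 1 the 20% set. */
    public static List<List<String>> split(List<String> names, long seed) {
        List<String> shuffled = new ArrayList<>(names);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = shuffled.size() * 4 / 5; // integer arithmetic, rounds down
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Wraps every token that is in the dictionary in <START:person> ... <END>. */
    public static String tag(String tokenizedSentence, Set<String> dictionary) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : tokenizedSentence.split(" ")) {
            if (dictionary.contains(token)) {
                out.add("<START:person> " + token + " <END>");
            } else {
                out.add(token); // names missing from the dictionary stay untagged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up names, only to show the mechanics.
        List<String> names = Arrays.asList("Julian", "Anna", "Peter", "Maria", "Jonas");
        List<List<String>> parts = split(names, 42L);
        Set<String> trainingNames = new HashSet<>(parts.get(0));
        System.out.println("80% set: " + parts.get(0));
        System.out.println("20% set: " + parts.get(1));
        System.out.println(tag("Yesterday Julian met Anna and Peter .", trainingNames));
    }
}
```

Note that with this kind of tagging, names from the 20% set that occur in the corpus are left untagged in the training data, so the model sees them as examples of non-names.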

Everything works fine, but the results are not very good. When I run the
trained model on the same (untagged) corpus, it finds ~99% of the
forenames that I used for training, but unfortunately it finds hardly any
new entities or entities from the 20% set.

Some information:
Sentences in corpus: ~20K
Different names occurring in corpus: ~2400
Names in 80% set: ~1920 (trained with Cutoff = 5; ~980 names remaining for
training)
Names in 20% set: ~480
Names found overall with my own model: ~1100

99% of the 980 names used for training are found, 20% of the "cut off"
names are found, and <1% of the names from the test set (the 20% set) are
found.
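
As a rough cross-check of these figures, the three rates can be combined into an expected total. This is only a sketch: it assumes the "cut off" names are the ~1920 - ~980 = ~940 names from the 80% set that did not survive the cutoff, and it treats "<1%" as an upper bound of 1%. The class and method names are made up for illustration.

```java
public class FoundNamesEstimate {

    /** Expected number of found names, given the size and hit rate of each group. */
    public static long expectedFound(int trained, double trainedRate,
                                     int cutOff, double cutOffRate,
                                     int heldOut, double heldOutRate) {
        return Math.round(trained * trainedRate
                + cutOff * cutOffRate
                + heldOut * heldOutRate);
    }

    public static void main(String[] args) {
        int trained = 980;          // names that survived the cutoff
        int cutOff = 1920 - 980;    // assumed: rest of the 80% set
        int heldOut = 480;          // the 20% set
        long total = expectedFound(trained, 0.99, cutOff, 0.20, heldOut, 0.01);
        System.out.println("Expected found names (upper bound): " + total);
    }
}
```

Under these assumptions the estimate is about 1160 names, which is in the same range as the ~1100 found overall. Almost all of that total comes from names the model already saw during training.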

Perhaps you can give me some advice on how to increase the percentage of
newly found entities, or tell me whether I got something wrong.

Cheers,

Julian
