Hey guys, I am currently trying to build my own NameFinderME model. The corpus (from LCC) has around 20K sentences, and every sentence contains at least one single-token forename. I have a name dictionary containing the names that occur in the corpus, and I split it into an 80% set and a 20% set. I then tagged the sentences with a dictionary tagger using only the 80% set and trained the NameFinderME model on the tagged sentences.
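The split-and-tag step described above can be sketched roughly as follows. This is only a minimal illustration under assumptions, not the exact code that was used: it assumes the sentences are already whitespace-tokenized, and it writes the `<START:person> ... <END>` markup that the OpenNLP name finder trainer reads (one sentence per line). The class `DictionaryTaggerSketch`, its methods, the entity type `person`, and the sample names are all hypothetical and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for illustration only; not part of OpenNLP.
class DictionaryTaggerSketch {

    // Shuffle the names with a fixed seed and return the first `fraction`
    // of them (e.g. 0.8) as the set used for tagging the training data.
    static List<String> trainingPart(List<String> names, double fraction, long seed) {
        List<String> shuffled = new ArrayList<>(names);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * fraction);
        return new ArrayList<>(shuffled.subList(0, cut));
    }

    // Wrap every whitespace-separated token found in the dictionary in
    // <START:person> ... <END>; all other tokens are copied unchanged.
    static String tag(String tokenizedSentence, Set<String> dictionary) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenizedSentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (dictionary.contains(token)) {
                out.append("<START:person> ").append(token).append(" <END>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data; the real dictionary has ~2400 names.
        List<String> names = Arrays.asList("Julian", "Anna", "Peter", "Maria", "Tom");
        Set<String> train = new HashSet<>(trainingPart(names, 0.8, 42L));
        // Names that ended up in the held-out 20% stay untagged here.
        System.out.println(tag("Yesterday Anna met Peter in Berlin .", train));
    }
}
```

Note that with this setup, names from the 20% set still occur in the training sentences but are left untagged, so the trainer sees them as non-name tokens.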
Everything works fine, but the results are not very good. When I run the resulting model on the same (untagged) corpus, it finds ~99% of the forenames that were used for training, but unfortunately it does not find many new entities or entities from the 20% set.

Some information:

Sentences in corpus: ~20K
Different names occurring in corpus: ~2400
Names in 80% set: ~1920 (trained with Cutoff = 5; ~980 names remaining for training)
Names in 20% set: ~480
Names found overall with my own model: ~1100

99% of the 980 names used for training are found, 20% of the "cut off" names are found, and <1% of the names from the test set are found.

Perhaps you can give me some hints on how to increase the percentage of newly found entities, or tell me whether I got something wrong.

Cheers,
Julian
