Hello! I've been experimenting with OpenNLP NER a bit, but my results are not satisfying. I'm trying to train a Polish model for finding person entities.
Apart from famous people's names, the model also finds names of some organizations, geographical names and other entities.

What I have done:
- crawled the texts of 10k articles from the most popular Polish daily,
- prepared a list of 18k famous people,
- stemmed the article texts (Morfologik, a Polish stemmer),
- tagged sentences containing famous people's names with <START:person> … <END>,
- put the tagged sentences into a file (7k lines).

I fed the prepared corpus to the OpenNLP training tool and produced a model (*.bin).

Could you suggest what I have missed, or what I could do better in my input text file to improve entity recognition?

Thanks,
Tomek
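For illustration, the tagging step described above might look roughly like this: a minimal Java sketch, not the code actually used. The class `PersonTagger` and its `tag` method are hypothetical. The sketch assumes each sentence is already tokenized with single spaces between tokens (OpenNLP's name finder training format expects whitespace-separated tokens, with `<START:person>` and `<END>` as separate tokens) and that names match exactly. Because matching is exact, inflected Polish forms (e.g. a name in the genitive) will only be tagged if that exact form is in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class PersonTagger {

    // Hypothetical helper: wraps every occurrence of a known name in
    // "<START:person> ... <END>" markup. Assumes the sentence is already
    // tokenized, with whitespace between tokens, and matches names exactly.
    public static String tag(String sentence, Collection<String> names) {
        // Pre-split each name into tokens so multi-word names can be matched.
        List<String[]> nameTokens = new ArrayList<>();
        for (String name : names) {
            nameTokens.add(name.trim().split("\\s+"));
        }
        String[] tokens = sentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            // Pick the longest known name that starts at position i.
            int best = 0;
            for (String[] nt : nameTokens) {
                if (nt.length > best && matchesAt(tokens, i, nt)) {
                    best = nt.length;
                }
            }
            if (best > 0) {
                out.add("<START:person>");
                out.addAll(Arrays.asList(tokens).subList(i, i + best));
                out.add("<END>");
                i += best;
            } else {
                out.add(tokens[i]);
                i++;
            }
        }
        return String.join(" ", out);
    }

    // True if the token sequence `name` occurs in `tokens` starting at `start`.
    private static boolean matchesAt(String[] tokens, int start, String[] name) {
        if (start + name.length > tokens.length) {
            return false;
        }
        for (int k = 0; k < name.length; k++) {
            if (!tokens[start + k].equals(name[k])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Lech Wałęsa", "Wałęsa");
        System.out.println(tag("Wczoraj Lech Wałęsa spotkał się z dziennikarzami .", names));
    }
}
```

Preferring the longest match keeps "Lech Wałęsa" as one entity instead of tagging only "Wałęsa".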
