Hello,

I am not entirely sure, but I think the English NER models were trained
on MUC-7 data. Note that supervised learning approaches to NLP in
general suffer from the "domain adaptation problem". Basically, that
means you are deploying a model learned from one specific type of data
on another type of data that is quite different, and performance
degrades as a result. If the training text was properly capitalized,
a lowercase "china" will look unfamiliar to the model.

To improve your results, the best option is to train your own model
(you need annotated data for that). If you do not have annotated data
from your own domain, you can use a newer dataset such as OntoNotes and
train your model with that data.
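
For reference, as far as I recall the name finder trainer reads one
whitespace-tokenized sentence per line, with each entity wrapped as
<START:location> ... <END>. Below is a minimal Java sketch of producing
such a line from token offsets. The class and method names are mine and
purely illustrative, not part of OpenNLP, so please check the manual for
the exact format your version expects.

```java
// Hypothetical helper (not an OpenNLP class): renders one sentence in the
// <START:type> ... <END> markup used for name finder training data.
final class NameSampleFormatter {

    // Marks the tokens in [start, end) as one entity of the given type.
    static String format(String[] tokens, int start, int end, String type) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (i == start) {
                sb.append("<START:").append(type).append("> ");
            }
            sb.append(tokens[i]);
            if (i == end - 1) {
                sb.append(" <END>");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"A", "network", "of", "china", "."};
        System.out.println(format(tokens, 3, 4, "location"));
    }
}
```

Each annotated sentence from your own documents would become one such
line in the training file.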

Optionally, if some types of location occur fairly regularly in your
data, you can also try the DictionaryNameFinder with lists of locations,
and the RegexNameFinder to create regular-expression rules for finding
locations.
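
To make the dictionary and regex idea concrete, here is a rough
plain-Java sketch of the two techniques. It deliberately does not use
the OpenNLP classes themselves; the gazetteer entries, the rule and all
names are made up for illustration. The point is only that a
case-insensitive list lookup or rule can catch "china" where the
statistical model does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics dictionary and regex lookup with the JDK.
final class LocationLookupSketch {

    // Hypothetical gazetteer; entries are stored lower-cased.
    static final Set<String> GAZETTEER =
            new HashSet<>(Arrays.asList("china", "new york", "sri lanka"));

    // Longest gazetteer entry, measured in tokens.
    static final int MAX_ENTRY_TOKENS = 2;

    // Hypothetical rule: "city of X" or "province of X", in any casing.
    static final Pattern RULE = Pattern.compile(
            "\\b(?:city|province) of (\\p{L}+)", Pattern.CASE_INSENSITIVE);

    // Returns the surface form of every gazetteer match, longest match first.
    static List<String> findByDictionary(String[] tokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            int longest = Math.min(MAX_ENTRY_TOKENS, tokens.length - i);
            for (int len = longest; len >= 1; len--) {
                String candidate =
                        String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                // Compare in lower case so "China" and "china" both match.
                if (GAZETTEER.contains(candidate.toLowerCase(Locale.ROOT))) {
                    found.add(candidate);
                    matched = len;
                    break;
                }
            }
            i += Math.max(1, matched);
        }
        return found;
    }

    // Returns the name captured by every match of the rule.
    static List<String> findByRegex(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = RULE.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "A geographically distributed network of china";
        System.out.println(findByDictionary(text.split(" ")));
    }
}
```

You would normally combine the spans found this way with the output of
the statistical name finder rather than replace it.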

HTH,

Rodrigo

On Sun, Nov 1, 2015 at 6:15 AM, Madhav Sharan <[email protected]> wrote:
> Hello opennlp users,
>
> I am facing an issue while extracting locations from file contents. Using
> en-ner-location.bin I am able to extract a location if it is written in
> camel case, but not otherwise.
>
> *For example :*
>   - I can extract "China" out of - "A geographically distributed network of
> *China*"
>   - But not from - "A geographically distributed network of *china*"
>
> I already tried converting the whole text to camel case, but it makes
> matters worse. So instead of trying more solutions based on my intuition,
> it would be best for me to get help on the two questions below:
>
> Can someone suggest an enhancement?
> Can someone help me understand how the English location name finder model
> is trained?
> Location name finder model: en-ner-location.bin
> <http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin>
> *What are we trying to do?*
> We are building an open-source tool to extract locations from any file and
> then visualize them on a map. These files will mostly come from web content
> but can be anything a user wishes.
>
> --
> Thanks
> Madhav Sharan
