There is also an n-gram feature generator that can be used with the Name Finder. You should give it a try to establish a baseline on your data; from there you can tune it and test different feature generation strategies.
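To make the idea of n-gram features concrete, here is a minimal sketch in plain Python. It is not OpenNLP's feature generator and does not reproduce its feature format; the function name `token_ngrams` and the `min_n`/`max_n` parameters are hypothetical and exist only for illustration. It lists every contiguous token n-gram of the example sentence from the thread.

```python
from typing import List


def token_ngrams(tokens: List[str], min_n: int = 1, max_n: int = 3) -> List[str]:
    """Return all contiguous token n-grams with min_n <= n <= max_n.

    Illustrative helper only; not part of OpenNLP.
    """
    if min_n < 1 or max_n < min_n:
        raise ValueError("need 1 <= min_n <= max_n")
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# Whitespace tokenization is an assumption made to keep the sketch short.
tokens = "The shape of water and Frances McDormand rule oscar 2018".split()
features = token_ngrams(tokens, min_n=1, max_n=3)
```

In this sketch a single word such as "oscar" only shows up as a feature when `min_n` is 1. Whether something similar explains the behaviour reported in the thread is an assumption that would need to be checked against the actual n-gram settings in use.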
Jörn

On Mon, Mar 5, 2018 at 11:57 AM, Manoj B. Narayanan <[email protected]> wrote:

> Hi Manjunath,
>
> The best way is to go with NER.
>
> I don't get what you mean by N-gram feature analysis. Would be helpful if
> you could elaborate.
>
> From your example I see all are exact matches. So I suggest you go with a
> Dictionary Name Finder.
>
> Thanks,
> Manoj.
>
> On Mon, Mar 5, 2018 at 4:16 PM, manjunath nakshathri <[email protected]>
> wrote:
>
>> Hello There,
>>
>> We are using opennlp for document categorization with Ngram Features to
>> categorize our incoming text. For example :
>>
>> "The shape of water and Frances McDormand rule oscar 2018"
>>
>> Given this sentence we would like to arrive at :
>>
>> Shape of Water : Movie
>> Frances McDormand : Actress
>>
>> This we are able to achieve with the following document categorization
>> training data and with the ngram features;
>>
>> Movie Shape of Water
>> Actress Frances McDormand
>>
>> *What is not working:*
>> If we try to categorize a single word say Oscar as an award category, we
>> are not able to. Any idea how we can get this working?
>>
>> *Target training data*
>> Movie Shape of Water
>> Actress Frances McDormand
>> Award Oscar
>>
>> *Desired Output :*
>> Shape of Water : Movie
>> Frances McDormand : Actress
>> Oscar: Award
>>
>> Implementation details :
>> Open NLP version : 1.8.4
>> Training Algorithm used : Naive Bayes
>> Iterations set : 100
>>
>> *General Questions*
>> Q : Why can't we use NER ?
>> A : We need ngram feature analysis which is not possible in NER.
>>
>> Q : Are we going to build our own training data ?
>> A : Yes
>>
>> Really appreciate any help towards solving this issue.
>>
>> --
>> Thanks and Regards
>> Manjunath
>>
>
>
>
> --
> Regards,
> Manoj.
