Hey Rahil, I can see what's wrong with your input data. You need spaces around the tags.

Right now, you have:

some text <START:Entity>blah<END>some text

Instead, you need:

some text <START:Entity> blah <END> some text

On Thu, Nov 28, 2013 at 6:47 AM, Jörn Kottmann <[email protected]> wrote:

> On 11/28/2013 02:59 PM, Rahil Bohra wrote:
>
>> Hey Everyone.
>>
>> I am trying to train the opennlp name finder, here is the structure of my
>> training data:
>>
>> Upon hearing of <START:author>Italo Calvino<END>’s death in September of
>> 1985, <START:author>John Updike<END> commented,
>> “<START:author>Calvino<END>
>> was a genial as well as brilliant writer.
>>
>> What is the nature of your dreams? Are you more interested in Jung than you
>> are in Freud?
>>
>> Once after reading <START:author>Freud<END>’s <START:title>The
>> Interpretation of Dreams<END> I went to bed.
>>
>> I dreamt.
>>
>> Unfortunately, when I run the trainer with "opennlp TokenNameFinderTrainer
>> -lang en -encoding UTF-8 -data en-author-person.train -model
>> en-author-person.bin", the output is as follows;
>>
>> Indexing events using cutoff of 5
>>
>> Computing event counts... done. 27904 events
>> Indexing... done.
>> Sorting and merging events... done. Reduced 27904 events to 26448.
>> Done indexing.
>> Incorporating indexed data for training...
>> done.
>> Number of Event Tokens: 26448
>> Number of Outcomes: 1
>> Number of Predicates: 7748
>> ...done.
>> Computing model parameters ...
>> Performing 100 iterations.
>> 1: ... loglikelihood=0.0 1.0
>> 2: ... loglikelihood=0.0 1.0
>> Exception in thread "main" java.lang.IllegalArgumentException: Model not
>> compatible with name finder!
>>
>> What am I doing wrong? I read that I need spaces between the token and the
>> tag, but when these were added, the output is the same.
>>
>
> OpenNLP doesn't fail nicely if there are fundamental issues with the
> training data.
> What is wrong in your case?
>
> This outputline
>
> "Number of Outcomes: 1"
>
> usually indicates that you don't have a single name annotation in your
> training data. The trained classification
> model has only one class. The name finder model has a check which fails,
> because that is not a valid model.
>
> We should open a jira and fix this so, the name finder trainer fails
> nicely with an exception which indicates
> the actual problem.
>
> Jörn
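
P.S. In case it helps, here is a rough Python sketch of the spacing fix described at the top of this message. It is not part of OpenNLP: the names `space_tags` and `count_standalone_tags` are made up for illustration, and the second function is only my approximation of "a tag counts when it is its own whitespace-separated token", not OpenNLP's actual parser.

```python
import re

# Matches name tags such as <START:author>, <START> or <END>,
# together with any whitespace already around them.
TAG = re.compile(r"\s*(<START(?::[^>\s]+)?>|<END>)\s*")


def space_tags(line: str) -> str:
    """Return the line with exactly one space on each side of every tag."""
    spaced = TAG.sub(r" \1 ", line)
    # Adjacent tags produce double spaces; collapse them and trim the ends.
    return re.sub(r" {2,}", " ", spaced).strip()


def count_standalone_tags(line: str) -> int:
    """Count <START...> tags that stand alone as whitespace-separated tokens."""
    return sum(
        1
        for token in line.split()
        if token.startswith("<START") and token.endswith(">")
    )


if __name__ == "__main__":
    before = "some text <START:Entity>blah<END>some text"
    after = space_tags(before)
    print(after)  # some text <START:Entity> blah <END> some text
    print(count_standalone_tags(before), count_standalone_tags(after))  # 0 1
```

Running each line of the training file through `space_tags` before training, and checking that `count_standalone_tags` is above zero for the annotated lines, would be one way to confirm the data has the shape shown above. Blank lines come back as empty strings, so they are left as they are.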
