Siva, I'm assuming there is nothing wrong with your code. OpenNLP's named-entity recognizer identifies entities with a MaxEnt (maximum entropy) statistical model rather than with hand-written rules. So the answer to "Why did OpenNLP return X as an organization?" is always going to be "Because it was trained to do so". If the training set (that is, the set of sentences used to train the recognition model you are using) does not resemble the sentences you are using that model to process, you are going to get sub-optimal results.
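To make "training set" a little more concrete: as far as I know, the name finder is trained from plain text with one whitespace-tokenized sentence per line, where each entity is wrapped in <START:type> ... <END> markers. Below is a rough sketch, using only the JDK, of a helper that renders a hand-annotated sentence into that format. The class and method names are my own invention and not part of OpenNLP, and it handles just one entity per sentence to keep things short.

```java
import java.util.StringJoiner;

// Hypothetical helper (not an OpenNLP class): renders one tokenized,
// hand-annotated sentence as a line of name finder training data.
class NerTrainingLine {

    /**
     * Wraps tokens[start, end) in <START:type> ... <END> markers and joins
     * all tokens with single spaces. The 'end' index is exclusive.
     */
    static String format(String[] tokens, int start, int end, String type) {
        if (start < 0 || end > tokens.length || start >= end) {
            throw new IllegalArgumentException("bad span: " + start + ".." + end);
        }
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (i == start) {
                line.add("<START:" + type + ">");
            }
            line.add(tokens[i]);
            if (i == end - 1) {
                line.add("<END>");
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Token 3 ("Intel") is annotated as an organization in this tweet.
        String[] tweet = "Find out how Intel Xeon processors help".split(" ");
        System.out.println(format(tweet, 3, 4, "organization"));
    }
}
```

A file of such lines, one per sentence, is the kind of input the OpenNLP trainer (the TokenNameFinderTrainer command-line tool or NameFinderME.train) reads. Check the manual for your OpenNLP version for the exact arguments. A sentence in which "Intel" is not the company would be written with that token left unmarked, and that is precisely the distinction the model has to learn from examples.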
It looks to me as if you are processing tweets. If you're using the default recognizer, I doubt very much that it was trained on tweets, and tweets have very different characteristics from regular prose. Consequently, I suggest that you consider training a model on data that represents what you actually want to process.

In the examples you give, Intel is a company name in one case and a slang term (a contraction of "intelligence") in the other. You may find that it is not possible to train just one model to handle all cases. You might need individual strategies for different industries, depending on what you are trying to achieve.

Good luck.

Regards,
Jeff

On Fri, Sep 20, 2013 at 2:59 AM, Siva Sakthi <[email protected]> wrote:

> Can anyone answer the above question???
>
> Thanks
>
>
> On Fri, Sep 13, 2013 at 4:19 PM, Siva Sakthi <[email protected]> wrote:
>
> > Hi,
> > we are using opennlp for finding organizations (code below)
> >
> > e.g.
> >
> > 1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
> > #IDF13 going on now in San Francisco. #Speed2Lead Protect your data
> > >>
> > Opennlp returns "Intel" in the above sentence
> >
> > 2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
> > http://t.co/V0XLKrp3TI
> > >>
> > Opennlp returns "Intel Division Chief Lashes"
> >
> > Issue 1: I don't understand why it returns a composite string in the
> > second case, instead of just Intel
> > Issue 2: The "Intel" in the second sentence is not really "Intel"
> >
> > My code as follows,
> >
> > public static String findOrg(String message) throws Exception {
> >     String[] words = message.split(" ");
> >     InputStream orgIs = new FileInputStream("en-ner-organization.bin");
> >     TokenNameFinderModel tnf = new TokenNameFinderModel(orgIs);
> >     NameFinderME nf = new NameFinderME(tnf);
> >     Span sp[] = nf.find(words);
> >     String a[] = Span.spansToStrings(sp, words);
> >     StringBuilder sb = new StringBuilder();
> >     int l = a.length;
> >
> >     for (int j = 0; j < l; j++) {
> >         sb = sb.append(a[j] + "\n");
> >     }
> >
> >     return sb.toString();
> > }
> >
> > Thanks,
> > Ss
