That is such a poor answer.

On Sep 20, 2013, at 11:11 AM, Jeffrey Mershon wrote:

> Siva,
>
> I'm assuming there is nothing wrong with your code. OpenNLP's named-entity
> recognizer uses MaxEnt (maximum entropy) modeling, as opposed to rule-based
> programming, to identify named entities. So the answer to "Why did OpenNLP
> return X as an organization?" is always going to be "Because it was trained
> to do so." If the training set (that is, the set of sentences used to train
> the recognition model you are using) does not share the characteristics of
> the sentences you are using that model to process, you are going to get
> sub-optimal results.
>
> It looks to me as if you are processing tweets. If you're using the default
> recognizer, I doubt very much that it was trained on tweets, and tweets
> have very different characteristics from regular prose. Consequently, I
> suggest that you consider training a model on data that represents what you
> actually want to process.
>
> In the examples you give, "Intel" is a company name in one case and a slang
> term (a contraction of "intelligence") in the other. You may find that it is
> not possible to train just one model to handle all cases. You might need
> individual strategies for different industries, depending on what you are
> trying to achieve. Good luck.
>
> Regards,
>
> Jeff
>
>
> On Fri, Sep 20, 2013 at 2:59 AM, Siva Sakthi wrote:
>
>> Can anyone answer the above question???
>>
>> Thanks
>>
>>
>> On Fri, Sep 13, 2013 at 4:19 PM, Siva Sakthi wrote:
>>
>>> Hi,
>>> we are using OpenNLP to find organizations (code below).
>>>
>>> e.g.
>>>
>>> 1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
>>> #IDF13 going on now in San Francisco. #Speed2Lead Protect your data
>>>
>>> OpenNLP returns "Intel" in the above sentence.
>>>
>>> 2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
>>> http://t.co/V0XLKrp3TI
>>>
>>> OpenNLP returns "Intel Division Chief Lashes".
>>>
>>> Issue 1: I don't understand why it returns a composite string in the
>>> second case instead of just "Intel".
>>> Issue 2: The "Intel" in the second sentence is not really Intel the company.
>>>
>>> My code is as follows:
>>>
>>> public static String findOrg(String message) throws Exception {
>>>     String[] words = message.split(" ");
>>>     InputStream orgIs = new FileInputStream("en-ner-organization.bin");
>>>     TokenNameFinderModel tnf = new TokenNameFinderModel(orgIs);
>>>     NameFinderME nf = new NameFinderME(tnf);
>>>     Span sp[] = nf.find(words);
>>>     String a[] = Span.spansToStrings(sp, words);
>>>     StringBuilder sb = new StringBuilder();
>>>     int l = a.length;
>>>
>>>     for (int j = 0; j < l; j++) {
>>>         sb = sb.append(a[j] + "\n");
>>>     }
>>>
>>>     return sb.toString();
>>> }
>>>
>>> Thanks,
>>> Ss
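On Issue 1: the name finder does not hand back strings directly. `nf.find(words)` returns `Span` objects that hold token indices, and `Span.spansToStrings` joins the tokens each span covers with single spaces. So if the model marks tokens 1 through 4 of the headline as one organization, "Intel Division Chief Lashes" comes back as a single string. The stdlib-only sketch below imitates that joining step to make this concrete. `SpanDemo`, `TokenSpan`, and its `spansToStrings` are hypothetical stand-ins, not OpenNLP's classes, and the span is chosen by hand to mirror the reported output rather than produced by a model.

```java
import java.util.Arrays;

// Hypothetical, stdlib-only illustration of how token-index spans become strings.
// This is NOT OpenNLP's code; it only mimics the joining step of Span.spansToStrings.
public class SpanDemo {

    // A span over token indices: start is inclusive, end is exclusive.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Join the tokens covered by each span with single spaces.
    static String[] spansToStrings(TokenSpan[] spans, String[] tokens) {
        String[] result = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            result[i] = String.join(" ",
                    Arrays.copyOfRange(tokens, spans[i].start, spans[i].end));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = "NYPD Intel Division Chief Lashes Out At FBI".split(" ");
        // Hand-picked span: pretend the model tagged tokens 1..4 as ONE organization.
        TokenSpan[] spans = { new TokenSpan(1, 5, "organization") };
        for (String s : spansToStrings(spans, tokens)) {
            System.out.println(s); // prints "Intel Division Chief Lashes"
        }
    }
}
```

Where the span ends (here, after "Lashes") is whatever the model predicted, so both issues trace back to the model and its training data rather than to this joining step.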

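Another thing worth checking in the quoted code is `message.split(" ")`: it passes tokens such as `Francisco.` and `#EMC` to the model with punctuation still attached, and it yields empty strings wherever two spaces occur in a row. Below is a minimal stdlib-only sketch that splits on runs of whitespace and peels leading and trailing ASCII punctuation into separate tokens, leaving inner punctuation (as in URLs) untouched. Whether this helps depends on how the model's training text was tokenized, which is an assumption here; in real code OpenNLP's own tokenizers (for example `SimpleTokenizer`) are the natural choice, and `PunctTokenizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical whitespace-plus-punctuation tokenizer (not part of OpenNLP).
public class PunctTokenizer {

    // Leading ASCII punctuation, a (possibly empty) core, trailing ASCII punctuation.
    private static final Pattern PARTS =
            Pattern.compile("(\\p{Punct}*)(.*?)(\\p{Punct}*)");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // only happens when the input is blank
            }
            Matcher m = PARTS.matcher(chunk);
            if (!m.matches()) {
                tokens.add(chunk); // e.g. a chunk containing U+2028, which '.' does not match
                continue;
            }
            // Emit the non-empty parts in order: leading punct, core, trailing punct.
            for (int g = 1; g <= 3; g++) {
                if (!m.group(g).isEmpty()) {
                    tokens.add(m.group(g));
                }
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "#IDF13 going on now in San Francisco. #Speed2Lead";
        System.out.println(String.join(" | ", tokenize(text)));
    }
}
```

The resulting `String[]` can be passed to `nf.find(...)` in place of the `split(" ")` output.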