Siva,

I'm assuming there is nothing wrong with your code. OpenNLP's named-entity
recognizer identifies named entities with a statistical maximum entropy
(MaxEnt) model rather than hand-written rules. So the answer to "Why did
OpenNLP return X as an organization?" is always going to be "Because it was
trained to do so." If the training set (that is, the set of sentences used to
train the recognition model you are using) does not share the
characteristics of the sentences you are processing with that model, you
are going to get sub-optimal results.
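To make that concrete for your first issue: as far as I know, the name finder labels every token with an outcome such as "start", "cont" or "other", and a name is simply a run of tokens labeled "start" followed by "cont". The sketch below is my own simplified illustration, not OpenNLP's code. The real NameFinderME also scores whole tag sequences with a beam search, and the class and method names here are made up. If the model happens to label "Intel", "Division", "Chief" and "Lashes" as start, cont, cont, cont, you get exactly the composite string you saw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified sketch (hypothetical, not OpenNLP's implementation) of how
 * per-token outcomes "start" / "cont" / "other" are assembled into name spans.
 */
public class BioDecodeSketch {

    /** A half-open token range [start, end). */
    record TokenSpan(int start, int end) {}

    static List<TokenSpan> decode(String[] outcomes) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1; // index where the currently open name began, or -1
        for (int i = 0; i < outcomes.length; i++) {
            String o = outcomes[i];
            if (o.equals("start")) {
                // A new name begins; close any name that is still open.
                if (start >= 0) spans.add(new TokenSpan(start, i));
                start = i;
            } else if (o.equals("cont") && start >= 0) {
                // Extends the open name by one token; nothing to record yet.
                continue;
            } else {
                // "other" (or a "cont" with no open name) ends any open name.
                if (start >= 0) spans.add(new TokenSpan(start, i));
                start = -1;
            }
        }
        if (start >= 0) spans.add(new TokenSpan(start, outcomes.length));
        return spans;
    }

    static String join(String[] tokens, TokenSpan s) {
        return String.join(" ", Arrays.copyOfRange(tokens, s.start(), s.end()));
    }

    public static void main(String[] args) {
        String[] tokens = {"NYPD", "Intel", "Division", "Chief", "Lashes", "Out"};
        // Hypothetical model outcomes: four consecutive tokens tagged as one organization.
        String[] outcomes = {"other", "start", "cont", "cont", "cont", "other"};
        for (TokenSpan s : decode(outcomes)) {
            System.out.println(join(tokens, s));
        }
    }
}
```

Running main prints "Intel Division Chief Lashes". The span boundaries come entirely from the per-token labels, so a model that never saw "Intel" used as slang will happily keep extending the name.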

It looks to me as if you are processing tweets. If you're using the default
recognizer, I doubt very much that it was trained on tweets, and tweets
have very different characteristics from regular prose. Consequently, I
suggest that you consider training a model on data that represents what
you actually want to process.
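If you do train your own model, the training data is plain text with one whitespace-tokenized sentence per line and each name wrapped in "<START:type> ... <END>" markers, which is the format described in the OpenNLP manual. Here is a rough sketch that turns a hand-labeled tweet into such a line. It assumes you already have labeled token ranges for each tweet. The Annotation record and toTrainingLine method are hypothetical helpers of mine, not part of OpenNLP. Note that the NYPD tweet is labeled so that "Intel" is not an organization. That is exactly the kind of example the default model has probably never seen.

```java
import java.util.List;

/**
 * Sketch (hypothetical helper, not an OpenNLP API): format one hand-annotated,
 * tokenized sentence as a line of name finder training data.
 */
public class TrainingLineSketch {

    /** Hypothetical annotation: tokens [start, end) form a name of the given type. */
    record Annotation(int start, int end, String type) {}

    /** Assumes annotations are sorted by start and do not overlap. */
    static String toTrainingLine(String[] tokens, List<Annotation> names) {
        StringBuilder sb = new StringBuilder();
        int next = 0;            // index of the next annotation to open
        Annotation open = null;  // annotation currently being written, if any
        for (int i = 0; i < tokens.length; i++) {
            if (open == null && next < names.size() && names.get(next).start() == i) {
                open = names.get(next++);
                sb.append("<START:").append(open.type()).append("> ");
            }
            sb.append(tokens[i]).append(' ');
            if (open != null && open.end() == i + 1) {
                sb.append("<END> ");
                open = null;
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] tokens = "NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot".split(" ");
        // NYPD and FBI are labeled as organizations; "Intel" deliberately is not.
        List<Annotation> names = List.of(
                new Annotation(0, 1, "organization"),
                new Annotation(7, 8, "organization"));
        System.out.println(toTrainingLine(tokens, names));
    }
}
```

Running main prints "<START:organization> NYPD <END> Intel Division Chief Lashes Out At <START:organization> FBI <END> Over Failed Terrorist Plot". A file of lines like that is what OpenNLP's TokenNameFinderTrainer tool consumes. Check the manual for the exact options in your version.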

In the examples you give, "Intel" is a company name in one case and a slang
term (short for "intelligence") in the other. You may find that it is not
possible to train a single model to handle all cases. You might need
separate strategies for different industries, depending on what you are
trying to achieve. Good luck.

Regards,

Jeff


On Fri, Sep 20, 2013 at 2:59 AM, Siva Sakthi <[email protected]> wrote:

> Can anyone answer the above question???
>
> Thanks
>
>
> On Fri, Sep 13, 2013 at 4:19 PM, Siva Sakthi <[email protected]> wrote:
>
> > Hi,
> > We are using OpenNLP to find organizations (code below).
> >
> > e.g.
> >
> > 1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
> > #IDF13 going on now in San Francisco. #Speed2Lead Protect your data
> > >>
> > Opennlp returns "Intel" in the above sentence
> >
> > 2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
> > http://t.co/V0XLKrp3TI
> > >>
> > Opennlp returns "Intel Division Chief Lashes"
> >
> > Issue 1: I don't understand why it returns a composite string in the
> > second case instead of just "Intel".
> > Issue 2: The "Intel" in the second sentence is not the company Intel at all.
> >
> > My code is as follows:
> >
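> >     import java.io.FileInputStream;
> >     import java.io.InputStream;
> >
> >     import opennlp.tools.namefind.NameFinderME;
> >     import opennlp.tools.namefind.TokenNameFinderModel;
> >     import opennlp.tools.tokenize.WhitespaceTokenizer;
> >     import opennlp.tools.util.Span;
> >
> >     public class OrgFinder {
> >         public static String findOrg(String message) throws Exception {
> >             // Split on runs of whitespace (split(" ") yields empty tokens on double spaces)
> >             String[] words = WhitespaceTokenizer.INSTANCE.tokenize(message);
> >             // Load the pre-trained organization model; the stream is closed automatically
> >             try (InputStream orgIs = new FileInputStream("en-ner-organization.bin")) {
> >                 TokenNameFinderModel tnf = new TokenNameFinderModel(orgIs);
> >                 NameFinderME nf = new NameFinderME(tnf);
> >                 // Each Span is a token range the model tagged as an organization
> >                 Span[] sp = nf.find(words);
> >                 String[] orgs = Span.spansToStrings(sp, words);
> >                 StringBuilder sb = new StringBuilder();
> >                 for (String org : orgs) {
> >                     sb.append(org).append('\n');
> >                 }
> >                 return sb.toString();
> >             }
> >         }
> >     }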
> >
> > Thanks,
> > Ss
> >
> >
>
