That is such a poor answer

On Sep 20, 2013, at 11:11 AM, Jeffrey Mershon <[email protected]> wrote:

> Siva,
> 
> I'm assuming there is nothing wrong with your code. OpenNLP's named-entity
> recognizer is based on MaxEnt modeling, as opposed to rule-based
> programming, to identify named entities. So, the answer to "Why did OpenNLP
> return X as an organization" is always going to be "Because it was trained
> to do so". If the training set--that is, the set of sentences used to train
> the recognition model that you are using--does not possess similar
> characteristics to the sentences you are using that model to process, you
> are going to get sub-optimal results.
> 
> It looks to me as if you are processing tweets. If you're using the default
> recognizer, I doubt very much whether that was trained on tweets, and
> tweets possess very different characteristics than regular prose.
> Consequently, I suggest that you consider training a model using data that
> represents what you want to actually process.
> 
> In the examples you give, Intel is a company name in one case and a slang
> term (contraction of Intelligence) in another. You may find that it is not
> possible to train just one model to handle all cases. You might need
> individual strategies for different industries, depending on what you are
> trying to achieve. Good Luck.
> 
> Regards,
> 
> Jeff
> 
> 
> On Fri, Sep 20, 2013 at 2:59 AM, Siva Sakthi <[email protected]> wrote:
> 
>> Can anyone answer the above question???
>> 
>> Thanks
>> 
>> 
>> On Fri, Sep 13, 2013 at 4:19 PM, Siva Sakthi <[email protected]> wrote:
>> 
>>> Hi,
>>> We are using OpenNLP to find organizations (code below).
>>> 
>>> e.g.
>>> 
>>> 1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
>>> #IDF13 going on now in San Francisco. #Speed2Lead Protect your data
>>>>> 
>>> OpenNLP returns "Intel" in the above sentence
>>> 
>>> 2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
>>> http://t.co/V0XLKrp3TI
>>>>> 
>>> OpenNLP returns "Intel Division Chief Lashes"
>>> 
>>> Issue 1: I don't understand why it returns a composite string in the
>>> second case, instead of just Intel
>>> Issue 2: The "Intel" in the second sentence is not really "Intel"
>>> 
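>>> My code is as follows:
>>>
>>>    import java.io.FileInputStream;
>>>    import java.io.InputStream;
>>>    import opennlp.tools.namefind.NameFinderME;
>>>    import opennlp.tools.namefind.TokenNameFinderModel;
>>>    import opennlp.tools.util.Span;
>>>
>>>    public static String findOrg(String message) throws Exception {
>>>        // Split on runs of whitespace so double spaces do not yield empty tokens.
>>>        String[] words = message.trim().split("\\s+");
>>>        // try-with-resources closes the model stream after the model is loaded.
>>>        try (InputStream orgIs = new FileInputStream("en-ner-organization.bin")) {
>>>            TokenNameFinderModel tnf = new TokenNameFinderModel(orgIs);
>>>            NameFinderME nf = new NameFinderME(tnf);
>>>            // Each Span is a token range [start, end) tagged as an organization.
>>>            Span[] sp = nf.find(words);
>>>            String[] a = Span.spansToStrings(sp, words);
>>>            StringBuilder sb = new StringBuilder();
>>>
>>>            // One detected name per line.
>>>            for (String name : a) {
>>>                sb.append(name).append('\n');
>>>            }
>>>
>>>            return sb.toString();
>>>        }
>>>    }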
>>> 
>>> Thanks,
>>> Ss
>>> 
>>> 
>> 
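
Jeff's suggestion to train a model on data that resembles the input means
preparing annotated training sentences first. OpenNLP's name finder reads
training data with one sentence per line, whitespace-separated tokens, and
each entity wrapped in <START:type> ... <END> markers. The following is a
minimal sketch, not code from this thread, that turns hand-labeled token
ranges into such lines. The class TrainingLineSketch, the Label holder, and
the labels chosen in main are hypothetical and only illustrate the format.

```java
import java.util.List;

public class TrainingLineSketch {

    /** Hypothetical holder for one hand-labeled entity: tokens [start, end) plus a type. */
    public static final class Label {
        public final int start;
        public final int end;
        public final String type;

        public Label(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Builds one training line in the <START:type> ... <END> format. */
    public static String annotate(String[] tokens, List<Label> labels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            // Open a marker before the first token of a labeled range.
            for (Label l : labels) {
                if (l.start == i) {
                    sb.append("<START:").append(l.type).append("> ");
                }
            }
            sb.append(tokens[i]);
            // Close the marker after the last token of a labeled range.
            for (Label l : labels) {
                if (l.end == i + 1) {
                    sb.append(" <END>");
                }
            }
            if (i < tokens.length - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed labeling for illustration: "Intel" is left unlabeled here
        // because it means "intelligence" in this headline.
        String[] tokens =
            "NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot".split(" ");
        List<Label> labels = List.of(
            new Label(0, 1, "organization"),
            new Label(7, 8, "organization"));
        System.out.println(annotate(tokens, labels));
    }
}
```

A file of lines like these, drawn from real tweets and labeled by hand, is
the kind of input the name finder trainer expects. Whether one model can
separate both senses of "Intel" still depends on how much such data is
available.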
