Thanks everyone for your timely responses.

Alexander,

No, the context is search queries; we'd like to recognize entities in them. The 
problem is that queries are really sentence fragments compared to a real 
corpus -- the entities we want to recognize easily make up 66% - 75% of an 
actual query. Perhaps NER isn't the right tool here?

Rodrigo,

I was checking out the RegexNameFinder this morning. How is that much different 
from just running a pile of regexes on the text as-is? Looking at the code, 
maybe the answer is integration with the other OpenNLP classes?
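
To make the comparison concrete, here is a rough sketch of what I mean by 
"a pile of regexes", using only java.util.regex and no OpenNLP classes. The 
class name, method name, and patterns are made up for illustration, and the 
company names are just the three from my earlier message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: context-free matching of a few
// company names (including spelling variants) in a raw search query.
public class PlainRegexMatcher {

    // One alternation covering the three example companies.
    // "Black & Decker" and "Black and Decker" are both accepted.
    private static final Pattern COMPANY = Pattern.compile(
            "\\b(?:(?:the\\s+)?home\\s+depot"
            + "|black(?:\\s*&\\s*|\\s+and\\s+)decker"
            + "|ryobi)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the matched substrings in the order they appear in the query.
    public static List<String> findCompanies(String query) {
        List<String> hits = new ArrayList<>();
        Matcher m = COMPANY.matcher(query);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findCompanies("black & decker cordless drill"));
        System.out.println(findCompanies("Ryobi drill at The Home Depot"));
    }
}
```

This gives me matched substrings and nothing else; whatever RegexNameFinder 
adds on top of this is what I'm trying to understand.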

I'm not really sure what a gazetteer is used for. Could you explain more?



Patrick Baggett
Online Engineer - Search Team
e: [email protected]
p: +1 (214) 202-8964

-----Original Message-----
From: Alexander Wallin [mailto:[email protected]]
Sent: Wednesday, November 05, 2014 9:52 AM
To: [email protected]
Subject: Re: How to resolve "Model not compatible with name finder!"

I presume that context is the text category (e.g. manuals, news texts and so 
forth); you can apply your model to any source text, but your recall and 
precision will suffer if your training corpus differs too much from the text.

The easiest method for generating a training corpus is to search the internet 
for sentences in which your entities occur and manually mark each entity 
according to its type. You can either do this yourself or use a service such 
as Amazon's Mechanical Turk.

Given a sufficiently large training corpus for your domain you can then use 
word distance for entity disambiguation.

Sincerely

Alexander


> 5 nov 2014 kl. 16:26 skrev <[email protected]>:
>
> I've tried two formats so far:
>
> <START>The Home Depot<END>.
> <START>Black and Decker<END>.
> <START>Ryobi<END>.
>
> And:
>
> <START:company>The Home Depot<END>.
> <START:company>Black and Decker<END>.
> <START:company>Ryobi<END>.
>
> All that I'm trying to achieve is company names and other specific words 
> being marked as such. I'm beginning to think that maybe I should be aiming 
> towards regular expressions, but I want to capture alternate spellings like 
> "Black & Decker", which is why I thought NER might help. I don't really quite 
> understand how I am supposed to supply a "corpus" that includes these words 
> in some kind of context; mostly, I just need context-free matching...

