Thanks, everyone, for your timely responses.

Alexander,
No, the context is search queries; we'd like to recognize entities in them. The problem is that queries really are sentence fragments compared to a real corpus: the entities we want to recognize easily make up 66-75% of an actual query. Perhaps NER isn't the right tool here?

Rodrigo,

I was checking out the RegexNameFinder this morning. How is that much different from just running a pile of regexes on the text as-is? Looking at the code, maybe the answer is integration with the OpenNLP classes? I'm also not really sure what a gazette is used for. Could you explain more?

Patrick Baggett
Online Engineer - Search Team
e: [email protected]
p: +1 (214) 202-8964

-----Original Message-----
From: Alexander Wallin [mailto:[email protected]]
Sent: Wednesday, November 05, 2014 9:52 AM
To: [email protected]
Subject: Re: How to resolve "Model not compatible with name finder!"

I presume that "context" refers to the text category (e.g. manuals, news texts and so forth). You can apply your model to any source text, but your recall and precision will suffer if your training corpus differs too much from the text you apply it to.

The easiest way to build a training corpus is to search the internet for sentences in which your entities occur and manually mark each entity according to its type. You can do this yourself or use a service such as Amazon's Mechanical Turk.

Given a sufficiently large training corpus for your domain, you can then use word distance for entity disambiguation.

Sincerely
Alexander

> On 5 Nov 2014, at 16:26, <[email protected]> wrote:
>
> I've tried two formats so far:
>
> <START>The Home Depot<END>.
> <START>Black and Decker<END>.
> <START>Ryobi<END>.
>
> And:
>
> <START:company>The Home Depot<END>.
> <START:company>Black and Decker<END>.
> <START:company>Ryobi<END>.
>
> All that I'm trying to achieve is company names and other specific words
> being marked as such.
> I'm beginning to think that maybe I should be aiming
> towards regular expressions, but I want to capture alternate spellings like
> "Black & Decker", which is why I thought NER might help. I don't quite
> understand how I am supposed to supply a "corpus" that includes these words
> in some kind of context; mostly, I just need context-free matching...
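Matching a fixed list of names regardless of the surrounding sentence, while tolerating variants such as "Black & Decker", is roughly what a gazetteer (a dictionary of known names) is for. A minimal plain-Java sketch of that idea follows, assuming whitespace-separated queries. The class `GazetteerMatcher`, its `add`/`find` methods, and the normalization rules (lowercasing, treating "&" as "and", dropping punctuation) are hypothetical illustrations, not part of OpenNLP's RegexNameFinder or any other OpenNLP API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical gazetteer-style matcher (not an OpenNLP class): known surface
// forms are normalized into token sequences and looked up in query tokens.
class GazetteerMatcher {

    // One matched span: token indices [start, end) plus the canonical name.
    record Match(int start, int end, String canonical) {}

    private final Map<List<String>, String> entries = new HashMap<>();
    private int maxLen = 0;

    // Assumed normalization: lowercase, "&" becomes "and", punctuation is
    // stripped, so "Black & Decker" and "black and decker" compare equal.
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.equals("&")) {
            return "and";
        }
        return t.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Splits on whitespace (and around "&", so "Black&Decker" also works),
    // dropping tokens that normalize to nothing, e.g. a lone ".".
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.replace("&", " & ").trim().split("\\s+")) {
            String n = normalize(raw);
            if (!n.isEmpty()) {
                out.add(n);
            }
        }
        return out;
    }

    // Registers one surface form (e.g. an alternate spelling) for a name.
    void add(String surfaceForm, String canonical) {
        List<String> key = tokens(surfaceForm);
        entries.put(key, canonical);
        maxLen = Math.max(maxLen, key.size());
    }

    // Greedy longest match, scanning the normalized query left to right.
    List<Match> find(String query) {
        List<String> toks = tokens(query);
        List<Match> found = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            int matchedLen = 0;
            String canonical = null;
            for (int len = Math.min(maxLen, toks.size() - i); len > 0; len--) {
                String c = entries.get(toks.subList(i, i + len));
                if (c != null) {
                    matchedLen = len;
                    canonical = c;
                    break;
                }
            }
            if (canonical != null) {
                found.add(new Match(i, i + matchedLen, canonical));
                i += matchedLen;
            } else {
                i++;
            }
        }
        return found;
    }
}
```

For example, after `m.add("Black and Decker", "Black and Decker")`, the query `m.find("black & decker cordless drill")` yields one match covering tokens 0 to 3. Registering both "The Home Depot" and "Home Depot" under the same canonical name lets either phrasing resolve to it; variants beyond these normalization rules (typos, abbreviations) would still need their own entries or a fuzzier comparison.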
