You can use DictionaryNameFinder for the more empirical entities or those
that are gazetteer-based (or just use basic Java with your own objects),
and a TokenNameFinder with a Maxent model for those that are hard to
specify up front. The latter is where OpenNLP has been essential to me. It
sounds like one of your toughest problems will be the normalization of
dates and amounts, because of the multitude of ways people express dates
and times in free text. So to answer your question: engineering NLP
solutions is always difficult, but your problem is a fairly typical one IMO.
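
To give a feel for the date normalization part, here is a minimal sketch in
plain Java (java.time only, no OpenNLP). It assumes the date span has
already been extracted by a name finder and only needs to be mapped to the
yyyyMMdd form used in the question. The class name DateNormalizer, the
method normalize, and the list of patterns are my own illustrative choices,
and the pattern list is nowhere near complete for real free text.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: maps a few common English date spellings to yyyyMMdd.
public class DateNormalizer {

    // Illustrative, incomplete list of surface patterns, tried in order.
    private static final List<DateTimeFormatter> FORMATS = List.of(
        build("MMMM d, uuuu"),  // January 16, 2014
        build("MMM d, uuuu"),   // Jan 16, 2014
        build("d MMMM uuuu"),   // 16 January 2014
        build("uuuu-MM-dd"),    // 2014-01-16
        build("M/d/uuuu")       // 1/16/2014 (assumes US month-first order)
    );

    private static DateTimeFormatter build(String pattern) {
        return new DateTimeFormatterBuilder()
            .parseCaseInsensitive()            // accept "january" and "JANUARY"
            .appendPattern(pattern)
            .toFormatter(Locale.ENGLISH)
            .withResolverStyle(ResolverStyle.STRICT); // reject e.g. February 30
    }

    // Returns the date as yyyyMMdd, or empty if no pattern matches.
    public static Optional<String> normalize(String text) {
        String candidate = text.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                LocalDate date = LocalDate.parse(candidate, format);
                return Optional.of(date.format(DateTimeFormatter.BASIC_ISO_DATE));
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {"January 16, 2014", "16 January 2014", "1/16/2014", "last Tuesday"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + normalize(sample).orElse("(unrecognized)"));
        }
    }
}
```

Because the normalized form is fixed-width with the year first, plain string
comparison orders the values chronologically, which is what makes a range
test like "> 20010101 and < 20100101" work. Relative expressions ("last
Tuesday"), two-digit years, and day-first versus month-first ambiguity are
not handled here, and those are where most of the real effort tends to go.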


On Thu, Jan 16, 2014 at 6:04 AM, <[email protected]> wrote:

>
> Hello,
>
> I would like to develop an OpenNLP application which would index place and
> people names, dates, numbers and monetary amounts, among other things,
> contained in thousands of PDFs. People and place names would be looked up
> in gazetteers (ie, dictionaries) and dates, numbers and amounts would be
> normalized so as to be comparable (eg, find all PDFs whose contents contain
> dates > 20010101 and < 20100101).
>
> Furthermore, the generated index would have to be SOLR-compatible and
> tweakable (eg, one should be able to specify the criteria used to sort
> search results, eg, order documents by date, document name, people names,
> etc.)
>
> How difficult would it be to develop such an application in OpenNLP?
>
> Many thanks.
>
> Philippe
>
>
