Hi Dan,

The dictionary element is meant to be added to the name recognizer, either to help it find names it would otherwise miss or to reinforce names it should recognize. I'm not sure that is quite what you want to do.

There is a lesser-used DictionaryNameFinder that may be better suited to what you want to do, I think. However, the version in 1.5.2 has a few bugs. You can get a pre-release of our next release, which should help with those problems, here: http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/

The dictionary format is fairly straightforward, though not well documented. There are also several CLI tools that convert files to the dictionary format.

I guess I'll try to improve the documentation here.... :-)

<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="true">
  <entry>
    <token>Patrick</token>
  </entry>
</dictionary>
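To make the layout a bit more concrete: each <entry> holds one or more <token> elements, and as far as I know a multi-token term (a full name, say) simply lists its tokens in order inside one entry. The sketch below reads a file of this shape with the JDK's own XML parser and prints the token list of each entry. It is only meant to illustrate the format and is not how OpenNLP loads it; in OpenNLP itself you would normally hand an InputStream to the Dictionary constructor. The class name DictionaryFormatDemo and the "New York" entry are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of OpenNLP.
public class DictionaryFormatDemo {

    // Returns one token list per <entry>, in document order.
    static List<List<String>> readEntries(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable dictionary file", e);
        }
        List<List<String>> entries = new ArrayList<>();
        NodeList entryNodes = doc.getElementsByTagName("entry");
        for (int i = 0; i < entryNodes.getLength(); i++) {
            Element entry = (Element) entryNodes.item(i);
            NodeList tokenNodes = entry.getElementsByTagName("token");
            List<String> tokens = new ArrayList<>();
            for (int j = 0; j < tokenNodes.getLength(); j++) {
                tokens.add(tokenNodes.item(j).getTextContent());
            }
            entries.add(tokens);
        }
        return entries;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<dictionary case_sensitive=\"true\">"
                + "<entry><token>Patrick</token></entry>"
                + "<entry><token>New</token><token>York</token></entry>"
                + "</dictionary>";
        for (List<String> entry : readEntries(xml)) {
            System.out.println(entry);
        }
    }
}
```

Run as is, this prints [Patrick] followed by [New, York], one entry per line.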

The dictionary contains one entry per term, and each entry lists the tokens that make up that term. When the DictionaryNameFinder is called, it attempts to find the longest matching series of tokens from the dictionary in the document. This sort of dictionary is best for keywords, some names, and special words. For example, you could populate a dictionary of this type with the C/C++ keywords and use it to parse a program file and tag all the keywords.
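Here is a rough, standalone sketch of what "longest matching series" means in practice. It is my own simplified illustration and not the OpenNLP source: the class LongestMatchDemo, its find method, and the sample entries are all hypothetical, and matching is exact and case sensitive. In OpenNLP the equivalent call should be find(String[] tokens) on a DictionaryNameFinder built from your Dictionary, which returns Span objects rather than the plain index pairs used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of longest-match dictionary lookup; not OpenNLP code.
public class LongestMatchDemo {

    // Returns {start, end} token index pairs (end exclusive),
    // preferring the longest dictionary entry at each position.
    static List<int[]> find(Set<List<String>> dictionary, String[] tokens) {
        int maxLen = 0;
        for (List<String> entry : dictionary) {
            maxLen = Math.max(maxLen, entry.size());
        }
        List<String> tokenList = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            // Try the longest candidate first, then shrink one token at a time.
            for (int len = Math.min(maxLen, tokens.length - start); len >= 1; len--) {
                if (dictionary.contains(tokenList.subList(start, start + len))) {
                    matchedEnd = start + len;
                    break;
                }
            }
            if (matchedEnd > 0) {
                spans.add(new int[] {start, matchedEnd});
                start = matchedEnd; // continue after the match so spans do not overlap
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dictionary = new HashSet<>();
        dictionary.add(Arrays.asList("Patrick"));
        dictionary.add(Arrays.asList("New", "York"));
        dictionary.add(Arrays.asList("New", "York", "City"));

        String[] tokens = "Patrick moved to New York City last year".split(" ");
        for (int[] span : find(dictionary, tokens)) {
            String text = String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1]));
            System.out.println("[" + span[0] + ".." + span[1] + ") " + text);
        }
    }
}
```

With the sample data this prints "[0..1) Patrick" and "[3..6) New York City": the three-token entry wins over the shorter "New York" entry that starts at the same position.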

Let me know if I'm headed down the wrong path here....

Thanks,
James

On 3/8/2013 11:56 PM, Daniel Franc wrote:
Hi James,

Thanks for your reply.  Maybe my questions are too elementary so sorry!

I was running through the OpenNLP manual and went through the "tokenizer" step (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer).

Then when running through the "name finder" step it alluded to an alternative separate dictionary lookup step (end of this section: http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.recognition.api)

I was able to create a dictionary for lookup, but I can't figure out how to load it up or search with it.

My eventual goal is to have a method to look up a set of terms within a document as an alternative way to classify or tag the document, without necessarily using the statistical name finder. I'm not familiar with JWNL but I could give that a try. It seems that I could manually code a text search through a document, but I thought I'd try to use OpenNLP first.

Thanks again -- Dan




On Fri, Mar 8, 2013 at 4:22 PM, James Kosin wrote:

    Dan,

    I'm guessing that when you say tokenized you mean with POS values.
    If so, a better approach would be to use the JWNL library to look
    up the dictionary terms.  We use this with our coref component,
    and it isn't hard to get working.  The biggest thing with POS is
    selecting the right one.  It may be better to build a model for
    the POS tokenizer than to build a dictionary for this, unless you
    mean for a different language.

    I guess I need more information from you on what you are trying to
    accomplish.

    James


    On 3/8/2013 6:05 PM, Daniel Franc wrote:

        Hello friends,

        I am at a novice level for both OpenNLP and Java and have been
        fumbling around to put together a working version of the
        software, with some success thanks to the documentation
        provided!  My eventual goal is partially to look up terms
        within a pre-defined dictionary, and I've been able to use the
        dictionary creator to create a basic dictionary to look up
        from, as here:

             dictionary.serialize(new FileOutputStream("/Applications/apache-opennlp-1.5.2-incubating/dictionarynames.txt"));

        My particular questions are:

        1. Can someone help me with loading this dictionary after it
        was previously created?

        2. Is there a straightforward way to implement a basic lookup
        mechanism for tokenized text?

        Thanks for your help!
        -Dan



