Hi all,

I finally got the chance to clean up and essentially revisit some bits of code that have helped me a lot over the past year. I put it all together in a project and open-sourced it, just in case anyone else finds it useful. The project is a high-performance, dictionary-based annotator which can be tuned for openNLP, stanfordNLP or some custom NER engine. Features include:

 * openNLP or stanfordNLP or custom NER component compatibility
 * fully parallel annotations of separate documents (optional)
 * flexible API that can deal with multiple dictionaries per document
   (merges them into a set)
 * custom tags are supported and can be provided directly on the
   command-line
 * basic normalisation is applied to the dictionary entries
   (un-capitalisation, unless the entry is all capitals)
 * options to merge all the annotations together in a single file or
   write them separately in a dedicated directory
 * fully functional command-line interface
 * fully usable from any JVM-based language
 * non-reflective source code
 * data-centric & immutable API
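
To give a rough idea of what the dictionary handling in the list above amounts to, here is a minimal Java sketch. It is not the project's actual API: the class DictionarySketch and the methods normalise and merge are hypothetical names made up for illustration, and I am assuming that "un-capitalisation" means lower-casing the first letter of an entry.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: none of these names come from annotator-clj.
public class DictionarySketch {

    // Assumption: un-capitalise the first letter, unless the entry is all capitals.
    static String normalise(String entry) {
        String e = entry.trim();
        if (e.isEmpty() || e.equals(e.toUpperCase(Locale.ROOT))) {
            return e; // empty, or no lower-case letters (e.g. an acronym): leave untouched
        }
        return e.substring(0, 1).toLowerCase(Locale.ROOT) + e.substring(1);
    }

    // Merge several dictionaries into one set of normalised entries.
    // The set drops duplicates; LinkedHashSet keeps first-seen order.
    static Set<String> merge(List<List<String>> dictionaries) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> dictionary : dictionaries) {
            for (String entry : dictionary) {
                merged.add(normalise(entry));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("Aspirin", "DNA");
        List<String> second = Arrays.asList("aspirin", "Ibuprofen");
        System.out.println(merge(Arrays.asList(first, second)));
        // prints [aspirin, DNA, ibuprofen]
    }
}
```

"Aspirin" and "aspirin" collapse into one entry after normalisation, while "DNA" is left alone because it is all capitals.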

The project lives here: https://github.com/jimpil/annotator-clj

Feel free to try it out...:)

Jim

